Much of this data explosion consists of unstructured data, which can be difficult to format and evaluate through data analysis.
Examples include social media posts, recorded call center interactions between customers and agents, health records, and the bodies of email messages.
However, there are steps that businesses can take to improve how they go about gathering data, integrating data from multiple sources, and using data analysis techniques to manage the data explosion sensibly, as Glenda Nevill notes in a recent blog post.
These tips will also help companies more effectively use customer data and other data streams to improve operations, optimize marketing efforts, and drive better business performance.
Here’s a step-by-step approach to help data scientists tame the data beast:
- Classify unstructured data. Most corporate data environments are pretty chaotic. Word documents, email, PDFs, spreadsheets, and other data files are scattered across the enterprise. The good news is that most unstructured data is also clear text. As such, this data can be read, indexed, compressed, and stored fairly easily. Classifying unstructured data is the first step: it identifies where unstructured data lives so that it can later be parsed and explored with data visualization tools.
- Set enforceable storage policies. Most data has a shelf life. New data is accessed frequently during its first 90 days, and usage tends to taper off after that. Because of this pattern, data should be reviewed regularly for its creation date and most recent use, and then archived or discarded according to data retirement policies enforced by the IT organization.
- Evaluate your BI infrastructure and adjust as needed. Before organizations begin analyzing unstructured data, it’s helpful to evaluate the current business intelligence (BI) infrastructure that’s in place and how it all fits together. It’s not always easy to create structured definitions of data that’s stored within non-traditional data sources. As such, the data management team should identify the steps that are needed to integrate unstructured data into a structured BI environment.
- Don’t overlook metadata. Making effective use of unstructured data requires an approach to organizing and cataloging content. In order to use the content, it’s helpful to know what that content is. Some systems automatically capture process-related metadata, that is, attributes such as creation date, author, and title. However, metadata derived from the content itself, such as content summaries, companies or people mentioned, or topic keywords, can be considerably more useful.
- Apply unstructured data analysis. BI tools can’t analyze unstructured data directly. However, specialized data analysis technology can analyze unstructured data and produce a data model that BI tools can work with. One starting point is to use a natural language engine to measure keyword density, that is, how often a given term appears relative to the total number of words. Combined with metadata and data discovery tools, this approach can help data scientists and decision makers get to the heart of what key stakeholders are looking for (e.g. whether social media comments about a company are positive or negative).
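
To make two of these steps more concrete, here is a minimal Python sketch of a retention check and a keyword density count. It is an illustration under stated assumptions, not a recommended implementation: the function names `retention_action` and `keyword_density`, the one-year discard cutoff, and the tiny stop-word list are all hypothetical. Only the 90-day active window comes from the steps above, and a real natural language engine would do far more than count words.

```python
import re
from collections import Counter
from datetime import date, timedelta

# The 90-day window is taken from the article; the one-year
# discard cutoff is an assumption made purely for illustration.
ACTIVE_WINDOW = timedelta(days=90)
DISCARD_AFTER = timedelta(days=365)

# A deliberately tiny, hypothetical stop-word list.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "in", "it"}


def retention_action(last_accessed: date, today: date) -> str:
    """Return 'keep', 'archive', or 'discard' based on most recent use."""
    age = today - last_accessed
    if age <= ACTIVE_WINDOW:
        return "keep"
    if age <= DISCARD_AFTER:
        return "archive"
    return "discard"


def keyword_density(text: str, top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top_n terms with their share of all non-stop-words."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    if not words:
        return []
    total = len(words)
    counts = Counter(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


if __name__ == "__main__":
    today = date(2024, 2, 1)
    print(retention_action(date(2024, 1, 1), today))
    print(keyword_density("Great service, great support. The billing was slow."))
```

The retention check looks only at the most recent access date, which keeps the policy easy to enforce. A production policy would likely also weigh legal or regulatory retention requirements before anything is discarded.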
Source: Tibco