High Value Data

Finding the Prize Fish in the Data Lake is Key to Success in the Era of the Third Platform

The phenomenon is this: enterprises are finding new sources of data, new ways to analyze data, new ways to apply the analysis to the business, and new revenues for themselves as a result. They are using new approaches, moving from descriptive to predictive and prescriptive analytics and analyzing data in real time. They are also increasingly adopting self-service business intelligence and analytics, giving executives and frontline workers easy-to-use software tools for data discovery and timely decision-making.

Think of Under Armour embedding sensors into the shorts worn by NFL hopefuls at this year’s tryouts. Think of John Deere, which changed its IT organization in 2010 and can now deploy new software to tractors in the field nearly every month, including software that helps tractors drive straight across plowed fields. (Those fields may already contain sensors that track environmental conditions and report back to smartphones or PCs.) Think of EMI Music, which markets its artists based on a database of a million music-listener surveys and information from Spotify music streams.

Call these intrepid enterprises “data-driven companies.” Call them “software-defined companies.” Call them what you will; they are on track to be successful in what IDC calls the era of the Third Platform.

It’s early yet for definitive studies on the opportunities for this new breed of organization, but as reported in the Harvard Business Review, companies in the top third of their industry in the use of data-driven decision making are, on average, 5% more productive and 6% more profitable than their competitors. A study by IDC found that users of Big Data and analytics who draw on diverse data sources, diverse analytical tools (such as predictive analytics), and diverse metrics were five times more likely to exceed expectations for their projects than those who don’t. The more data and analytics diversity, the better.

Read Big Data: The Management Revolution

However, the digital universe may actually be an obstacle for companies trying to become data-driven. There is too much information, it is too diverse, and it is too ephemeral.

The key is to find the part of the digital universe that is richer than the rest. To size the portion of the digital universe that one could call “target-rich” data, IDC analyzed the 60 or so data streams rolling up into the digital universe and gave them subjective measures on five criteria:

Easy to access.

Can you obtain the data, or is it hopelessly locked away on end-user PCs, shuttling about on closed-end data processing systems, or trapped in proprietary embedded systems?

Transformative.

Could this kind of data, properly analyzed and acted upon, actually change a company or society in a meaningful way?

Real-time.

Is the data available in real time, or does much of it come too late to drive real-time decisions and actions?

Intersection synergy.

Could this kind of data have more than one of the above attributes?

Footprint.

Could top-notch analysis of this data affect a lot of people, major parts of the organization, or lots of customers?

2014 - at 1.5% of the total, target-rich data is a much more manageable area of discovery

These are subjective measures, sure, but by assigning them values we can come up with a first-order approximation of how big this high-value data lake is, and how fast it is growing. For 2014, we peg this type of data at a little more than 6% of data that’s useful if tagged, rising to 11% in 2020. That means we are talking about little more than 1% of the digital universe as a whole, an entirely more manageable area of discovery.
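For readers who want to try this kind of sizing on their own data inventory, the scoring idea can be sketched in a few lines of Python. This is only an illustration, not IDC's actual model: the 0-to-5 scale, the equal weighting of the five criteria, the 0.7 cutoff, and the three sample streams with their volumes are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

# The five criteria named in the article. The 0-5 scale is an assumption.
CRITERIA = (
    "easy_to_access",
    "transformative",
    "real_time",
    "intersection_synergy",
    "footprint",
)
MAX_SCORE = 5


@dataclass(frozen=True)
class DataStream:
    name: str                  # hypothetical stream label
    volume: float              # hypothetical volume, in any consistent unit
    scores: Mapping[str, int]  # subjective score per criterion, 0..MAX_SCORE


def value_score(stream: DataStream) -> float:
    """Equal-weight average of the five scores, normalized to the range 0..1."""
    total = sum(stream.scores[c] for c in CRITERIA)
    return total / (MAX_SCORE * len(CRITERIA))


def target_rich_share(streams: Sequence[DataStream], threshold: float = 0.7) -> float:
    """Fraction of total volume held by streams scoring at or above the cutoff."""
    total_volume = sum(s.volume for s in streams)
    rich_volume = sum(s.volume for s in streams if value_score(s) >= threshold)
    return rich_volume / total_volume


def uniform_scores(value: int) -> dict:
    """Helper for the example: give every criterion the same score."""
    return {c: value for c in CRITERIA}


# Invented sample data, purely for illustration.
SAMPLE_STREAMS = [
    DataStream("stream_a", 10.0, uniform_scores(5)),
    DataStream("stream_b", 30.0, uniform_scores(3)),
    DataStream("stream_c", 60.0, uniform_scores(1)),
]

if __name__ == "__main__":
    for s in SAMPLE_STREAMS:
        print(f"{s.name}: score {value_score(s):.2f}")
    print(f"target-rich share: {target_rich_share(SAMPLE_STREAMS):.1%}")
```

An equal-weight average is the simplest possible choice. An organization that cares most about, say, real-time availability could weight that criterion more heavily, and the share it calls target-rich would shift accordingly.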

In 2014, the majority of this target-rich data is general IT data, which includes all metadata. And that will continue to grow as Big Data projects expand and as metadata builds up, including metadata on metadata. But the embedded portion of the IoT will grow from less than 10% of this target-rich data to more than 20% in 2020. The biggest decline in target-rich data will come from surveillance data, as the analog-to-digital transition winds down, as compression algorithms improve, and as installed-base growth slows.

target-rich data by type for 2014 and 2020

For enterprises, the news is this: While the size, diversity, and rapid expansion of the digital universe are never-ending challenges to deal with, they are also a source of opportunity for those with the tools and corporate will to take advantage of them.