Pat Gelsinger: How Big Data Will Change Business

Pat Gelsinger is President and Chief Operating Officer, EMC Information Infrastructure Products. The remarks below were taken from his Oracle OpenWorld keynote address on October 3, 2011.

Increasingly, we're seeing massive unstructured data environments emerge, spanning a broad range of datasets. The large datasets of the future will increasingly come from unstructured data sources as well. The question is, what infrastructure is uniquely designed for that, and is it available?

Data is growing faster than Moore's Law, and we need a different architectural approach, since a scale-up approach will be fundamentally unable to keep up with exploding datasets. The answer is a scale-out, automated environment, and this is what EMC is building across our storage assets, our compute and virtualization assets, and our analytics framework. Scale-out architecture is essential. We need one that scales to mammoth size, like 10-plus petabytes today and 100-plus petabytes in the future, that makes capacity expansion easy without added administrative burden, and whose performance scales linearly as capacity grows.

This is specifically what we are able to achieve with Isilon—capacity, scalability and simplicity in a storage infrastructure environment. Between Isilon and Atmos, we're creating the fundamental infrastructure that will allow us to scale for those big data storage environments of the future.

As datasets get really, really big and compute resource requirements grow, you need to do a better job of migrating the data to the compute. Specifically in that area, we're extending our FAST technologies in what we call Project Lightning, which moves storage closer to the compute farm.

In the other direction, there are workloads that are extraordinary in size but have modest compute requirements. Wouldn't it be really cool if we created a flexible architectural environment across this large pool of resources that allowed us to dynamically move across both of those worlds? We could effectively tier right across storage, all the way through the server, essentially extending the storage environment onto the server. This is what we're working to build for the future.

Cloud is changing the way we run IT, but big data is changing the way we run our business. We're engaged with many customers in what we call big data transformative opportunities. One of these customers is a big insurance company. The way they currently price their insurance is to take a very simplistic risk model and essentially have all the good drivers subsidize a few bad drivers. If we could do a better job of identifying those bad drivers, the insurance company could either raise their rates or nudge them out of its customer pool.

Another example is a financial institution we're working with. Today, the only information they bring into their environment comes from their online customers: people logging into their websites and looking at their services and offerings. The only thing they capture is where those online customers come from, essentially, the site they clicked in from. Once they added our first big-data-enhanced applications, which track broader clickstream data and perform real-time segmentation analysis, they were able to double the effectiveness of their online offers. It's like the Amazon model, where the company can now offer products and services the customer might be interested in and bring those big data environments into the customer experience.

Another set of customers, the New York Stock Exchange (NYSE) and the National Association of Securities Dealers Automated Quotations (NASDAQ), are using these big data environments to sift through massive amounts of data, specifically looking for securities fraud, performing risk analysis and so on. But again, big data requires a very different way to work and operate on data. It's big. It needs to be expansive and iterative. It needs scale-out architecture. It needs to address structured and unstructured data sources. We don't want to operate on summaries, tables and cubes. Full analysis of data has to be agile, self-service, increasingly real-time, and ultimately collaborative and integrated in those environments.

This is exactly what EMC is out to build with our unified analytics platform. With Greenplum and Hadoop as the core technologies on a scale-out infrastructure, it provides a rich set of capabilities for bringing data to and from a broad set of data sources, and it creates an environment where what we call the data science team can work together. That team brings traditional data scientists and the data platform administrator together with people whose business knowledge gives them unique insight into what those big data opportunities might be, ultimately expanding what you do and the value you bring to the companies you serve.

We see this as a great opportunity. Today, traditional database analysts work with a narrow set of business analysts on a very specific set of datasets and services. In the future, however, these datasets will explode to include massive structured as well as unstructured data sources, substantially expanding what you do for the enterprise. That in turn transforms your role: you'll work across this big data platform and a broad set of data resources in addition to the traditional structured transactional database. You'll work across what we call the data science team (the data operations person, the business analysts, the data scientists and the people who know how to build these analytic frameworks), and out of that collaboration bring new insights to line-of-business decisions, changing the way businesses work and operate as a result.

Cloud computing has made your pursuit of big data possible by delivering massive infrastructure scalability and dramatic productivity improvements. We see this explosion of enterprise data, and with it we need a new architecture and a new model: fundamentally, a scale-out architecture capable of handling that data and bringing analytics to it. We see that big data presents big opportunities, transforming what you do and delivering far more value, results and business impact to the companies you serve.
