Press Release

EMC Announces 1000 Node Analytic Platform to Accelerate Industry's Hadoop Testing and Development

EMC and industry leading companies including Intel, VMware, Micron, Seagate, Supermicro, Switch, and Mellanox Technologies Partner To Deliver

Strata Summit, New York, NY, September 21, 2011 - 
News Summary:
  • EMC today announced a strategy for facilitating innovation of the Apache Hadoop open-source software as an enterprise-ready tool for Big Data.
  • EMC introduces the Greenplum® Analytics Workbench, the industry's largest test bed cluster for regular integration testing on the Apache Hadoop trunk and its continuing releases.
  • The Greenplum Analytics Workbench will enable the Apache Hadoop open source community to validate code to scale on a regular, ongoing basis. With contributions certified at scale, enterprises can run them with confidence.
Full Story:

EMC Corporation (NYSE: EMC) today announced the creation of the Greenplum® Analytics Workbench, which will be used for regular integration tests on Apache Hadoop. The 1,000-plus node test bed cluster incorporates technology from the world's leading software and hardware manufacturers with the intention of providing the infrastructure needed to facilitate Apache Hadoop innovation. With the availability of a large-scale test bed, developers can have their contributions validated at scale, and enterprises can confidently deploy new releases in a production environment.

Apache Hadoop has rapidly emerged as the preferred solution for Big Data analytics across unstructured data. Organizations looking for opportunity in an ever-changing business environment are finding that Big Data analysis is the competitive advantage. In fact, according to a 2011 TDWI survey, 34% of companies do big data analytics today, and that number is growing. Hadoop-based batch processing of unstructured and structured data at massive scale using commodity hardware has led to a profound change in analytics. By extracting the knowledge wrapped within unstructured and machine-generated data, organizations can make better decisions that drive revenue, improve service and reduce costs.

Hadoop innovation and development rely on contributions made by open source developers. However, the Apache Hadoop community has consistently faced the challenge of provisioning the resources required to validate new releases of the open source software. Without access to a large cluster for scale validation, the Apache community, and enterprise users, must wait for Hadoop user communities to sponsor an effort to run scale validations. Such validations are run very infrequently, and a lot of time is spent stabilizing releases for enterprise adoption.

With an aggressive plan for testing on the Apache Hadoop trunk and its continuing releases, EMC is excited to contribute to the Hadoop open source community by providing testing resources it lacks to quickly identify bugs, stabilize new releases and optimize hardware configurations in an effort to speed up the innovation of Hadoop. EMC plans to provide test results to the Apache Software Foundation and open source community, and EMC's testing will be planned in coordination with the Apache Hadoop project.

The Greenplum Analytics Workbench is the result of a collaboration of several hardware and software vendors including:

  • EMC
  • Intel
  • Mellanox Technologies
  • Micron
  • Seagate
  • Supermicro
  • Switch
  • VMware

The test bed cluster, which consists of 1,000-plus hardware nodes or 10,000 nodes with the addition of virtual machines, features 24 petabytes of physical storage. This is the equivalent of nearly half of the entire written works of mankind, from the beginning of recorded history.

Supporting Quotes:

"EMC and its partners have made a significant contribution to the Apache Hadoop community by promising to validate Apache Hadoop releases on clusters at petabyte scale. With access to continuous integration testing, the world's best unstructured data analytics software will get better and faster, allowing companies and organizations to gain better insights from their data."
- Dhruba Borthakur, Member of Hadoop Project Management Committee

"The EMC 1k node cluster fills a vital resource gap, one that has been missing up to this point: validating Apache Hadoop builds and releases at scale. I can't wait to take it out for a burn."
- Michael Stack, Engineer at StumbleUpon and Member of Hadoop Project Management Committee

"Apache Hadoop at this stage needs a standardized tool for testing and validating Hadoop releases at scale. EMC's 1,000 node test bed launch will facilitate the development of Apache Hadoop as a vital tool for Big Data analytics, advance its internal innovation, and lead to greater adoption of Hadoop. I am especially pleased that EMC is contributing its findings back to the open source community."
- Konstantin Shvachko of eBay, Member of Apache Project Management Committee

"Intel is excited to be a part of the largest Hadoop test bed cluster ever built. Being able to analyze Big Data sets and make use of the tremendous volume of unstructured data being created is an opportunity that could transform entire industries. The latest Intel® Xeon® 5600 series processors will provide the processing power required to scale Big Data analytics and realize the full potential of Apache Hadoop. The entire open source community, including Intel, will benefit from the key learnings from both development and testing on the cluster."
- David Tuhy, General Manager of the Storage Group, Intel Corporation

"Greenplum is excited to be part of the elite group of hardware and software manufacturers that made possible the Greenplum Analytics Workbench. The test bed cluster, at 1,000-plus hardware nodes, is itself an accomplishment. But more importantly, we are excited to make this test bed available to the open source community so that enterprises can feel comfortable deploying Apache Hadoop in a production environment and can reap the benefits of Big Data analytics."
- Luke Lonergan, Chief Technology Officer, Greenplum, a division of EMC

Regular testing cycles on the Greenplum Analytics Workbench will begin early next year.

About EMC

EMC Corporation is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing.  Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect and analyze their most valuable asset — information — in a more agile, trusted and cost-efficient way. Additional information about EMC can be found at

Press Contacts

David Oro

EMC, Greenplum, and Greenplum Chorus are trademarks or registered trademarks of EMC Corporation in the U.S. and other countries. All other trademarks are the property of their respective owners.

Forward-Looking Statements
This release contains "forward-looking statements" as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware, Inc.'s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers or networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in EMC's filings with the U.S. Securities and Exchange Commission. EMC disclaims any obligation to update any such forward-looking statements after the date of this release.