Press Release

EMC and Kaggle Partner to Enable On-Demand Data Scientist Workforce

EMC Greenplum Chorus Data Science Platform Unites Worlds of Social and Big Data Analytics, Opens Access to 55,000 Kaggle Data Scientists, Released Under Open Source License

Story Highlights

  • EMC Corporation, through its Greenplum division, and Kaggle today joined forces to tackle the short supply of, and heavy demand for, data scientists with an integration between the Kaggle data science community and EMC Greenplum Chorus, the platform for data science.
  • Chorus users wishing to engage the Kaggle community can now search, browse, and drill into profiles of the Kaggle community members who are interested in collaborating. Conversely, Kaggle’s community of over 55,000 data scientists can now opt into consulting opportunities sourced through the Greenplum Chorus platform. This integration transforms the way Greenplum Chorus users get assistance with their Big Data problems. It also allows Kaggle’s elite data scientists to expand the market for their highly sought-after skills.
  • Today EMC also released the Greenplum Chorus source code under an Apache open source license through the OpenChorus Project. The OpenChorus Project will speed innovation and adoption of collaborative data science practices, helping organizations to drive greater business insight and economic value from Big Data.
  • Chorus and the integration with Kaggle will be demonstrated at the O’Reilly Strata Conference + Hadoop World being held this week in New York.

New York, NY, October 23, 2012 - Today at the O’Reilly Strata Conference + Hadoop World in New York, EMC Corporation (NYSE: EMC) announced the availability of the EMC® Greenplum Chorus open source code and advanced its goal of enabling organizations to derive greater insight and economic value from Big Data through a partnership with Kaggle, a platform for data science competitions.

According to a May 2011 report from the McKinsey Global Institute, ‘Big Data: The next frontier for innovation, competition, and productivity’, there is a shortage of the talent organizations need to take advantage of Big Data. This shortage is widely believed to be the biggest obstacle to the wholesale adoption of Big Data by industry. EMC and Kaggle today announced that they have joined forces to tackle the short supply of data scientists by integrating Greenplum Chorus, the social platform for collaborative data science, with Kaggle’s community of over 55,000 data scientists. This announcement is expected to transform the way organizations with data problems find and connect with the data scientists who can help solve them.

In the legacy analytics process, data scientists face challenges in accessing and sharing the right data. Greenplum Chorus helps foster a complete data science ecosystem with best-of-breed analytics applications. As a social platform for collaborative data science, Greenplum Chorus lets users increase productivity, decrease administrative burdens on IT infrastructures, and get better visibility into and faster access to data through a single tool.

Members of Kaggle’s community can opt in to contract work through Chorus. From within the Chorus interface, Chorus users wishing to engage the Kaggle community can search, browse, and drill into profiles of Kaggle community members who are interested in collaborating. Through secure integration of the Chorus and Kaggle APIs, users can expose relevant information from Chorus Workspaces and send secure messages. Kaggle certifies Chorus as the source of these messages and forwards them to the appropriate recipients. Once Kaggle community members review the material, they can respond directly to the Chorus user to discuss details and initiate the project.

The Kaggle and Chorus integration brings a new dimension by expanding the opportunity for the industry to realize the benefits of collaborating around Big Data and for elite data scientists to expand the market for their skills.

To be truly impactful, companies’ data strategies must be agile. As such, EMC Greenplum’s OpenChorus Project has a mission to foster widespread development of Big Data applications and solutions by making Chorus’s code open and accessible.

In addition to Kaggle, a number of EMC Greenplum partners have voiced support for the OpenChorus Project and for integrating their tools and solutions with Chorus. Those partners include Actuate, ADVIZOR Solutions, Alpine Data Labs, Gnip, Informatica, Pentaho, Pervasive, SAS, Syncsort, and Tableau Software. Partners commenting on the OpenChorus Project can be found here.

Executive Quotes

Scott Yara, Senior Vice President of Products, Greenplum, a division of EMC

“Collaboration by individuals, organizations and communities is essential in achieving success with Big Data analytics. The OpenChorus Project is part of a wave of Big Data technologies, strategies, and tools announced by EMC Greenplum, all with one unified mission: to expand Big Data opportunities that help customers drive greater business insight and economic value from their data than ever before. Success depends on having a collaboration platform and solving the number one problem of the big data era: the supply and demand for data scientists. And today, with Kaggle and their community of over 55,000 data scientists, we believe we are forever changing the way data science will be done.”

Anthony Goldbloom, CEO, Kaggle

“Teaming with EMC Greenplum opens up new and exciting opportunities to existing and future Kaggle community members. The partnership also helps to solve the acute shortage of elite data scientists, which prevents companies from taking full advantage of their data.”


Greenplum Chorus source code is now available through the OpenChorus Project. Chorus and Kaggle integration is expected to be available in November 2012. Take action today – download Chorus, find your data, visualize your data, resource your project, analyze and model, share insights and collaborate, and contribute back to the community. Data scientists interested in being part of the Kaggle community should visit

About EMC

EMC Corporation is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect and analyze their most valuable asset — information — in a more agile, trusted and cost-efficient way. Additional information about EMC can be found at

About the OpenChorus Project

Through the OpenChorus Project, Greenplum provides a framework for fostering the collaborative data science community, including individual developers, application partners, data source providers, data scientists, and the Chorus user community. Greenplum Chorus is an integrated development environment that expands insights with simple access to third-party data and data science tools to promote Big Data agility and collaboration for data science teams, decreases vendor dependency with added flexibility, and fosters the data science ecosystem and community. Developers and partners can get involved by visiting

About Kaggle

Kaggle is the global leader in running predictive modeling competitions. The company has run over 100 competitions with major enterprise, government, and academic customers, including Allstate Insurance, Dunnhumby, Facebook, Ford, Heritage Health, Merck, Microsoft, NASA, Stanford, and Wikipedia. Over 55,000 data scientists worldwide have contributed to competitions that tackled the toughest predictive problems in the marketing, life sciences, insurance, financial services, travel, and science industries. Kaggle’s investors include Index Ventures and Khosla Ventures. It was founded in 2010 and is based in San Francisco, Calif.

About Greenplum, a division of EMC

Greenplum, a division of EMC, is driving the future of Big Data analytics with breakthrough products that harness the skills of data science teams to help global organizations realize the full promise of business agility and become data-driven, predictive enterprises. The division's products include Greenplum® Unified Analytics Platform, Greenplum® Data Computing Appliance, Greenplum® Database, Greenplum® Analytics Lab, Greenplum® HD and Greenplum® Chorus™. They embody the power of open systems, cloud computing, virtualization and social collaboration, enabling global organizations to gain greater insight and value from their data than ever before possible. Learn more at

Press Contacts

David Oro
(415) 885-9898

EMC and Greenplum are trademarks or registered trademarks of EMC Corporation in the U.S. and other countries. All other trademarks are the property of their respective owners.