EMC and Kaggle Partner to Enable On-Demand Data Scientist Workforce

EMC Greenplum Chorus Data Science Platform Unites Worlds of Social and Big Data Analytics, Opens Access to 55,000 Kaggle Data Scientists, Released Under Open Source License

  • EMC Corporation, through its Greenplum division, and Kaggle today joined forces to tackle the short supply of, and heavy demand for, data scientists with an integration between the Kaggle data science community and EMC Greenplum Chorus, the platform for data science.
  • Chorus users wishing to engage the Kaggle community can now search, browse, and drill into profiles of the Kaggle community members who are interested in collaborating. Conversely, Kaggle’s community of over 55,000 data scientists can now opt into consulting opportunities sourced through the Greenplum Chorus platform. This integration transforms the way Greenplum Chorus users get assistance with their Big Data problems. It also allows Kaggle’s elite data scientists to expand the market for their highly sought-after skills.
  • Today EMC also released the Greenplum Chorus source code under an Apache open source license through the OpenChorus Project. The OpenChorus Project will speed innovation and adoption of collaborative data science practices, helping organizations to drive greater business insight and economic value from Big Data.
  • Chorus and the integration with Kaggle will be demonstrated at the O’Reilly Strata Conference + Hadoop World being held this week in New York.
New York, NY, October 23, 2012 - 

Today at the O’Reilly Strata Conference + Hadoop World in New York, EMC Corporation (NYSE: EMC) announced the availability of the EMC® Greenplum Chorus open source code and, continuing its goal of enabling organizations to derive greater insight and economic value from Big Data, announced an integration with Kaggle, a platform for data science competitions.

According to a May 2011 report, ‘Big Data: The next frontier for innovation, competition, and productivity’ from the McKinsey Global Institute, there is a shortage of the talent necessary for organizations to take advantage of Big Data. This shortage is widely believed to be the biggest obstacle to the wholesale adoption of Big Data by industry. EMC and Kaggle today announced that they have joined forces to tackle the short supply of data scientists by integrating Greenplum Chorus, the social platform for collaborative data science, with Kaggle’s community of over 55,000 data scientists. This integration is expected to transform the way organizations with data problems find and connect with the data scientists who can help solve them.

In the legacy analytics process, data scientists face challenges in accessing and sharing the right data. Greenplum Chorus helps foster a complete data science ecosystem with best-of-breed analytics applications. As a social platform for collaborative data science, Greenplum Chorus lets users increase productivity, decrease administrative burdens on IT infrastructures, and get better visibility into and faster access to data through a single tool.

Members of Kaggle’s community can opt in to doing contract work through Chorus. From within the Chorus interface, Chorus users wishing to engage the Kaggle community can search, browse, and drill into profiles of Kaggle community members who are interested in collaborating. Through secure integration of the Chorus and Kaggle APIs, users can expose relevant information from Chorus Workspaces and send secure messages. Kaggle certifies Chorus as the source of these messages and forwards them to the appropriate recipients. Once Kaggle community members review the material, they can respond directly to the Chorus user to discuss details and initiate the project.

The Kaggle and Chorus integration brings a new dimension by expanding the opportunity for the industry to realize the benefits of collaborating around Big Data and for elite data scientists to expand the market for their skills.

To be truly impactful, companies’ data strategies must be agile. As such, EMC Greenplum’s OpenChorus Project has a mission to foster widespread development of Big Data applications and solutions by making Chorus’s code open and accessible.

In addition to Kaggle, a number of EMC Greenplum partners have voiced support for the OpenChorus Project and for integrating their tools and solutions with Chorus. Those partners include Actuate, ADVIZOR Solutions, Alpine Data Labs, Gnip, Informatica, Pentaho, Pervasive, SAS, Syncsort, and Tableau Software. Partner comments on the OpenChorus Project can be found here.

Scott Yara, Senior Vice President of Products, Greenplum, a division of EMC

“Collaboration by individuals, organizations and communities is essential in achieving success with Big Data analytics. The OpenChorus Project is part of a wave of Big Data technologies, strategies, and tools announced by EMC Greenplum, all with one unified mission: to expand Big Data opportunities that help customers drive greater business insight and economic value from their data than ever before. Success depends on having a collaboration platform and solving the number one problem of the big data era: the supply and demand for data scientists. And today, with Kaggle and their community of over 55,000 data scientists, we believe we are forever changing the way data science will be done.”

Anthony Goldbloom, CEO, Kaggle

“Teaming with EMC Greenplum opens up new and exciting opportunities to existing and future Kaggle community members. The partnership also helps to solve the acute shortage of elite data scientists, which prevents companies from taking full advantage of their data.”


Greenplum Chorus source code is now available through the OpenChorus Project. Chorus and Kaggle integration is expected to be available in November 2012. Take action today – download Chorus, find your data, visualize your data, resource your project, analyze and model, share insights and collaborate, and contribute back to the community. Data scientists interested in being part of the Kaggle community should visit

