Think Like a Data Scientist

How do data scientists utilize predictive and prescriptive analytics to create business value?

Step 1

These should be:

Critical to immediate-term performance

Documented (communicated internally/publicly)

Cross-Functional (involving multiple business functions)

Championed by a senior business executive

Measurable against clear financial goals

Time-Bound (delivered within a well-defined time frame)

Advantageous (deliver financial or competitive advantage)

Step 2

Develop Stakeholder Personas

Identify the key business stakeholders who either impact or are impacted by the targeted business initiative.

Step 3

Identify Strategic Nouns

What are the key business entities that either impact or are impacted by the organization's key business initiative?

Customers

Patients

Stores

Wind Turbines

Trucks

Products

Students

Medication

Employees

Step 4

Determine Key Business Decisions

This is perhaps the hardest part of the "thinking like a data scientist" exercise, which involves examining your strategic nouns from 3 perspectives...

Descriptive Analytics:

Understanding what happened

How many widgets did I sell last month?

Predictive Analytics:

Predicting what will happen

How many widgets will I sell next month?

Prescriptive Analytics:

Recommending what to do next

How much of component Z should I order?
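
The three perspectives can be contrasted on the widget example with a minimal sketch. The sales figures, the two-units-of-Z-per-widget ratio, the stock on hand, and the function names are all hypothetical, and the straight-line trend is only one simple way to forecast, not a method this page prescribes.

```python
# Hypothetical monthly widget sales (units); not taken from the source.
monthly_sales = [100, 110, 120, 130]

# Hypothetical assumptions: each widget uses 2 units of component Z,
# and 50 units of Z are already in stock.
COMPONENT_Z_PER_WIDGET = 2
component_z_on_hand = 50


def descriptive(sales):
    """What happened: units sold in the most recent month."""
    return sales[-1]


def predictive(sales):
    """What will happen: next month's sales from a least-squares trend line."""
    n = len(sales)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # Month index n is the first month after the observed history.
    return slope * n + intercept


def prescriptive(forecast, per_widget, on_hand):
    """What to do next: units of component Z to order for the forecast."""
    needed = forecast * per_widget
    return max(0, needed - on_hand)


last_month = descriptive(monthly_sales)
next_month = predictive(monthly_sales)
order_qty = prescriptive(next_month, COMPONENT_Z_PER_WIDGET, component_z_on_hand)
print(last_month, next_month, order_qty)
```

Each function answers one of the three questions above, and the prescriptive step consumes the predictive result rather than the raw history.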

Step 5

Leverage "By" Analysis

This is an exploratory technique of examining a strategic entity by its data attributes. This can uncover:

  • Additional data sources
  • Additional dimensional entity characteristics
  • Additional areas for analytics exploration

"Show me Customer habits by..."

  • Category
  • Remodel Date
  • Store
  • Day of Week
  • Customer demo
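
As a rough illustration, a "by" analysis amounts to grouping one measure by each candidate attribute in turn. The sketch below assumes a handful of made-up purchase records; the field names (`store`, `day_of_week`, `category`, `amount`) and the helper `spend_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical purchase records; field names are illustrative only.
purchases = [
    {"store": "North", "day_of_week": "Sat", "category": "Grocery", "amount": 40.0},
    {"store": "North", "day_of_week": "Mon", "category": "Grocery", "amount": 20.0},
    {"store": "South", "day_of_week": "Sat", "category": "Apparel", "amount": 60.0},
    {"store": "South", "day_of_week": "Sat", "category": "Grocery", "amount": 30.0},
]


def spend_by(records, attribute):
    """Total purchase amount grouped by one attribute ("show me spend by ...")."""
    totals = defaultdict(float)
    for record in records:
        totals[record[attribute]] += record["amount"]
    return dict(totals)


# Slice the same measure by each attribute to see which dimensions differ most.
for attribute in ("store", "day_of_week", "category"):
    print(attribute, spend_by(purchases, attribute))
```

Attributes whose groups differ sharply are candidates for further analytics exploration or for new data sources.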

Step 6

Create Actionable Scores

Look for groupings of strategic noun dimensions and attributes that can be combined to create a more predictive and actionable score.

FICO Score
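
One simple way to combine several attributes into a single score is a weighted sum of normalized values. The sketch below is an assumption-laden illustration, not how FICO or any real score is computed: the attributes, ranges, weights, and function names are all hypothetical.

```python
# Hypothetical weights; in practice these would be fitted or tuned on real data.
WEIGHTS = {"recency": 0.5, "frequency": 0.3, "spend": 0.2}


def normalize(value, low, high):
    """Scale value into [0, 1], clamping anything outside [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def customer_score(days_since_visit, visits_per_month, monthly_spend):
    """Combine three attributes into a single 0-100 score (higher is better)."""
    parts = {
        # Fewer days since the last visit is better, so invert the scale.
        "recency": 1.0 - normalize(days_since_visit, 0, 90),
        "frequency": normalize(visits_per_month, 0, 10),
        "spend": normalize(monthly_spend, 0, 500),
    }
    return round(100 * sum(WEIGHTS[name] * parts[name] for name in WEIGHTS))


print(customer_score(days_since_visit=45, visits_per_month=5, monthly_spend=250))
```

A single number like this is easier for a stakeholder to act on than three separate attributes, which is the point of the step.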

Step 7

Put Analytics Into Action

Deliver analytics-driven scores and recommendations to the key business stakeholders.

Learn what data science can do for you at:
www.EMC.com/BigData

Data Lakes for Big Data and Analytics

The Data Lake was born out of the "economics of big data," which allow organizations to store massive amounts of data at a cost that can be 20x to 50x lower than that of traditional data warehouse technologies. Because of the agile Hadoop/HDFS architecture that typically underlies the Data Lake, organizations can store structured data (relational tables, CSV files), semi-structured data (web logs, sensor logs, beacon feeds), and unstructured data (text files, social media posts, photos, images, video) as-is, without the time-consuming and agility-limiting need to predefine a data schema on load.
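
Storing data as-is and deferring the schema to analysis time is often called schema-on-read. A minimal sketch of the idea follows, using in-memory strings to stand in for raw files in the lake; the sample records, field names, and reader functions are hypothetical and do not represent any EMC or Hadoop API.

```python
import csv
import io
import json

# Hypothetical raw files landed in the lake as-is (strings stand in for files).
raw_csv = "customer_id,amount\n1,19.99\n2,5.00\n"
raw_json_lines = (
    '{"customer_id": 1, "page": "/home"}\n'
    '{"customer_id": 2, "page": "/cart", "referrer": "ad"}\n'
)


def read_sales(text):
    """Apply a schema at read time: cast the CSV fields this analysis needs."""
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_web_log(text):
    """Parse semi-structured log lines; an absent referrer becomes None."""
    events = []
    for line in text.splitlines():
        record = json.loads(line)
        events.append(
            {
                "customer_id": record["customer_id"],
                "page": record["page"],
                "referrer": record.get("referrer"),
            }
        )
    return events


print(read_sales(raw_csv))
print(read_web_log(raw_json_lines))
```

Nothing was declared when the data was stored; each reader imposes only the structure its own analysis needs, so a later analysis can read the same raw data differently.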

However, the real power of the Data Lake is to enable the data science team to utilize advanced analytics against a growing variety of internal and external – structured and unstructured – data sources in an attempt to uncover new variables and metrics that are better predictors of performance.

The EMC approach accommodates open technologies at every stage, but EMC, VCE, VMware, and Pivotal products can help you get a big data analytics solution up and running more quickly and with the additional functionality required for an enterprise environment.

Big Data Infrastructure

EMC Isilon NAS Storage Platform

EMC Isilon is a scale-out NAS storage platform with native multi-protocol support, including Hadoop, that eliminates inefficient storage silos, provides consistent security, and speeds time to insight.

VCE Block Pre-Integrated Stack

Alternatively, VCE Block is a pre-integrated stack that combines servers, shared storage, network devices, virtualization, and management to speed Hadoop deployments.

EMC Elastic Cloud Storage Appliance

EMC Elastic Cloud Storage Appliance is a hyperscale, geo-distributed object and HDFS storage platform for geo-scale analytics, with multi-cloud APIs that connect seamlessly to public clouds.

VMware Big Data Extensions for VMware vSphere

VMware Big Data Extensions is an extension of VMware vSphere that enables you to deploy, run, and manage a virtual Hadoop cluster. It provides a simple deployment toolkit, accessed through VMware vCenter Server, that can deploy a highly available Hadoop cluster in minutes using the Big Data Extensions user interface.

Big Data Analytics

Pivotal Big Data Suite Integration

Pivotal Big Data Suite integrates Pivotal technologies with unlimited use of Pivotal HD to store all your data, accelerate processing, and increase the amount of data being analyzed and operationalized. Pivotal HD is a commercially supported, enterprise-ready Hadoop distribution that helps you harness the massive volumes of data generated by new apps, systems, machines, and the torrent of customer sources.

With a rich and compliant Structured Query Language (SQL) dialect, Pivotal HAWQ® supports application portability and a large ecosystem of data analysis and data visualization tools such as SAS, Tableau, and more. Analytic applications written over HAWQ are easily portable to other SQL-compliant data engines, and vice versa. This prevents vendor lock-in for the enterprise and fosters innovation while containing business risk. Pivotal HAWQ provides strong support for low-latency analytic SQL queries, coupled with massively parallel machine learning capabilities.

Pivotal Big Data Suite can be deployed as part of PaaS technologies, on premises and in public clouds, in virtualized environments, on commodity hardware, or delivered as an appliance.

The Pivotal Big Data Suite portfolio is compatible with Open Data Platform (ODP) distributions of Hadoop. All components are distributions of open source projects or are in the process of becoming open source projects.

Converged Infrastructure for Analytics

Big Data Applications

Pivotal Cloud Foundry is an industry-leading, enterprise platform-as-a-service solution, powered by Cloud Foundry. It delivers an always-available, turnkey experience for scaling and updating applications on the private cloud.

Streamline application development, deployment and operation on a centrally-managed Platform-as-a-Service for public and private cloud. Streamline IT development with full visibility and control over your application lifecycle, provisioning, deployment, upgrades and security patches.

Accelerate time-to-value through automated deployment of analytic systems on virtualized infrastructure that uses shared storage for immediate data access from all applications (i.e., no data copy operations to DAS). EMC built an extensible platform that allows fast integration of new analytic applications and platform components, including ingest, indexing, and data security applications. We support third-party and open source applications so your business can run analytics its own way.

Big Data Business Model Maturity Index

Bill Schmarzo developed a maturity model to help businesses understand where they are with big data proficiency. Businesses can use this to identify the transformational changes they need to make in order to gain big data capabilities, operationalize them, and use them to drive new types of value for IT and the lines of business.

Many organizations today find themselves within the first two phases.

In the first phase, Business Monitoring, an organization deploys business intelligence to monitor current business performance. This is often a “rear view mirror” approach of reporting on the past.

In Phase 2, Business Insights, organizations leverage predictive analytics to uncover actionable insights that can be integrated into existing reports and dashboards.

Phase 3, Business Optimization, is where organizations embed predictive analytics into existing business processes to optimize select business operations. This is a pivot point where the mirror begins to look toward the future and starts to drive business opportunities.

Phase 4, Data Monetization, is reached when organizations create new revenue opportunities, such as 1) reselling data and analytics, 2) creating “intelligent” products, or 3) overhauling the customer engagement experience.

Phase 5, Business Metamorphosis, is achieved when organizations leverage customers’ usage patterns, product performance behaviors, and market trends to create entirely new business models.

Big Data Vision Workshop

The Big Data Vision Workshop from EMC Global Services seeks to align business and IT goals around big data, identify strategic opportunities for big data analytics, prioritize key use cases by assessing feasibility and business benefits, demonstrate the potential value using data science techniques, and recommend the appropriate analytics engagement and deployment roadmap.

Big Data Vision Workshop Deliverables
  1. Big Data Business Opportunities: This is a list of strategic initiatives with a gap analysis so that IT and the business groups involved can agree on a common goal.
  2. Business Value and Feasibility Assessment: This helps the group understand how specific big data use cases can contribute to the initiative as well as how difficult or costly they could be to execute.
  3. Advanced Analytic Illustrations: These results of our data science work show how the data sources and data analytics can come together to produce insight about the targeted initiative.
  4. User Experience Mockups: These show how the insights from a big data solution can be leveraged by the business stakeholders and integrated into business apps that help drive data-driven decisions around the initiative.
  5. Business Opportunities Prioritization: This report helps the customer understand and rationalize the big data use cases that will drive the biggest impact on advancing the initiative. It’s based on an analysis of implementation feasibility and business value.
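
To illustrate the kind of ranking a prioritization based on feasibility and business value implies, here is a minimal sketch. It is not the workshop's actual method: the 1-10 ratings, the example use cases, and the choice to rank by the product of the two ratings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: int  # hypothetical 1-10 rating from business stakeholders
    feasibility: int     # hypothetical 1-10 rating from IT and data science


def prioritize(use_cases):
    """Order use cases so high-value, high-feasibility ones come first."""
    return sorted(
        use_cases,
        key=lambda uc: uc.business_value * uc.feasibility,
        reverse=True,
    )


# Hypothetical candidates; names and ratings are made up.
candidates = [
    UseCase("Customer retention", business_value=9, feasibility=6),
    UseCase("Predictive maintenance", business_value=7, feasibility=4),
    UseCase("Campaign targeting", business_value=6, feasibility=8),
]

for uc in prioritize(candidates):
    print(uc.name, uc.business_value * uc.feasibility)
```

Multiplying the ratings penalizes a use case that is weak on either axis; a weighted sum or a two-by-two matrix would be equally reasonable choices.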

Proof of Value Service

For those who need to understand how their big data analytic use case will generate insight that turns into value, the Proof of Value service demonstrates the value of analytics and data science. The project will source and prepare the data relevant for a chosen use case, perform the statistical analysis, and then share final findings and analytical models. The Service generates a ‘minimum viable product’ app or process to demonstrate how business value can be created using the models and determine the ROI. EMC will recommend any necessary changes to people and process, document a business justification and provide a roadmap for implementation into a production environment.

Big Data Implementation Services

  • Implement a big data and analytics platform
  • Operationalize a big data analytics use case

EMC services professionals will stand up a data lake architecture so it's ready to execute on the target use case.

The configuration will be customized to the customer environment and the particular data requirements for the analytics use case. EMC Global Services automates the data ingest and processing, develops appropriate data governance and security controls, and builds the analytics application into a business process. The data lake platform will then be ready to execute on countless future use cases, such as enhancing customer experience, improving marketing effectiveness, streamlining operations, or developing new products.

Reskill for Digital Transformation

EMC offers a range of education services to help business leaders, aspiring big data practitioners, and seasoned data scientists increase their effectiveness with big data. We offer a 90-minute course for business leaders to develop a baseline understanding of data science and big data to help them identify opportunities and integrate big data into their business strategies.

For big data analytics practitioners and team leads, we have 1-day and 5-day courses that use industry-specific examples and hands-on labs to explore team development, data science concepts, analytic approaches, tools, and advanced methods. We also offer advanced-level 5-day courses on specific methods and tools, with labs and EMC Proven Data Science Certification.

Finally, we offer technology-focused training on the core elements of the Federation Business Data Lake, including the Isilon, Pivotal HD, and ECS components.