VMAX, XtremIO or Isilon


Data mining and analytics workloads aim to gain predictive insight into emerging trends and behaviors. Companies often use these workloads to increase revenue and improve the customer experience. A common example of data mining and analytics is Amazon's Price Checker app. Prospective customers and passive users from the shopper demographic use the Amazon app to check competitors' prices while in brick-and-mortar stores. Whether or not the shopper actually buys the product from Amazon, Amazon gathers valuable shopping-behavior data that gives it insight into its customers and helps it hone its strategies.

Data mining and analytics workloads typically involve a few users submitting complex, resource-intensive queries against a large volume of data. These databases are organized and optimized very differently from OLTP databases: they tend to be denormalized, with fewer tables, and are laid out for fast scanning and ingestion rather than fast random selection. The key performance measure is response time rather than transactions per second. Increasingly, these workloads, especially those requiring real-time or near-real-time analytics or very high levels of compute for functions such as trade simulation or transaction modeling, demand extreme storage performance: very low latency combined with high bandwidth density (bandwidth per TB of data).
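Bandwidth density, as defined above, is simply aggregate storage bandwidth divided by the data capacity it serves. The minimal sketch below assumes bandwidth in GB/s and capacity in TB; the function name and the example figures are hypothetical and serve only to show why the same system bandwidth can be ample for a small data set yet thin for a large one.

```python
def bandwidth_density(bandwidth_gbps: float, capacity_tb: float) -> float:
    """Return bandwidth density in GB/s per TB of data (hypothetical helper)."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return bandwidth_gbps / capacity_tb


# Hypothetical example: the same 20 GB/s of bandwidth serving
# a 10 TB data set versus a 200 TB data set.
small = bandwidth_density(20.0, 10.0)   # 2.0 GB/s per TB
large = bandwidth_density(20.0, 200.0)  # 0.1 GB/s per TB
print(f"{small:.2f} GB/s/TB vs {large:.2f} GB/s/TB")
```

Scanning the larger data set in the same time would require roughly 20 times the bandwidth, which is why bandwidth density, not raw bandwidth alone, is the useful sizing metric for scan-heavy analytics.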

As such, depending on storage performance and data capacity needs, either a tightly coupled scale-out architecture or a Shared External NVM Fabrics architecture is the architecture of choice for these workloads.

Hadoop analytics workloads, by contrast, are characterized differently because Hadoop relies on the Hadoop Distributed File System (HDFS) rather than on traditional block storage services. For HDFS, a loosely coupled scale-out architecture is the architecture of choice for batch analytics, while a Shared External NVM Fabrics architecture is the appropriate choice for real-time or near-real-time analytics on Hadoop.
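The architecture guidance in the preceding paragraphs can be summarized as a small decision function. This is a sketch under stated assumptions, not a sizing tool: the HDFS branch follows the text directly, but the text does not say exactly which block-storage workloads go to which architecture, so the `needs_extreme_performance` flag (very low latency plus high bandwidth density) and its mapping to Shared External NVM Fabrics are assumptions, as are all names in the code.

```python
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    TIGHTLY_COUPLED_SCALE_OUT = "tightly coupled scale-out"
    LOOSELY_COUPLED_SCALE_OUT = "loosely coupled scale-out"
    SHARED_EXTERNAL_NVM_FABRICS = "Shared External NVM Fabrics"


@dataclass
class AnalyticsWorkload:
    uses_hdfs: bool                   # Hadoop on HDFS vs. traditional block storage
    real_time: bool = False           # HDFS path: real-time / near-real-time analytics
    needs_extreme_performance: bool = False  # ASSUMED flag for the block-storage path


def choose_architecture(w: AnalyticsWorkload) -> Architecture:
    if w.uses_hdfs:
        # Per the text: batch -> loosely coupled scale-out,
        # real time / near real time -> Shared External NVM Fabrics.
        if w.real_time:
            return Architecture.SHARED_EXTERNAL_NVM_FABRICS
        return Architecture.LOOSELY_COUPLED_SCALE_OUT
    # Block storage: ASSUMPTION that extreme latency/bandwidth-density needs
    # favor NVM fabrics and everything else favors tightly coupled scale-out.
    if w.needs_extreme_performance:
        return Architecture.SHARED_EXTERNAL_NVM_FABRICS
    return Architecture.TIGHTLY_COUPLED_SCALE_OUT


print(choose_architecture(AnalyticsWorkload(uses_hdfs=True)).value)
```

Running the usage line prints `loosely coupled scale-out`, matching the batch-analytics-on-HDFS case. In practice the block-storage decision would weigh measured latency, bandwidth density, and capacity rather than a single boolean.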

The Spider Chart below shows the distribution and weighting of the primary workload requirements for this use case.