Big Data workloads are classically thought of as tens to thousands of Hadoop compute nodes all accessing a common HDFS store. That store can be constructed in different ways: distributed models in which the compute nodes use DAS storage, often managed by a common Hadoop distribution, or shared/centralized HDFS deployments when higher levels of storage services are required. In either case, the distributed nature of the compute layer dictates that the single most important requirement of the storage layer is low $/GB. That said, some customers tell us that as their Hadoop workloads scale up, managing that many resources becomes challenging, so manageability at scale is increasingly a factor. Finally, customers who avoid consolidated HDFS stores but need higher availability and manageability than DAS provides are looking for storage solutions with richer data services, even at scaled-down sizes and cost. For these reasons, the loosely coupled scale-out architecture is the correct choice for this type of workload.
In addition to archiving and batch analytics, customers increasingly look to their Big Data environment for real-time or near-real-time enterprise analytics. Most Hadoop environments, designed for batch analytics, have significant performance constraints for this increasingly common workload type. Moreover, Hadoop keeps three copies of data for reliability and performance, which is very expensive for organizations that would like to use direct-attached flash storage to accelerate their Big Data workloads. For customers looking to employ Hadoop for real-time or near-real-time enterprise analytics, Shared External NVM Fabrics is the architecture of choice, as it provides superior performance, a smaller data center footprint, and a lower cost of ownership than direct-attached flash Hadoop environments.
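The three-copy behavior mentioned above is controlled by HDFS's standard `dfs.replication` property. As a minimal sketch (the property name and its default of 3 are standard Hadoop; the surrounding file layout shown here is the usual `hdfs-site.xml` convention, not taken from this document):

```xml
<!-- hdfs-site.xml: HDFS stores dfs.replication copies of every block (default 3). -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <!-- Lowering this value reduces raw-capacity overhead (a concern with
         costly direct-attached flash) at the expense of durability and
         read parallelism. -->
  </property>
</configuration>
```

On a shared external storage fabric that provides its own data protection, operators commonly reduce this replication factor, which is one source of the footprint and cost advantages claimed above.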
The spider chart below shows the distribution and weighting of the primary workload requirements for this use case.