A Data Lake is scale-out storage for data consolidation. It makes Big Data accessible through both traditional and next-generation access methods, which enables in-place analytics.
Why is a Data Lake a component of Big Data?
The advantage of a data lake in the realm of big data is that it puts all your information in one central location, so you can run analytics and build applications that use the data efficiently. If you use big data without a data lake, you have to pull data from disparate sources and then combine it before you can get true insight. Aggregating data this way can add considerable time to your analysis window and overhead to your business-critical systems. The data lake lets you gather insight from one location rather than many, while offloading reporting and analysis overhead from your critical systems.
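The difference between combining sources for every analysis and reading from one consolidated location can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two source systems, their field names, and the helper functions `consolidate` and `revenue_by_region` are hypothetical, and an in-memory Python list stands in for the lake, which in practice would be scale-out storage.

```python
from collections import defaultdict

# Hypothetical source systems, each with its own record layout.
crm_records = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]
order_records = [
    {"cust": 1, "amount": 120.0},
    {"cust": 2, "amount": 80.0},
    {"cust": 1, "amount": 30.0},
]


def consolidate(crm, orders):
    """Combine both sources once into a single list of uniform records."""
    region_by_customer = {row["customer_id"]: row["region"] for row in crm}
    return [
        {
            "customer_id": row["cust"],
            "region": region_by_customer.get(row["cust"], "unknown"),
            "amount": row["amount"],
        }
        for row in orders
    ]


def revenue_by_region(lake):
    """Run an analysis against the consolidated records only."""
    totals = defaultdict(float)
    for row in lake:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# The combine step happens once; later analyses never touch the sources.
lake = consolidate(crm_records, order_records)
print(revenue_by_region(lake))  # {'EMEA': 150.0, 'APAC': 80.0}
```

Once the records are consolidated, any further report reads from `lake` alone, so the source systems are queried once rather than once per analysis.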
Data Warehouse vs. Data Lake?
In the industry, a data warehouse is an Online Analytical Processing (OLAP) database built from transactional data. A data lake should include the data held in a data warehouse, but the data lake itself is not a database; it holds a larger, more diverse, and more accessible data set. A data lake can reduce the processing load on a data warehouse while still interacting with it.
What are Data Lake Extensions?
Data lakes need to do more than just store data; they also need to protect that data and accelerate access to it. EMC data lake extensions for big data environments ensure that queries and transactions are handled correctly, that data remains protected, and that analytics and applications are delivered quickly and reliably.