EMC Glossary

Data Deduplication

Data deduplication looks for redundant sequences of bytes across very large comparison windows. Sequences of data (over 8 KB long) are compared to the history of other such sequences. When a sequence has been seen before, the first stored copy is referenced rather than storing the sequence again. This process is completely hidden from users and applications, so the whole file is readable after it's written.

Who uses data deduplication, and why?

Deduplication is ideal for highly redundant operations like backup, which repeatedly copies and stores the same data set for recovery purposes over 30- to 90-day periods. As a result, enterprises of all sizes rely on deduplication for fast, reliable, and cost-effective backup and recovery.

How data deduplication works

Deduplication segments an incoming data stream, uniquely identifies data segments, and then compares the segments to previously stored data. If the segment is unique, it's stored on disk. However, if an incoming data segment is a duplicate of what has already been stored, a reference is created to it and the segment isn't stored again.
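The steps described above (segment, identify, compare, then store or reference) can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it: the class name `DedupStore`, the fixed-size segmentation, the 8 KB default, and the use of SHA-256 as the segment identifier are all assumptions made for the example.

```python
import hashlib

# Assumption: fixed-size 8 KB segments. The text does not say how
# segment boundaries are chosen; this is only for illustration.
SEGMENT_SIZE = 8 * 1024


class DedupStore:
    """Hypothetical in-memory store that keeps each unique segment once."""

    def __init__(self, segment_size=SEGMENT_SIZE):
        self.segment_size = segment_size
        self.segments = {}  # fingerprint -> segment bytes (stored once)
        self.files = {}     # file name -> ordered list of fingerprints

    def write(self, name, data):
        """Segment the data, store unique segments, reference duplicates."""
        recipe = []
        for offset in range(0, len(data), self.segment_size):
            segment = data[offset:offset + self.segment_size]
            # Assumption: a SHA-256 digest uniquely identifies a segment.
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                # Unique segment: store it.
                self.segments[fingerprint] = segment
            # Duplicate or not, the file only records a reference.
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name):
        """Rebuild the whole file from its segment references."""
        return b"".join(self.segments[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Physical bytes held, counting each unique segment once."""
        return sum(len(segment) for segment in self.segments.values())
```

Writing the same data twice under different names grows `files` but not `segments`, and `read` still returns the complete original bytes, which is the sense in which the process stays hidden from users and applications.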

For example, a file or volume that's backed up every week creates a significant amount of duplicate data. Deduplication algorithms analyze the data and store only the compressed, unique segments of a file. This process can reduce storage capacity requirements by an average of 10 to 30 times, given average backup retention policies on normal enterprise data. This means that companies can store 10 TB to 30 TB of backup data on 1 TB of physical disk capacity, which has huge economic benefits.
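The arithmetic behind these figures can be made explicit with a short sketch. The function names, the 12-week retention, and the 2% weekly change rate below are hypothetical values chosen for illustration; the model also ignores compression and assumes every unchanged byte deduplicates perfectly.

```python
def reduction_ratio(logical_tb, physical_tb):
    """Ratio of backup data written to physical capacity consumed."""
    if physical_tb <= 0:
        raise ValueError("physical_tb must be positive")
    return logical_tb / physical_tb


def logical_capacity(physical_tb, ratio):
    """Backup data that fits on the given physical capacity at a ratio."""
    return physical_tb * ratio


def weekly_full_backup_ratio(full_size_tb, weeks, change_rate):
    """Simplified model: one full backup per week, with a fixed fraction
    of the data changing between backups (hypothetical assumption)."""
    logical = full_size_tb * weeks
    # First backup is stored in full; later ones add only changed data.
    physical = full_size_tb * (1 + (weeks - 1) * change_rate)
    return reduction_ratio(logical, physical)


# The range quoted in the text: 1 TB of disk at 10x and 30x.
low = logical_capacity(1, 10)    # 10 TB of backup data
high = logical_capacity(1, 30)   # 30 TB of backup data

# Hypothetical scenario: 12 weekly fulls of 1 TB with 2% weekly change.
ratio = weekly_full_backup_ratio(1.0, 12, 0.02)
```

Under these assumed inputs the ratio comes out at roughly 9.8, near the low end of the quoted range; longer retention or a lower change rate pushes it higher.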

Benefits of data deduplication

Eliminating redundant data can significantly shrink storage requirements and improve bandwidth efficiency. Because primary storage has gotten cheaper over time, enterprises typically store many versions of the same information so that new workers can reuse previously done work. Some operations like backup store extremely redundant information.

Deduplication lowers storage costs as fewer disks are needed. It also improves disaster recovery since there's far less data to transfer. Backup and archive data usually includes a lot of duplicate data.

The same data is stored over and over again, consuming unnecessary storage space on disk or tape, electricity to power and cool the disk or tape drives, and bandwidth for replication. This creates a chain of cost and resource inefficiencies within the organization.

Data deduplication implementation

The ease of implementing deduplication can vary greatly by vendor. We've made it very easy to implement EMC Data Domain systems by creating an application-agnostic deduplication storage system, attachable as a file server over Ethernet, a virtual tape library (VTL) over Fibre Channel, or through advanced integration via EMC Data Domain Boost.

Data Domain systems support leading backup and archive applications, and deduplication is transparent to backup and archive processes. It integrates easily with various data movers and workloads including nonbackup data like email and file archives. More flexibility means more consolidation is possible using less physical infrastructure.

When selecting a deduplication solution, it's critical to ensure ease of integration to your existing environment, get customer references in your industry, and pilot the product or technology in your environment.
