EMC Glossary

Backup Throughput

Backup throughput is the rate at which a solution can ingest data, usually expressed in megabytes per second (MB/s) or terabytes per hour (TB/hr).
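The two units are related by simple arithmetic. The sketch below converts between them. It assumes decimal (SI) units (1 TB = 1,000,000 MB); binary units (TiB, MiB) would give slightly different factors.

```python
# Assumes decimal (SI) units: 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3_600


def mb_per_s_to_tb_per_hr(mb_per_s: float) -> float:
    """Convert a throughput in MB/s to TB/hr."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_TB


def tb_per_hr_to_mb_per_s(tb_per_hr: float) -> float:
    """Convert a throughput in TB/hr to MB/s."""
    return tb_per_hr * MB_PER_TB / SECONDS_PER_HOUR


if __name__ == "__main__":
    print(mb_per_s_to_tb_per_hr(1000))  # 3.6
```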

Deduplication effect on throughput

In general, disk-based backup solutions with deduplication provide faster restore throughput than tape because disk is online and offers random access. However, backup throughput varies by vendor, because data deduplication is a resource-intensive process.

During writes, the deduplication process determines whether each small data sequence has been stored before, sometimes checking against petabytes of previously stored data. A simple index of this data is too big to fit in random access memory (RAM) except in very small deployments. Many solutions therefore need to seek on disk, and disk seeks are notoriously slow and are not getting faster.
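A minimal sketch of the lookup described above. It assumes fixed-size segments and SHA-256 fingerprints, both illustrative choices rather than a description of any vendor's implementation: each segment is fingerprinted and stored only if its fingerprint is not already in the index. The `index_size_bytes` helper shows why such an index outgrows RAM; the 8 KB segment size and 40-byte entry (a 32-byte hash plus an 8-byte location) are hypothetical figures.

```python
import hashlib


def deduplicate(data: bytes, segment_size: int, index: dict[bytes, int]) -> tuple[int, int]:
    """Fingerprint each fixed-size segment; store only segments whose
    fingerprint is not already indexed. Returns (new, duplicate) counts."""
    new = duplicate = 0
    for offset in range(0, len(data), segment_size):
        fingerprint = hashlib.sha256(data[offset:offset + segment_size]).digest()
        if fingerprint in index:
            duplicate += 1
        else:
            # Hypothetical "location": the order in which the segment was stored.
            index[fingerprint] = len(index)
            new += 1
    return new, duplicate


def index_size_bytes(stored_bytes: int, segment_size: int, bytes_per_entry: int) -> int:
    """Rough size of a flat fingerprint index: one entry per stored segment."""
    return stored_bytes // segment_size * bytes_per_entry


if __name__ == "__main__":
    index: dict[bytes, int] = {}
    backup = b"A" * 8192 + b"B" * 8192 + b"A" * 8192
    print(deduplicate(backup, 8192, index))  # (2, 1)
    print(deduplicate(backup, 8192, index))  # (0, 3): a repeat backup is all duplicates
    # 1 PB of stored data, 8 KB segments, 40-byte entries:
    print(index_size_bytes(10**15, 8192, 40))  # 4882812500000 bytes, about 4.9 TB
```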

There are two easy ways to make data deduplication go fast. One is to be worse at data reduction by looking only for big sequences, so disk seeks are needed less often. The other is to add more hardware, so there are more disks across which to spread the load. Both have the unfortunate side effect of raising the system price (poorer reduction means more disk capacity is needed for the same backups), which makes the system less attractive against tape from a cost perspective.
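The trade-off between sequence size and data reduction can be illustrated with a toy experiment. The sketch assumes fixed-size segments and a hypothetical 1 MiB backup in which 10 bytes change between runs. It counts index lookups (a stand-in for potential disk seeks) and the new bytes that must be stored, for two segment sizes.

```python
import hashlib
import random


def backup_cost(previous: bytes, current: bytes, segment_size: int) -> tuple[int, int]:
    """Back up `current` after `previous` is already stored, using
    fixed-size segments. Returns (index_lookups, new_bytes_stored)."""
    index = {hashlib.sha256(previous[i:i + segment_size]).digest()
             for i in range(0, len(previous), segment_size)}
    lookups = new_bytes = 0
    for i in range(0, len(current), segment_size):
        segment = current[i:i + segment_size]
        fingerprint = hashlib.sha256(segment).digest()
        lookups += 1                      # each lookup is a potential disk seek
        if fingerprint not in index:
            index.add(fingerprint)
            new_bytes += len(segment)
    return lookups, new_bytes


# Hypothetical data: a 1 MiB "previous backup" and a copy with 10 changed bytes.
rng = random.Random(42)
previous = rng.randbytes(1 << 20)
changed = bytearray(previous)
for pos in range(50_000, len(changed), 100_000):
    changed[pos] ^= 0xFF                  # flip every bit of one byte
current = bytes(changed)

for size in (4096, 65536):
    lookups, new_bytes = backup_cost(previous, current, size)
    print(f"{size}-byte segments: {lookups} lookups, {new_bytes} new bytes stored")
# 4096-byte segments: 256 lookups, 40960 new bytes stored
# 65536-byte segments: 16 lookups, 655360 new bytes stored
```

With 64 KiB segments the second backup needs 16 times fewer lookups but stores 16 times more new data, because each changed byte forces its whole segment to be written again.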

Vendors vary in their approaches. With EMC Data Domain systems, we took a unique approach: a central processing unit (CPU)-centric architecture that quickly and efficiently identifies redundant data, enabling industry-leading throughput.

CPU vs. disk-centric (spindle-bound) throughput

Unlike EMC, many vendors take a disk-centric approach to deduplication. Because disk drives are the slowest component in any storage system, these vendors commonly stripe data across a large number of drives so that the drives handle I/O in parallel and deliver greater performance.
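Under a simplified spindle-bound model in which a segment lookup that misses any cache costs one random disk seek, the number of drives needed to sustain a target ingest rate follows directly. All inputs are hypothetical: the 8 KB average segment, the fraction of lookups that reach disk, and the 100 random I/O operations per second (IOPS) per drive.

```python
import math


def drives_for_throughput(target_mb_per_s: float, avg_segment_kb: float,
                          disk_seek_fraction: float, iops_per_drive: float) -> int:
    """Drives needed if a fraction of segment lookups each cost one random seek.

    Uses decimal units (1 MB = 1,000 KB)."""
    segments_per_s = target_mb_per_s * 1000 / avg_segment_kb
    seeks_per_s = segments_per_s * disk_seek_fraction
    return math.ceil(seeks_per_s / iops_per_drive)


if __name__ == "__main__":
    # Worst case: every lookup goes to disk.
    print(drives_for_throughput(1000, 8, 1.0, 100))   # 1250
    # If caching keeps half the lookups off disk:
    print(drives_for_throughput(1000, 8, 0.5, 100))   # 625
```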

If a system relies on this method to meet performance requirements, consider the balance between performance and capacity carefully. This matters because the point of data deduplication is to reduce the number of disk drives, and adding drives purely for speed works against that goal.
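The balance can be checked by comparing the drives needed to hold the deduplicated data with the drives needed for speed. The sketch uses hypothetical figures: 500 TB of logical backup data, a 20:1 deduplication ratio, 4 TB drives, and a performance requirement of 1,250 drives. When the performance count dominates, most drives exist only for speed.

```python
import math


def drive_counts(logical_tb: float, dedup_ratio: float, drive_tb: float,
                 drives_for_performance: int) -> dict[str, int]:
    """Compare drives needed for capacity with drives needed for speed."""
    drives_for_capacity = math.ceil(logical_tb / dedup_ratio / drive_tb)
    return {
        "capacity": drives_for_capacity,
        "performance": drives_for_performance,
        "required": max(drives_for_capacity, drives_for_performance),
    }


if __name__ == "__main__":
    counts = drive_counts(logical_tb=500, dedup_ratio=20, drive_tb=4,
                          drives_for_performance=1250)
    print(counts)  # {'capacity': 7, 'performance': 1250, 'required': 1250}
    if counts["performance"] > counts["capacity"]:
        print("performance-bound: extra drives are bought for speed, not space")
```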

With EMC Data Domain Stream Informed Segment Layout, an inline, CPU-centric approach, very few disk drives are needed to reach maximum performance, so deduplication delivers on the expectation of a smaller storage footprint.
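The text does not describe how Stream Informed Segment Layout works internally. As a generic illustration of how a RAM- and CPU-based check can avoid disk seeks, the sketch below uses a Bloom filter, a compact in-memory bit array that can answer "definitely not stored" without touching disk, so only segments that might be duplicates trigger a (simulated) on-disk index lookup. The class, function names, and sizing are hypothetical; this is not EMC's implementation.

```python
import hashlib


class BloomFilter:
    """In-RAM set summary: no false negatives, a small tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint: bytes) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of
        # the (already uniformly distributed) 32-byte SHA-256 fingerprint.
        h1 = int.from_bytes(fingerprint[:8], "big")
        h2 = int.from_bytes(fingerprint[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, fingerprint: bytes) -> None:
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


def is_duplicate(segment: bytes, summary: BloomFilter,
                 disk_index: set, counters: dict) -> bool:
    """Consult the simulated on-disk index only when the in-memory
    summary cannot rule the segment out."""
    fingerprint = hashlib.sha256(segment).digest()
    if summary.might_contain(fingerprint):
        counters["disk_lookups"] += 1      # would cost a disk seek in a real system
        if fingerprint in disk_index:
            return True
    # New segment (or a false positive): record it in both structures.
    summary.add(fingerprint)
    disk_index.add(fingerprint)
    return False


if __name__ == "__main__":
    summary = BloomFilter(num_bits=1 << 20, num_hashes=7)
    disk_index = set()                     # hypothetical stand-in for an on-disk index
    counters = {"disk_lookups": 0}
    segments = [i.to_bytes(4, "big") * 2048 for i in range(1000)]  # unique 8 KB segments
    first = [is_duplicate(s, summary, disk_index, counters) for s in segments]
    print(any(first), counters["disk_lookups"])   # new data: few or no disk lookups
    second = [is_duplicate(s, summary, disk_index, counters) for s in segments]
    print(all(second), counters["disk_lookups"])  # each duplicate is confirmed on "disk"
```

In this toy, new data almost never reaches the disk index, but every duplicate still does; reducing those lookups as well would need further techniques that are outside this sketch.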

Single-stream backup and restore throughput

Single-stream performance indicates how fast a given file or database can be written, read, or copied to tape for long-term retention.
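Single-stream duration is simply size divided by the stream's rate. A minimal sketch, assuming decimal units and hypothetical single-stream rates of 400 MB/s for backup and 250 MB/s for restore of a 2 TB database:

```python
def stream_hours(size_gb: float, mb_per_s: float) -> float:
    """Hours to move size_gb through a single stream at mb_per_s (1 GB = 1,000 MB)."""
    return size_gb * 1000 / mb_per_s / 3600


if __name__ == "__main__":
    print(f"backup:  {stream_hours(2000, 400):.2f} h")   # backup:  1.39 h
    print(f"restore: {stream_hours(2000, 250):.2f} h")   # restore: 2.22 h
```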

Because critical data must be protected within backup windows, backup throughput is what most people ask about, although restore time is more significant for most service level agreements (SLAs).
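To see why restore time can dominate, compare the throughput each requirement implies. The sketch assumes a hypothetical 10 TB data set, an 8-hour backup window, and a 4-hour restore objective from an SLA; halving the allowed time doubles the required rate.

```python
def required_mb_per_s(size_gb: float, hours: float) -> float:
    """Throughput needed to move size_gb within the given hours (1 GB = 1,000 MB)."""
    return size_gb * 1000 / (hours * 3600)


if __name__ == "__main__":
    size_gb = 10_000                                 # hypothetical 10 TB data set
    backup = required_mb_per_s(size_gb, hours=8)     # backup window
    restore = required_mb_per_s(size_gb, hours=4)    # restore objective from an SLA
    print(f"backup needs {backup:.0f} MB/s, restore needs {restore:.0f} MB/s")
    # backup needs 347 MB/s, restore needs 694 MB/s
```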

Aggregate backup/restore throughput per system

With multiple streams, how fast can a given system ingest or recover data? The answer helps gauge the number of controllers or systems needed for a deployment.
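Sizing a deployment from aggregate throughput can be sketched as follows. The model is an assumption: each system's aggregate rate is the smaller of (stream count × per-stream rate) and a system-wide ceiling. All figures are hypothetical.

```python
import math


def systems_needed(required_mb_per_s: float, streams_per_system: int,
                   per_stream_mb_per_s: float, system_max_mb_per_s: float) -> int:
    """Systems needed when each system's aggregate rate is capped by both
    its stream count and a system-wide ceiling."""
    per_system = min(streams_per_system * per_stream_mb_per_s, system_max_mb_per_s)
    return math.ceil(required_mb_per_s / per_system)


if __name__ == "__main__":
    # 32 streams x 100 MB/s = 3,200 MB/s, but the system tops out at 2,000 MB/s.
    print(systems_needed(5000, 32, 100, 2000))  # 3
```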
