Data Deduplication Rate
What you can expect
Numerous variables affect the deduplication rate you can expect in your environment. Redundancy varies by application, by how often versions are captured, and by the retention policy. The most significant variables are the rate of data change (the less data changes between backups, the more redundant data there is to eliminate), the frequency of full backups (more "fulls" mean a higher deduplication rate), the retention period (longer retention means more data to compare against), and the size of the data set (the more data, the more there is to deduplicate).
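To make the effect of these variables concrete, the following Python sketch computes a deduplication ratio (data written by the backup application divided by unique data actually stored) under a simple, hypothetical model: each day a fixed fraction of the data set is replaced with new content, a full backup runs every `full_every_days` days with incrementals in between, and every backup from the last `retention_days` days is kept. The function name and parameters are illustrative assumptions, not any product's interface. Because the model scales linearly, it does not capture the data-set-size effect.

```python
import math

def dedup_ratio(data_gb, daily_change, retention_days, full_every_days):
    """Logical-to-physical ratio for a hypothetical backup schedule.

    Assumed model: each day a fraction `daily_change` of the data set is
    replaced with never-before-seen content; a full backup runs every
    `full_every_days` days (incrementals of the changed data on other days);
    all backups from the last `retention_days` days are retained.
    """
    change_gb = data_gb * daily_change
    # Logical data: everything the backup application wrote in the window.
    fulls = math.ceil(retention_days / full_every_days)
    incrementals = retention_days - fulls
    logical_gb = fulls * data_gb + incrementals * change_gb
    # Physical data after deduplication: one copy of the data set plus the
    # new content created on each later day of the window.
    physical_gb = data_gb + (retention_days - 1) * change_gb
    return logical_gb / physical_gb

# Hypothetical scenarios: (data_gb, daily_change, retention_days, full_every_days)
scenarios = {
    "1% change, 4 weeks, weekly full": (1000, 0.01, 28, 7),
    "5% change, 4 weeks, weekly full": (1000, 0.05, 28, 7),
    "1% change, 16 weeks, weekly full": (1000, 0.01, 112, 7),
    "1% change, 4 weeks, daily full": (1000, 0.01, 28, 1),
}
for label, args in scenarios.items():
    print(f"{label}: {dedup_ratio(*args):.1f}:1")
```

In this model, lowering the change rate, lengthening retention, or running fulls more often each raises the ratio, which matches the direction of the variables listed above.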
When comparing different approaches to deduplication, be sure to use a common baseline. For example, some backup software offers deduplication while also using an incremental-forever backup policy. To produce a high-contrast comparison, such vendors measure their deduplication effect against a daily-full backup policy with very long retention. EMC tends to characterize deduplication behavior using a daily-incremental, weekly-full backup policy with one to four months of retention.
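The Python sketch below illustrates why the baseline matters. It assumes a simple, hypothetical model: each day 1% of a 1 TB data set is replaced with new content, four weeks of backups are kept, and the deduplicated store holds one copy of the data set plus each day's changes. Under that model all three policies keep exactly the same unique data on disk; the ratios differ only because the amount of data written by the backup application differs. The helper names and values are illustrative assumptions.

```python
import math

def logical_gb(data_gb, change_gb, retention_days, full_every_days):
    """Data written by the backup application over the retention window."""
    fulls = math.ceil(retention_days / full_every_days)
    return fulls * data_gb + (retention_days - fulls) * change_gb

def physical_gb(data_gb, change_gb, retention_days):
    """Unique data after deduplication; identical for every policy in this toy model."""
    return data_gb + (retention_days - 1) * change_gb

DATA_GB, CHANGE_GB = 1000, 10  # hypothetical: 1 TB data set, 1% daily change
RETENTION = 28                 # hypothetical: four weeks of backups retained

stored = physical_gb(DATA_GB, CHANGE_GB, RETENTION)
# Incremental forever: a single full, then only changed data.
inc_forever = logical_gb(DATA_GB, CHANGE_GB, RETENTION, RETENTION)
# Weekly full plus daily incrementals (the style of baseline EMC describes).
weekly_full = logical_gb(DATA_GB, CHANGE_GB, RETENTION, 7)
# Daily full: every backup writes the whole data set.
daily_full = logical_gb(DATA_GB, CHANGE_GB, RETENTION, 1)

print(f"incremental forever, own baseline: {inc_forever / stored:.1f}:1")
print(f"weekly-full baseline:              {weekly_full / stored:.1f}:1")
print(f"daily-full baseline:               {daily_full / stored:.1f}:1")
```

Here an incremental-forever product that reports its savings against the daily-full baseline could claim about 22:1, even though it stores exactly as much as the weekly-full configuration, which reports about 3.3:1.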
The deduplication technology approach and the granularity of the deduplication process also affect compression rates. Data reduction techniques typically split each file into segments, or chunks, and the segment size varies from vendor to vendor. If the segment size is very large, fewer segment matches occur, resulting in a lower compression rate and smaller storage savings. Conversely, the smaller the segment size, the more segment matches occur, resulting in greater storage savings.
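The following Python sketch illustrates the segment-size effect using fixed-size chunking, the simplest form of segmentation (many products use variable-length, content-defined segments instead, which this sketch does not model). It fingerprints each chunk with SHA-256, indexes the fingerprints of a first backup, and counts how many bytes of a second, lightly edited version would have to be stored as new. The 64 KiB data, the edit offsets, and the helper names are hypothetical.

```python
import hashlib
import random

def fingerprints(data: bytes, size: int):
    """Yield (chunk, SHA-256 fingerprint) for fixed-size chunks of data."""
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        yield chunk, hashlib.sha256(chunk).digest()

def new_bytes_stored(first: bytes, second: bytes, size: int) -> int:
    """Bytes of `second` that are not deduplicated against chunks already seen."""
    index = {fp for _, fp in fingerprints(first, size)}
    stored = 0
    for chunk, fp in fingerprints(second, size):
        if fp not in index:   # no matching segment: the chunk must be stored
            stored += len(chunk)
            index.add(fp)     # later identical chunks will now match it
    return stored

rng = random.Random(42)
first = rng.randbytes(64 * 1024)      # hypothetical 64 KiB file, first backup
edited = bytearray(first)
for offset in (1_000, 20_000, 40_000, 60_000):
    edited[offset] ^= 0xFF            # four scattered one-byte edits, made in place
second = bytes(edited)

for size in (32 * 1024, 8 * 1024, 1024, 256):
    print(f"{size:>6}-byte chunks: {new_bytes_stored(first, second, size):>6} new bytes stored")
```

With these four edits, the new data stored for the second version drops from 65,536 bytes with 32 KiB chunks to 1,024 bytes with 256-byte chunks. The trade-off, also not modeled here, is that smaller chunks produce more fingerprints to index and look up.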