In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent.

The deduplication process requires comparison of data "chunks" (also known as "byte patterns"), which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis, and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.
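The chunk-and-reference process described above can be sketched in a few lines. The following is a minimal illustration, not a production design: it assumes fixed-size chunking (many real systems use variable, content-defined chunking) and uses a SHA-256 digest as the chunk identifier; the function names `deduplicate` and `reconstruct` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems often chunk by content


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks, storing each unique byte pattern only once.

    Returns a chunk store (digest -> bytes) plus an ordered list of
    references (digests) from which the original data can be rebuilt.
    """
    store = {}
    refs = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: keep the chunk itself
        refs.append(digest)        # every occurrence: keep only a small reference
    return store, refs


def reconstruct(store, refs):
    """Rebuild the original data by following the references."""
    return b"".join(store[d] for d in refs)
```

With input containing three identical 4 KiB chunks and one distinct chunk, the store holds only two chunks while the reference list still records all four positions, which is where the space saving comes from.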