De-Duplication | Practical Law

De-Duplication

Practical Law Glossary Item 8-521-0517 (Approx. 3 pages)

A process to identify files that are exactly or nearly identical so that counsel can exclude redundant documents from the universe of electronically stored information (ESI) to be reviewed (the review set). De-duplication is a type of data filtering that is part of the data processing phase, which occurs after counsel collect ESI and before they review it.
There are various de-duplication methods, including:
  • Exact de-duplication. Exact de-duplication is available for native ESI. It involves computing a digital fingerprint for each file (for example, an MD5 hash value) and flagging as duplicates those files that have matching fingerprints. Exact de-duplication can be applied:
    • within a single custodian's ESI (vertical de-duplication); or
    • across multiple custodians' ESI (horizontal de-duplication).
  • Near de-duplication. Near de-duplication is available for both native and non-native ESI (such as hard copy documents scanned to PDF). Near de-duplication compares the content of the various documents and identifies those with similar content as near-duplicates. De-duplication programs often permit counsel to determine how similar content must be for the system to recognize the documents as near-duplicates (for example, 90% similar).
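As a rough illustration of exact de-duplication, the following Python sketch hashes each file's content with MD5 and flags any file whose fingerprint has already been seen. The function names, the `(custodian, name, content)` tuple layout, and the `horizontal` switch are hypothetical choices made for this example, not features of any particular e-discovery platform. The sketch also assumes the first copy encountered is the one retained, which is a choice a real protocol would need to state.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 hash of a file's content as a hex string."""
    return hashlib.md5(data).hexdigest()


def flag_duplicates(files, horizontal=True):
    """Flag files whose fingerprint matches an earlier file.

    files: iterable of (custodian, name, content_bytes) tuples
           (a hypothetical layout used only for this sketch).
    horizontal=True  compares fingerprints across all custodians.
    horizontal=False compares only within each custodian (vertical).
    Returns a list of (custodian, name) pairs flagged as duplicates;
    the first file seen with a given fingerprint is kept.
    """
    seen = set()
    duplicates = []
    for custodian, name, data in files:
        digest = fingerprint(data)
        # Vertical de-duplication scopes the fingerprint to one custodian.
        key = digest if horizontal else (custodian, digest)
        if key in seen:
            duplicates.append((custodian, name))
        else:
            seen.add(key)
    return duplicates


sample = [
    ("alice", "memo.docx", b"Q3 forecast"),
    ("alice", "memo-copy.docx", b"Q3 forecast"),
    ("bob", "memo.docx", b"Q3 forecast"),
    ("bob", "notes.txt", b"call client"),
]
vertical_flags = flag_duplicates(sample, horizontal=False)
horizontal_flags = flag_duplicates(sample, horizontal=True)
```

In this sample, vertical de-duplication flags only the second copy in alice's files, while horizontal de-duplication also flags bob's copy of the same memo.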
Because de-duplication effectively eliminates ESI from discovery, counsel should negotiate the de-duplication protocol with opposing counsel before implementing it to avoid later challenges to the e-discovery process.
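The similarity threshold used in near de-duplication is one of the settings a de-duplication protocol might record. The Python sketch below shows one possible way to score similarity: Jaccard similarity over overlapping three-word sequences ("shingles"). This is an assumed technique chosen for illustration. Commercial near-duplicate tools may measure similarity differently, so a 90% setting in this sketch is not necessarily comparable to a 90% setting in a given product. All function names and parameters are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set:
    """Split text into the set of overlapping k-word sequences."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts are treated as a single shingle (or none).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9, k: int = 3) -> bool:
    """Return True when the similarity score meets the chosen threshold."""
    return similarity(a, b, k) >= threshold


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox sleeps"
score = similarity(doc_a, doc_b)
```

Lowering the threshold groups more documents together as near-duplicates, and raising it groups fewer, which is why the chosen value matters when the parties agree on a protocol.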