Finding Groups of "Textual Exact Duplicates"

Relativity Analytics is commonly used to set up near duplicate groups. Using a second Structured Analytics Set with different settings, you can set up a second relational field that defines groups of exact duplicates: documents whose extracted text contains the same words in the same order, and whose differences, if any, lie only in punctuation or whitespace. Such groupings can be used to look for coding inconsistencies, suppress duplicates from analysis, or speed up coding by letting reviewers make decisions on multiple documents at once.
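
To make the definition above concrete, here is a minimal Python sketch of what "same words in the same order, ignoring punctuation and whitespace" means. It is only an illustration, not Relativity's algorithm: the function names, the sample document IDs, and the choice to treat any run of letters, digits, or underscores as a "word" are all assumptions. Case is compared as-is, since the text above does not say how case is handled.

```python
import re
from collections import defaultdict


def word_key(text):
    """Return the document's words, in order, as a hashable key.

    Assumption: a "word" is any run of letters, digits, or underscores.
    Punctuation and whitespace only act as separators, so they never
    affect the key. Numbers are kept, so documents that differ only in
    a number get different keys.
    """
    return tuple(re.findall(r"\w+", text))


def group_exact_text_duplicates(docs):
    """Group document IDs whose text yields the same word sequence.

    docs maps a (hypothetical) document ID to its extracted text.
    Only groups with two or more members are returned.
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[word_key(text)].append(doc_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data for illustration only.
docs = {
    "DOC001": "Please review the contract, then sign.",
    "DOC002": "Please   review the contract then sign",
    "DOC003": "Please sign the contract, then review.",
    "DOC004": "Invoice 1001 is attached.",
    "DOC005": "Invoice 1002 is attached.",
}

print(group_exact_text_duplicates(docs))
```

In this sample, only DOC001 and DOC002 end up grouped: DOC003 uses the same words in a different order, and DOC004 and DOC005 differ in a number.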

Recipe Overview

This recipe shows how to use textual near duplicate detection to further divide your documents into groups of textual exact duplicates.

Requirements

  • Structured Analytics
  • Relativity 9.5.196.102 or above

Special Considerations

You can run this analysis across all documents, including emails, or across a subset of documents. For example, the subset could include only certain file types, such as Microsoft Word, PDF, and text documents. Both setups are acceptable and easy to work with.

Directions

  1. Create a field to store your duplicate group ID. It should be a relational Fixed-Length Text field:
    • Name: Textual Exact Duplicate Group
    • Field Type: Fixed-Length Text
    • Length: 255

  2. Under Relational Field Properties, set the following fields:
    • Relational: Yes
    • Friendly Name: Text Duplicates
    • Import Behavior: Leave blank values unchanged
    • Pane icon: duplicates.png (you can also use the near duplicate icon, or something else entirely)
    • Order: 100
    • Relational View: Textual Near Duplicates Relational View (or create your own)

  3. Create a new Structured Analytics Set with the following properties:
    • Name: Text Exact Duplicates
    • Set prefix: X1
    • Select document set to analyze: choose a saved search that you want to run this analysis on
    • Select operations: Textual near duplicate identification

  4. Under Textual Near Duplicate Identification settings, set the following fields:
    • Minimum similarity percentage: 100
    • Ignore numbers: No
    • Destination Textual Near Duplicate Group: Textual Exact Duplicate Group (the field you created in step 1)

  5. Click Save.
  6. Click Run Structured Analytics. A pop-up appears.
  7. Click Run. On a new set, the run always analyzes and populates every document in the set.
  8. Add the Textual Exact Duplicate Group field to your views, saved searches, and so on.
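
Once the group field is populated, one use mentioned earlier is checking for coding inconsistencies. The following Python sketch shows one way this could be done on exported data, outside Relativity. It makes several assumptions: the export has already been loaded as a list of dictionaries, the coding field is called "Responsiveness" (a hypothetical name), and the group values "G1" and "G2" are placeholders rather than the IDs Relativity actually writes.

```python
from collections import defaultdict


def find_inconsistent_groups(rows, group_field, coding_field):
    """Return groups whose documents carry more than one coding value.

    rows is a list of dicts, one per exported document (assumed shape).
    Documents with an empty group field are skipped; an empty or missing
    coding value is treated as "" so that uncoded documents also count
    as a difference within a group.
    """
    values_by_group = defaultdict(set)
    for row in rows:
        group = row.get(group_field)
        if group:
            values_by_group[group].add(row.get(coding_field) or "")
    return {
        group: sorted(values)
        for group, values in values_by_group.items()
        if len(values) > 1
    }


# Hypothetical export rows; field names and values are illustrative.
rows = [
    {"Control Number": "DOC001", "Textual Exact Duplicate Group": "G1", "Responsiveness": "Responsive"},
    {"Control Number": "DOC002", "Textual Exact Duplicate Group": "G1", "Responsiveness": "Not Responsive"},
    {"Control Number": "DOC003", "Textual Exact Duplicate Group": "G2", "Responsiveness": "Responsive"},
    {"Control Number": "DOC004", "Textual Exact Duplicate Group": "G2", "Responsiveness": "Responsive"},
    {"Control Number": "DOC005", "Textual Exact Duplicate Group": "", "Responsiveness": "Responsive"},
]

inconsistent = find_inconsistent_groups(
    rows, "Textual Exact Duplicate Group", "Responsiveness"
)
print(inconsistent)
```

Here only group G1 is reported, because its two documents were coded differently; G2 is consistent, and DOC005 has no group.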

References

Textual Near Duplicate Identification