Email threading and near dupe - workflow alternatives

Relativity workspaces often need both email threading and near duplicate identification. System admins must decide whether to run the two functions together or independently.

Recipe overview

This recipe helps you make an informed decision that suits your workflow. It includes suggested workflows for Email Threading and Near Duplicate Identification, with examples of how to use the applicable Relativity functions.


  • Applicable to Relativity 8.0 or above
  • Requires workspace and system admin rights


The following sections contain three workflow alternatives to help system admins who decide to run a Structured Analytics Set (SAS) involving Email threading and/or Text Near Dupe on a data set.

Running the Email Threading & the Near Dupe in a single Structured Analytics Set

Use this workflow to identify email duplicate spares independently of duplicate loose documents and attachments. In other words, use it when the case demands that you handle emails and loose files separately. If both options are selected on a single set, Analytics runs near duplicate identification only against non-emails (attachments and loose files).

Once you run the SAS, you can create a document view or a batch that contains the email documents flagged as inclusive (Inclusive Email = Yes) and marked as non-duplicate (Email Duplicate Spare = No).
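
The two conditions above can be sketched as a simple filter over exported document metadata. This is only an illustration under stated assumptions, not Relativity's API: the record layout and the "Control Number" key are hypothetical, and the "Inclusive Email" and "Email Duplicate Spare" keys simply mirror the field names used in the text. Inside Relativity, the same conditions would normally be set on a view or saved search.

```python
# Hypothetical export: one dict per email document.
# The keys are assumptions that mirror the field names in the text.
documents = [
    {"Control Number": "DOC-001", "Inclusive Email": "Yes", "Email Duplicate Spare": "No"},
    {"Control Number": "DOC-002", "Inclusive Email": "Yes", "Email Duplicate Spare": "Yes"},
    {"Control Number": "DOC-003", "Inclusive Email": "No", "Email Duplicate Spare": "No"},
]


def select_for_review(docs):
    """Keep emails that are inclusive and are not duplicate spares."""
    return [
        d for d in docs
        if d.get("Inclusive Email") == "Yes"
        and d.get("Email Duplicate Spare") == "No"
    ]


selected = select_for_review(documents)
print([d["Control Number"] for d in selected])  # ['DOC-001']
```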

You can then identify the loose documents and review them within their near duplicate groups, keeping in mind that some group members may be email attachments.

Running only Email Threading in a Structured Analytics Set

This workflow works well if you want to perform email threading and duplicate detection on an email-only document set. For example:

  • A case where emails are the only important documents. The case team wants to see only the most inclusive, non-duplicate emails. Running a SAS with just email threading would be best.

Running only Near Dupe in a Structured Analytics Set

Use this workflow to identify duplicates whether they are emails, attachments, or loose files. For example:

  • Two parties in discovery produce documents. There is an agreement that each party is to review the documents that were outside the set provided to the opposite side. In this case, the workflow requires you to treat all documents as loose documents and identify all near duplicate groups that contain a single document (just a Near Duplicate Principal and no other documents in the group). You can then isolate all such documents via a saved search and place them into a view or into review batches.
  • A case team must identify duplicates across two or more groups of produced documents. Running the Near Dupe process with a high minimum similarity would be ideal. The analysis can focus just on the groups of documents and find the duplicate copies.
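
The first example above, isolating near duplicate groups that contain a single document, can be sketched as a grouping step over exported metadata. This is a minimal illustration under stated assumptions, not Relativity's API: the record layout and the "Near Duplicate Group" key are hypothetical names, and the sketch follows the text's description of a group holding only its principal. Inside Relativity, a saved search would do this work.

```python
from collections import Counter

# Hypothetical export: one dict per document.
# "Near Duplicate Group" is an assumed field name holding the group identifier.
documents = [
    {"Control Number": "A-001", "Near Duplicate Group": "A-001"},
    {"Control Number": "A-002", "Near Duplicate Group": "A-001"},
    {"Control Number": "B-001", "Near Duplicate Group": "B-001"},
    {"Control Number": "B-002", "Near Duplicate Group": ""},
]


def singleton_group_documents(docs, group_field="Near Duplicate Group"):
    """Return documents whose near duplicate group has exactly one member."""
    # Count members per group, ignoring documents with no group value.
    group_sizes = Counter(d[group_field] for d in docs if d.get(group_field))
    return [
        d for d in docs
        if d.get(group_field) and group_sizes[d[group_field]] == 1
    ]


unique_docs = singleton_group_documents(documents)
print([d["Control Number"] for d in unique_docs])  # ['B-001']
```

Documents with an empty group value are skipped rather than treated as unique, since an empty value could also mean the document was never analyzed.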

Multiple Structured Analytics Sets

Relativity lets you use more than one of the above options. You might drive your primary review workflow with combined Email Threading and Near Duplicate Detection in a single run. You may then want to focus on a subset of documents and run just Textual Near Duplicate detection on that subset, perhaps for quality control purposes. To avoid data collisions, configure the second Structured Analytics Set to write its Near Duplicate Groups to a different relational field than the one the first set uses.
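
When several sets are planned, the separate-field rule above can be checked before any run. The sketch below is an assumption-laden illustration, not a Relativity configuration format: the dictionaries, the "near_dup_group_field" key, and the set and field names are all hypothetical and stand in for whatever record of planned sets you keep.

```python
# Hypothetical planning records for Structured Analytics Sets.
# "near_dup_group_field" is an assumed key naming the destination relational field.
analytics_sets = [
    {"name": "Primary review", "near_dup_group_field": "Near Dup Group - Primary"},
    {"name": "QC subset", "near_dup_group_field": "Near Dup Group - Primary"},
]


def find_field_collisions(sets):
    """Return (first_set, later_set, field) for each reused destination field."""
    first_user = {}
    collisions = []
    for s in sets:
        field = s["near_dup_group_field"]
        if field in first_user:
            collisions.append((first_user[field], s["name"], field))
        else:
            first_user[field] = s["name"]
    return collisions


print(find_field_collisions(analytics_sets))
# [('Primary review', 'QC subset', 'Near Dup Group - Primary')]
```

An empty result means each planned set writes to its own field, so the second run would not overwrite the groups from the first.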