How to handle Suppressed Duplicates at the end of your project

When you select suppress duplicate documents in your Active Learning project, the prioritized review queue will not return any documents that the analytics engine considers identical to documents already served in the queue. Note that these are duplicates only from the point of view of the analytics index: they may contain the same words in a different order, or differ in ways that do not matter to the engine, such as stop words, numbers, or email headers.

Once the review is done and you have conducted an Elusion Test, you may want reviewers to code any remaining documents at or above your cutoff score; most of these will be suppressed documents. This recipe provides an efficient way to find and review them.

Overview

This recipe describes how to review suppressed documents after an Elusion Test has shown acceptable results.

Requirements

  • 10.x.xxx.x and above

Directions

  1. Find the suppressed documents
    1. Relativity populates a field to indicate suppressed items. You can find suppressed items that are at or above your cutoff using the following query (a scripted version of this query appears after these directions):

      (Index Search) <Classification Index Name> Rank is greater than or equal to <your cutoff>
      AND
      Classification Index any of these: <Classification Index Name> \ <Classification Index Name> - Suppressed Duplicate

    2. Save this query as a saved search named "Suppressed high ranking." For example, if your cutoff were 51, the first condition would read: <Classification Index Name> Rank is greater than or equal to 51.

  2. Group the suppressed documents.
    • This step is optional but highly recommended, and the remainder of this recipe assumes you have completed it. Run textual near duplicate identification on all documents in your Active Learning project with a minimum similarity percentage of 90, writing the results to a field in Relativity. We'll assume for our purposes that this field is called "Textual Near Duplicate Group" and that its relational friendly name is "Near Duplicates."
  3. Review the documents by Near Duplicate Group.
    • At this point, review the documents by near duplicate group. One way to do this is to take the saved search you created in step 1 and sort or group it by Textual Near Duplicate Group; a small illustration of this grouping appears after these directions. You could also set the saved search's related items to include Near Duplicates if you want already coded near duplicates surfaced to the reviewer as a suggestion.
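
For teams that prefer scripting, here is a minimal Python sketch of how the step 1 query might be run through Relativity's Object Manager REST API instead of the UI. Treat every specific in it as an assumption to verify against the API documentation for your Relativity version: the endpoint path, the payload shape, the condition syntax, and all of the names (instance URL, workspace ID, index name, credentials) are hypothetical.

    # Minimal sketch: find suppressed, high-ranking documents through the
    # Relativity Object Manager REST API. The endpoint path, payload shape,
    # and condition syntax are assumptions -- verify them for your version.
    import requests

    BASE_URL = "https://relativity.example.com"   # hypothetical instance URL
    WORKSPACE_ID = 1234567                        # hypothetical workspace ID
    INDEX_NAME = "My Classification Index"        # your classification index name
    CUTOFF = 51                                   # your cutoff score

    # Mirrors the saved search in step 1: rank at or above the cutoff AND
    # tagged as a suppressed duplicate under the classification index field.
    condition = (
        f"('{INDEX_NAME} Rank' >= {CUTOFF}) AND "
        f"('Classification Index' IN CHOICE ['{INDEX_NAME} - Suppressed Duplicate'])"
    )

    payload = {
        "request": {
            "ObjectType": {"ArtifactTypeID": 10},   # 10 = Document
            "Fields": [{"Name": "Control Number"}],
            "Condition": condition,
        },
        "start": 1,
        "length": 1000,
    }

    resp = requests.post(
        f"{BASE_URL}/Relativity.REST/api/Relativity.ObjectManager/v1/"
        f"workspace/{WORKSPACE_ID}/object/query",
        json=payload,
        auth=("service.account@example.com", "password"),  # substitute your auth
        headers={"X-CSRF-Header": "-"},
    )
    resp.raise_for_status()
    print(f"Found {resp.json()['TotalCount']} suppressed high-ranking documents")

The saved search from step 1 is sufficient for most workflows; a script like this is mainly useful if you want to export the hit list or feed it into batching automation.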
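
The payoff of step 2 is that near duplicates land next to each other in the review queue. As a self-contained illustration of the sort order step 3 describes, this sketch groups a handful of hypothetical exported rows by the "Textual Near Duplicate Group" field, with ungrouped documents last and the highest-ranking member of each group first.

    from itertools import groupby

    # Hypothetical rows exported from the "Suppressed high ranking" saved search.
    docs = [
        {"control_number": "DOC-0001", "near_dup_group": "GRP-12", "rank": 87},
        {"control_number": "DOC-0003", "near_dup_group": "GRP-07", "rank": 64},
        {"control_number": "DOC-0002", "near_dup_group": "GRP-12", "rank": 85},
        {"control_number": "DOC-0004", "near_dup_group": None,     "rank": 55},
    ]

    # Sort by group (ungrouped documents last), then by descending rank so the
    # highest-ranking member of each group leads the batch.
    docs.sort(key=lambda d: (d["near_dup_group"] is None,
                             d["near_dup_group"] or "",
                             -d["rank"]))

    for group, members in groupby(docs, key=lambda d: d["near_dup_group"]):
        print(group or "No near duplicate group")
        for doc in members:
            print(f"  {doc['control_number']} (rank {doc['rank']})")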

Notes

There may be some shifting of the Active Learning model's ranks as you go, particularly if you are finding significant differences between the coding of documents within the same near duplicate group. In that case, you will likely want to run one final search for high-ranking, uncoded documents after your review is complete; a sketch of that search's condition follows.
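
If you script that final sweep, the only change from the earlier query sketch is the condition: filter on the review field being empty rather than on the suppressed duplicate tag. In this fragment, "Responsive Designation" is a placeholder for your project's coding field, and the ISSET operator should be verified against your version's condition syntax.

    # Hypothetical final-sweep condition: high-ranking documents that remain
    # uncoded after the suppressed-duplicate review. Field names are placeholders.
    INDEX_NAME = "My Classification Index"
    CUTOFF = 51
    condition = (
        f"('{INDEX_NAME} Rank' >= {CUTOFF}) "
        f"AND (NOT 'Responsive Designation' ISSET)"
    )
    print(condition)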