How to handle Suppressed Duplicates at the end of your project

When you select the suppress duplicate documents option in your Active Learning project, the prioritized review queue will not return any documents that appear identical to documents already in the queue. Note that these are duplicates only from the point of view of the analytics index. In fact, they may contain the same words in a different order, or differ in ways that do not matter to the engine, such as stop words, numbers, or email headers.

Once the review is done and you have conducted an elusion test, you may want to have reviewers code any remaining documents at or above your cutoff score, most of which will be these suppressed documents. This recipe provides an efficient way to find and review them.

Overview

This recipe describes how to review suppressed documents after an elusion test has shown acceptable results.

Requirements

  • Relativity 9.5.370.136 and above

Directions

  1. Find the suppressed documents.
    1. Relativity does not populate a field to indicate suppressed items. However, if your project is complete and you have already coded all other high-ranking documents, it is straightforward to query for them. You can tell that all other high-ranking documents have been coded when the right side of your graph shows no purple bars, mainly just red and blue. In this scenario, nearly all remaining high-ranking documents without coding decisions should be suppressed documents. Therefore, you can find suppressed documents using the following query:

      CSR - <Active Learning Project> Cat. Set::Category Rank: is greater than <your cutoff minus 1>

      AND

      <designation field used in Active Learning project>: is not set

      Note: This query only works if you have updated ranks from the Active Learning Project page. It also assumes the rank field is up to date, i.e. ranks were updated recently. If you have accepted an elusion test, or have one active, it assumes you updated ranks after you started the elusion test.

    2. Save this query as a saved search named "Suppressed high ranking". The screenshot below shows what the conditions would look like if your cutoff were 60.

  2. Group the suppressed documents.
    1. This step is optional but highly recommended, and the remainder of this recipe assumes you have completed it. Run textual near duplicate identification on all documents in your Active Learning project with a minimum similarity percentage of 90, so that the results are written to a field in Relativity. We'll assume for our purposes that this field is called "Textual Near Duplicate Group", with the friendly name "Near Duplicates".
  3. Review the documents by Near Duplicate Group.
    1. At this point, review the documents by near duplicate group. One way to do this is to take the saved search you created in step 1 and sort or group it by Textual Near Duplicate Group. You can also set related items in the search to include Near Duplicates if you want already coded near duplicates to appear as a suggestion to the reviewer.
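
For readers who want to sanity-check the logic outside Relativity, the query from step 1 and the grouping from step 3 can be sketched in Python. This is only an illustration under stated assumptions: it runs on document records you have exported yourself, it does not call any Relativity API, and the key names ("rank", "designation", "near_dup_group", "id") are hypothetical stand-ins for the Category Rank field, your designation field, and the Textual Near Duplicate Group field.

```python
from collections import defaultdict

# Hypothetical key names for this sketch; substitute your own export columns.
RANK = "rank"                # stands in for the Category Rank field
DESIGNATION = "designation"  # stands in for the project's designation field
GROUP = "near_dup_group"     # stands in for Textual Near Duplicate Group


def find_suppressed(docs, cutoff):
    """Return uncoded documents ranked at or above the cutoff."""
    return [
        d for d in docs
        if d.get(RANK) is not None
        and d[RANK] > cutoff - 1        # "is greater than <your cutoff minus 1>"
        and d.get(DESIGNATION) is None  # "is not set"
    ]


def group_for_review(docs):
    """Bucket document ids by near duplicate group, largest groups first."""
    groups = defaultdict(list)
    for d in docs:
        # Assumption: a document with no group is reviewed on its own,
        # so it is keyed by its own id.
        key = d.get(GROUP) or d["id"]
        groups[key].append(d["id"])
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))


# Made-up sample records, with a cutoff of 60 as in the saved search example.
docs = [
    {"id": "DOC1", "rank": 85, "designation": None, "near_dup_group": "G1"},
    {"id": "DOC2", "rank": 60, "designation": None, "near_dup_group": "G1"},
    {"id": "DOC3", "rank": 59, "designation": None, "near_dup_group": "G2"},
    {"id": "DOC4", "rank": 92, "designation": "Responsive", "near_dup_group": "G1"},
    {"id": "DOC5", "rank": 71, "designation": None, "near_dup_group": None},
]

suppressed = find_suppressed(docs, cutoff=60)
batches = group_for_review(suppressed)

for group, ids in batches:
    print(group, ids)
```

With a cutoff of 60, the rank condition "greater than 59" keeps DOC2 (rank 60) and drops DOC3 (rank 59), while DOC4 is excluded because it already has a coding decision. Sorting the largest groups first is a design choice of this sketch, not something Relativity does for you.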

Notes

The Active Learning model may shift as you go, particularly if you are finding significant differences in how documents within the same near duplicate group are coded. In that case, you will likely want to run one final search for high-ranking, uncoded documents after your review is complete.