This recipe is intended as a reference for reviewers in an Assisted Review project. We’ve included some best practices and special considerations that are unique to computer-assisted review.
Note: This recipe applies to both Sample Based Learning and Active Learning projects. For information that applies only to Sample Based Learning, see Sample Based Learning.
What is Assisted Review?
Computer-assisted review is a workflow that captures and analyzes reviewers’ coding decisions and amplifies them across a data set. Based on user input, the system learns how to categorize conceptually similar documents in the document universe.
Selecting good example documents
Because all machine learning is derived from text, it is important to note that some documents may be highly responsive but undesirable as example documents for an Assisted Review project.
For a document to be considered a good example for machine learning, it must contain a sufficient quantity of text to train the system. Assisted Review’s text analytics engine learns from concepts rather than individual words or short phrases. When deciding whether a document has sufficient conceptual language, think in terms of sentences or paragraphs rather than a few words. For that reason, very short documents that contain only a few words or phrases are typically not good examples.
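As a rough illustration of the "sentences or paragraphs" rule of thumb, the following Python sketch flags extracted text that is probably too short to be a useful example. The function name and the thresholds `min_sentences` and `min_words` are assumptions made for illustration; they are not Assisted Review settings, and the check is no substitute for reviewer judgment.

```python
import re


def has_sufficient_text(extracted_text, min_sentences=3, min_words=40):
    """Rough heuristic: True if the text has enough sentences and words.

    The thresholds are illustrative assumptions, not product settings.
    """
    # Count alphabetic words only; numbers and symbols are ignored.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", extracted_text)
    # Split on sentence-ending punctuation or line breaks, and keep only
    # pieces of at least three words so fragments do not count as sentences.
    pieces = re.split(r"[.!?]+\s+|\n+", extracted_text.strip())
    sentences = [p for p in pieces if len(p.split()) >= 3]
    return len(sentences) >= min_sentences and len(words) >= min_words


print(has_sufficient_text("See attached."))
```

A short cover note such as "See attached." fails the check, while a few full paragraphs of prose pass it.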
Email headers and repeated content
Email headers, confidentiality footers, and other types of repeated content are typically filtered out prior to review. They should not be considered when determining whether a document is a good example for a computer-assisted review.
Consider the following document:
In this example, the system will only learn from the text that is not framed by a red box. Even if the subject line and short sentence fragment that remain are responsive, there is not enough text to warrant this document’s inclusion as an example.
Consider the following example from the Enron data set. This document, which appears to contain useful text, is actually a JPEG image of a paper document:
A reviewer might read the above language and find the document to be responsive. However, when we switch to the extracted text of the document we see the following:
Because the system only works with a document’s extracted text, all of the responsive text located in the image will be unavailable for machine learning. Consequently, this document, while highly responsive, turns out to be a poor example document.
Numbers are not considered in the machine learning process. It follows that spreadsheets consisting largely of numbers, while potentially responsive, do not make good example documents.
Consider the two examples below. Both are spreadsheets, but only the second would make a good example document.
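To make the point about numeric content concrete, here is a minimal Python sketch that estimates how much of a document's extracted text consists of numbers. The function name and the token pattern are assumptions for illustration only; they do not describe how the analytics engine actually tokenizes text.

```python
import re

# Matches tokens such as 42, 1,200, 3.5, $1,200.00, (500), 12/31/2001, 15%.
NUMBER_TOKEN = re.compile(r"^[-+($]*\d[\d,./:%-]*\)?$")


def numeric_token_share(extracted_text):
    """Return the fraction of whitespace-separated tokens that are numbers."""
    tokens = extracted_text.split()
    if not tokens:
        return 0.0
    numeric = sum(1 for token in tokens if NUMBER_TOKEN.match(token))
    return numeric / len(tokens)


print(numeric_token_share("1,200 3.5 42 Total"))
```

A spreadsheet whose share is close to 1.0 offers the system little text to learn from, whereas one dominated by descriptive cell text is a better candidate. Any cutoff between the two would be a judgment call.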
Families and the "Four Corners" test
A document is only a good example if there is text on the document’s face—within the four corners of the document—that makes it responsive.
The following scenarios violate the Four Corners Test, and will not offer good example documents:
- The document is a family member of another document which is responsive.
- The document comes from a custodian whose documents are presumptively responsive.
- The document was created within a date range which is presumptively responsive.
- The document comes from a location or repository where documents are typically responsive.
This issue is especially prevalent with regard to document families, so additional emphasis is warranted:
A reviewer should never include a document as an example based on the content of a family member.
For example, consider the following email. Note that it mentions an attachment. If the attachment is responsive, the reviewer might be tempted to include the email as a responsive example, too. Doing so would violate the Four Corners Test. Again, if there is not sufficient responsive language on the document’s face, it should not be used as an example.
Handling good language / bad example exceptions
Quite frequently, a reviewer may encounter a document containing highly responsive language which should be coded as non-responsive for an external reason, such as a carve-out agreement between the litigating parties. For example, consider the draft contract fragment below:
Let’s assume that the parties have agreed to produce only final, executed contracts. This contract is a draft, which would make it non-responsive. However, this document contains a great deal of extremely responsive language, and to submit it as a non-responsive example would teach the system incorrectly. Consequently, the correct action here is to mark it as non-responsive, but not submit it as an example.
Applying the Use as Example field
Users select or reject documents as examples for the system via the Use as Example field. During categorization, Assisted Review will check the Use as Example field checkbox for all documents that are part of a project sample set.
If a document appears to be a good example for machine learning, simply code the designation field as you normally would and leave the Use as Example field checked.
If a document is known to be responsive but is not a good example, code the document as responsive and uncheck the Use as Example checkbox.
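The coding rules in this recipe can be summarized as a small decision function. The Python sketch below is only a model of the reviewer's reasoning: the class, the function, its parameters, and the choice names "Responsive" and "Not Responsive" are hypothetical and do not correspond to any Relativity API.

```python
from dataclasses import dataclass


@dataclass
class CodingDecision:
    designation: str      # choice on the designation field (names assumed)
    use_as_example: bool  # state of the Use as Example checkbox


def code_document(responsive, sufficient_text, face_text_supports_designation):
    """Return the coding for one document.

    A document stays an example only if its own text is long enough to
    train on and that text agrees with the designation being applied.
    """
    designation = "Responsive" if responsive else "Not Responsive"
    use_as_example = sufficient_text and face_text_supports_designation
    return CodingDecision(designation, use_as_example)


# Responsive email with plenty of responsive text: leave the box checked.
print(code_document(True, True, True))
# Responsive scanned image with no usable extracted text: uncheck the box.
print(code_document(True, False, True))
# Draft contract excluded by agreement: non-responsive, and not an example.
print(code_document(False, True, False))
```

Note that the designation and the example status are decided separately: the designation reflects the review protocol, while the checkbox reflects only whether the document's own text would teach the system correctly.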
Using the Excerpt text box
When a document is mostly non-responsive but contains passages of responsive text, use the Excerpt Text box. Highlight the responsive text, right-click, and choose Add to Excerpt Text. Code the document as responsive.
Relativity pastes the selection into the Excerpt Text field on the Assisted Review coding layout. You can perform this action multiple times per document. Each time you select text and click Add to Excerpt Text, the text is automatically appended to the Excerpt Text field.
However, it is important to remember that good example language is measured in concepts (sentences and paragraphs) and not single words or short phrases. Do not use the Excerpt Text option for short keywords or phrases.
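The append behavior and the "concepts, not keywords" guideline can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name, the newline separator, and the `min_words` threshold are all hypothetical, and Relativity itself is not described here as enforcing any minimum length.

```python
def add_to_excerpt(excerpt_text, selection, min_words=8):
    """Append a highlighted selection to the excerpt text and return it.

    The minimum word count is an illustrative guard against excerpting
    bare keywords; it is an assumption, not a product rule.
    """
    selection = selection.strip()
    if len(selection.split()) < min_words:
        raise ValueError("selection is too short to carry a concept")
    # Assumed separator: each new selection starts on its own line.
    if not excerpt_text:
        return selection
    return excerpt_text + "\n" + selection


excerpt = ""
excerpt = add_to_excerpt(
    excerpt, "We should agree on pricing with the other suppliers before the bid."
)
excerpt = add_to_excerpt(
    excerpt, "Please delete this message after you have read it and call me."
)
print(excerpt)
```

Calling `add_to_excerpt` with a two-word phrase raises `ValueError`, which mirrors the guidance not to excerpt short keywords or phrases.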
- Consistency is crucial.
  - Consult fellow reviewers on difficult coding decisions to ensure unanimity.
- Don’t touch.
  - Never add choices to the Designation tag.
  - If you are unsure about a document or have a technical difficulty, consult with the project manager to identify workflow solutions.
- Double check.
  - Always check the extracted text of a document to be sure it matches the content in other views. Whenever possible, review from the Extracted Text viewer.
- When in doubt, ask.
  - If there is an aspect of the Assisted Review workflow that is confusing, do not guess. Ask a system admin or project manager about the proper course of action.
Optimal use of Assisted Review mandates careful adherence to the recommended workflow. Always consult with a system admin or review manager when confusion or problems arise. For additional support for Assisted Review, please contact firstname.lastname@example.org.