Sample-Based Learning

Sample-Based Learning uses an iterative process to group and code documents. The process takes a small group of manually coded documents and treats them as representative of the entire document set. Based on the text in that group of documents, Sample-Based Learning categorizes all the documents in your workspace.

The following diagram outlines the basic Sample-Based Learning workflow.


Common project use cases

How you proceed with your project depends on your case and the risks involved in the production of privileged or non-responsive material. Using the Assisted Review layout, reviewers can validate the system-categorized values.

Generally, cases fall into one of the common scenarios included in this section. Note that these scenarios represent suggested workflows and only provide an overview of the process. If you need assistance with a specific matter, please contact

Scenario 1: Review prioritization

In this scenario, attorneys may want to review the entire document population. The goal is to get the most important documents to the review team as soon as possible. The remaining documents will still be reviewed, but perhaps later by a review team at a lower billing rate. After a couple of rounds, you can use the results to decide how to allocate review resources. Prioritization projects typically don't require as many rounds as other types of projects, because all documents are eventually reviewed.

Scenario 2: Review all responsive items

In this scenario, the review team manually reviews all responsive documents but trusts the system based on acceptable error rates for the non-responsive population. The non-responsive documents are set aside and aren't reviewed. Privilege is not a major concern for this group. Using search terms across responsive items for privilege is an acceptable method of privilege review.

Scenario 3: Quick production

In this scenario, documents need to be produced in a very short time frame. It isn't a strong concern whether the production is over-inclusive, meaning it can include a few non-responsive items. In addition, privilege screening isn't typically a major concern for this scenario.

The basic goal of this approach is to achieve a low uncategorized percentage along with a low estimated defect percentage before finalizing the project and proceeding to an accelerated production.

Scenario 4: Identify the most relevant documents in the opposition's production

When the other side of a litigation produces documents to you, there is an inclination to presumptively treat the entire production as responsive. As such, Assisted Review projects of this nature are designed to locate the documents that are most beneficial to your case.

Scenario 5: QC a document set prior to production

In this scenario, the project manager leverages the technology to assist with QC of an existing manual review project. It’s a conservative and very useful method to learn if any documents have been missed or coded inconsistently.

Sample-Based Learning workflow

The following sections outline everything you need to get started with Sample-Based Learning:

  1. Set project goals
  2. Perform Sample-Based Learning setup
  3. Prepare your reviewers
  4. Perform Sample-Based rounds
  5. Complete your Sample-Based Learning project (stabilization)

Set project goals

Before you begin a Sample-Based Learning project, we suggest you think about whether the project is a good fit for Sample-Based Learning. Sample-Based Learning is centered on the concept of training the system so that it learns how to interpret uncategorized documents. The system learns best from documents that are good examples. To be good examples, documents should have rich text with lots of concepts, not just numbers.

Consider also what constitutes a responsive document. If, for instance, responsiveness hinges on a name or a date, that is likely not enough for Sample-Based Learning because there are no concepts to learn, only absolutes. Successfully completing a Sample-Based Learning project requires you to spend a little time at the beginning to determine whether Sample-Based Learning is the best way to proceed.

Every Sample-Based Learning project has specific needs, goals, and deliverables. The following checklist covers the most commonly required items; customize it to fit the needs of each project.

  1. Ensure the document set you plan to use is a good population for Sample-Based Learning:
    • Minimum 50k records with text
    • Concept-rich files (not primarily numbers)
    • Issue or privilege coding is in a separate field or is not part of the Sample-Based Learning workflow
  2. Make sure your timeline and goals are set. The stakeholders should discuss goals and timelines prior to beginning a Sample-Based Learning project so that clear deliverables are established.
    • Level of Precision, Recall, and F1 determined
    • Manual review plan decided (e.g., all documents categorized as Responsive; privilege screen only)
    • Production plan in place
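
To make the precision, recall, and F1 targets in the checklist concrete, here is a minimal sketch of how those three measures can be computed from a set of documents that both reviewers and the system have coded. The function name `score_control_set`, the `ControlSetMetrics` class, and the "Responsive" label are assumptions for illustration only; Sample-Based Learning reporting produces these figures for you, and this sketch is not its implementation.

```python
from dataclasses import dataclass


@dataclass
class ControlSetMetrics:
    precision: float
    recall: float
    f1: float


def score_control_set(human_codes, system_codes, positive="Responsive"):
    """Compare reviewer coding with system categorization on the same documents.

    human_codes and system_codes are parallel lists of category labels
    (hypothetical inputs; one entry per document).
    """
    pairs = list(zip(human_codes, system_codes))
    # True positives: both reviewer and system say Responsive.
    tp = sum(1 for h, s in pairs if h == positive and s == positive)
    # False positives: system says Responsive, reviewer disagrees.
    fp = sum(1 for h, s in pairs if h != positive and s == positive)
    # False negatives: reviewer says Responsive, system disagrees.
    fn = sum(1 for h, s in pairs if h == positive and s != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return ControlSetMetrics(precision, recall, f1)


human = ["Responsive", "Responsive", "Responsive", "Not Responsive", "Not Responsive", "Responsive"]
system = ["Responsive", "Responsive", "Not Responsive", "Responsive", "Not Responsive", "Responsive"]
metrics = score_control_set(human, system)
print(f"precision={metrics.precision:.2f} recall={metrics.recall:.2f} f1={metrics.f1:.2f}")
```

Precision answers "of the documents the system called Responsive, how many really are?", while recall answers "of the truly Responsive documents, how many did the system find?". F1 is their harmonic mean, so it drops when either one is low.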

Perform Sample-Based Learning setup

  1. First, set up your environment for the Sample-Based Learning project. See Environment setup.
  2. Set the Tab Visibility workspace security permission for Sample-Based Learning. See Workspace security for more information.
  3. Next, set up your Sample-Based Learning workspace. See Workspace setup.

Prepare your reviewers

Make sure your reviewers are prepared. Reviewer preparation is key to success. A Sample-Based Learning project is not like other document coding attorneys may have done, so use all the tools available to be sure everyone is trained in Sample-Based Learning protocols.

  1. Sample-Based Learning for End Users webinar has been viewed.
  2. Sample-Based Learning Reviewer Protocol has been distributed and discussed.

Perform Sample-Based rounds

  1. Create the Sample-Based Learning project based on the goals you've set.
  2. (Optional) Control set round - Identify a representative sample group of documents as your control set and have reviewers code these documents.

    A control set is used to automatically determine precision, recall, and F1 for your project using Sample-Based Learning reporting.

  3. Training round - Identify a sample group of documents in your data set to train the system, and assign it to reviewers, who code the training sample and set the example documents.

    Note: Alternatively, if reviewers have already coded representative documents per Sample-Based Learning protocol, you can use that group of documents as a pre-coded seed round to train the system.

  4. Finish the round to submit the sample documents to the system and categorize your entire data set.

    Each document in your searchable set is categorized based on the closest example document.

    Note: You may repeat the prior two steps until the system has categorized the desired percentage of documents.

  5. QC round - Create a QC round to sample a group of documents categorized by the system, and then have reviewers review and code this sample to quality control (QC) the system.
  6. Before finishing the QC round, perform overturn analysis using Sample-Based Learning reporting to find the seed documents that created the most overturns. Work with reviewers to ensure that the seed documents are correctly coded. After making fixes, finish the round.

    Note: Throughout the process, analyze your progress using Sample-Based Learning reporting, and then verify whether you’re finished with the process or need to complete another iteration.

  7. Continue this process until the project reaches a stable point as determined from your goals and reporting.
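
The overturn analysis in the QC round can be pictured with a short sketch. It assumes an overturn means the QC reviewer's coding differs from the value the system assigned, and that each QC record names the seed document behind the system's decision. The record keys (`seed_id`, `system_value`, `review_value`) and function names are hypothetical and are not fields of the product's reports.

```python
from collections import Counter


def overturns_by_seed(qc_records):
    """Return (seed_id, overturn_count) pairs, most overturns first.

    Each record is a dict with hypothetical keys:
      'seed_id'      - example document that drove the system's categorization
      'system_value' - value the system assigned
      'review_value' - value the QC reviewer assigned
    """
    counts = Counter(
        rec["seed_id"]
        for rec in qc_records
        if rec["system_value"] != rec["review_value"]  # reviewer overturned the system
    )
    return counts.most_common()


def overturn_rate(qc_records):
    """Fraction of QC-reviewed documents where the reviewer overturned the system."""
    if not qc_records:
        return 0.0
    overturned = sum(1 for rec in qc_records if rec["system_value"] != rec["review_value"])
    return overturned / len(qc_records)


qc_round = [
    {"seed_id": "S1", "system_value": "Responsive", "review_value": "Not Responsive"},
    {"seed_id": "S1", "system_value": "Responsive", "review_value": "Not Responsive"},
    {"seed_id": "S2", "system_value": "Not Responsive", "review_value": "Not Responsive"},
    {"seed_id": "S3", "system_value": "Not Responsive", "review_value": "Responsive"},
]
print(overturns_by_seed(qc_round))
print(overturn_rate(qc_round))
```

Seeds at the top of the list are the first candidates to revisit with reviewers, since a single miscoded example can drive many incorrect categorizations.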

Note: Sample-Based Learning is centered on the concept of training the system so that it learns how to interpret uncategorized documents. This equips the system to categorize other documents with a high level of confidence and accuracy. Reviewers continue to train the system by manually reviewing documents and assigning them to categories.
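
The round steps above state that each document is categorized based on the closest example document. The toy sketch below illustrates that idea only. It swaps in plain word counts and cosine similarity as a stand-in for the product's analytics engine, which this page does not describe, and the function names and the `min_similarity` cutoff are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector (a stand-in for a real concept index)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(text, examples, min_similarity=0.3):
    """Return the category of the closest example, or None if nothing is close enough.

    examples: list of (example_text, category) pairs coded by reviewers.
    min_similarity: hypothetical cutoff below which a document stays uncategorized.
    """
    doc = vectorize(text)
    best_score, best_category = 0.0, None
    for example_text, category in examples:
        score = cosine(doc, vectorize(example_text))
        if score > best_score:
            best_score, best_category = score, category
    return best_category if best_score >= min_similarity else None


examples = [
    ("pricing agreement between the two suppliers", "Responsive"),
    ("office holiday party schedule and lunch menu", "Not Responsive"),
]
print(categorize("draft pricing agreement for suppliers", examples))
print(categorize("12345 67890 2021", examples))
```

The second call returns `None` because a document made only of numbers shares no words with any example. This mirrors the earlier guidance that good examples need rich text, not just numbers.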

Complete your Sample-Based Learning project (stabilization)

Planning in advance helps ensure a successful wrap-up. Completing all tasks is important for the client’s satisfaction as well as for defensibility. The following should be satisfied before you consider a project complete.

  • Project goals met
  • Precision/recall
  • Stabilization achieved
  • Manual review under way
  • Production complete

Once you reach your goal, you can continue to the next phase of review. After your project reaches stabilization and the overturn rate percentage of change in responsiveness stabilizes, you can take the values determined by Sample-Based Learning to proceed towards production or organization of documents for case work. This is the time to start creating these document groupings. The path you take depends on your project goals.
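
One possible way to read "stabilizes" is that the overturn rate stops changing much from one QC round to the next. The sketch below checks for that under stated assumptions: the function name `is_stable` and the `max_change` and `window` thresholds are hypothetical, and the right values depend on the goals you set for your project.

```python
def is_stable(overturn_rates, max_change=0.02, window=2):
    """Return True when the last `window` round-to-round changes are all within max_change.

    overturn_rates: overturn rate per QC round, oldest first (fractions between 0 and 1).
    max_change, window: hypothetical thresholds; derive them from your project goals.
    """
    if len(overturn_rates) < window + 1:
        return False  # not enough rounds yet to judge a trend
    recent = overturn_rates[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(change <= max_change for change in changes)


rates_by_round = [0.30, 0.18, 0.11, 0.10, 0.09]
print(is_stable(rates_by_round))
```

Requiring several consecutive small changes, rather than one, guards against declaring stabilization after a single round that happened to match the previous one.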

Consider the following post-project completion tasks:

  • Executing searches to find responsive documents and include family items
  • Manually reviewing documents that didn’t get a categorization value and aren’t part of the responsive family group
  • Reviewing responsive items for privilege
  • Spot-checking non-responsive items
  • Organizing case files around relevance
  • Creating witness binders around issues