Active Learning

Active Learning is a technology-assisted review tool that helps you quickly organize your data and predict which documents are most likely to be relevant. Because very little training is needed before the model begins surfacing documents of interest, Active Learning works for cases of all sizes, even those as small as 1,000 documents. Using Active Learning can reduce the total time to review.

Active Learning works by using a technology called Support Vector Machine learning to continuously learn from your reviewers' coding decisions. Reviewers code documents using a binary classification system (for example, Relevant and Not Relevant). These coding decisions are ingested by the Active Learning model, where machine learning takes place. As reviewers code, the model gets better at discerning which documents belong in each category and serves the most useful documents to reviewers next. Active Learning provides two methods of review, making it flexible to your case needs:

  • Prioritized Review - serves the documents most likely to be relevant to reviewers first.
  • Coverage Review - quickly separates your documents into your two categories.
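Conceptually, the learning loop behind these two queues can be sketched with a linear SVM: refit the model on the coded documents, score the uncoded ones, then serve either the highest-scoring documents (prioritized) or the documents nearest the decision boundary (coverage). The sketch below uses scikit-learn and synthetic feature vectors as stand-ins for extracted text; it illustrates the idea only and is not Relativity's implementation.

```python
# Illustrative Active Learning loop (synthetic data, not product code).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # feature vectors for all documents
true_labels = (X[:, 0] > 0).astype(int)  # hidden ground truth for the demo

# Reviewers have pre-coded a handful of documents from each designation.
pos = np.where(true_labels == 1)[0][:5]
neg = np.where(true_labels == 0)[0][:5]
coded = np.concatenate([pos, neg])
uncoded = np.setdiff1d(np.arange(200), coded)

model = LinearSVC().fit(X[coded], true_labels[coded])  # refit on coding decisions
scores = model.decision_function(X[uncoded])           # signed distance to boundary

# Prioritized Review: serve the documents most likely to be relevant.
prioritized = uncoded[np.argsort(-scores)[:5]]

# Coverage Review: serve the documents the model is least sure about.
coverage = uncoded[np.argsort(np.abs(scores))[:5]]
```

In a real project this loop repeats: each batch of new coding decisions retrains the model before the next batch of documents is served.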

There are a number of tools available to help you monitor the progress of your review and eventually validate the success of Active Learning.

Note: For details on estimating recall in an Active Learning project that has used the prioritized review queue, see Estimating recall in an Active Learning project.
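As a rough illustration of what such validation measures, an elusion test samples the unreviewed discard pile and estimates how many relevant documents would elude the review; recall then follows from that estimate. The numbers below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical elusion and recall arithmetic (illustrative numbers only).
sample_size = 500                  # random sample drawn from the discard pile
relevant_in_sample = 5             # relevant documents found in that sample
discard_pile_size = 40_000         # unreviewed documents the estimate covers
relevant_found_in_review = 9_000   # relevant documents coded during review

elusion_rate = relevant_in_sample / sample_size      # 0.01, i.e. 1%
eluded_estimate = elusion_rate * discard_pile_size   # ~400 documents missed
recall_estimate = relevant_found_in_review / (
    relevant_found_in_review + eluded_estimate
)                                                    # ~0.957
```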


Basic Active Learning workflow

The following steps depict a typical Active Learning workflow that integrates with other Analytics features. Each user's workflow may vary, and you may not need to follow every step for every Active Learning project you run.

  1. Cull the documents you plan to use in the data source for your Active Learning project. This may include:
    •  Running the following structured analytics operations:
      • Email Threading
      • Textual Near Duplicate Identification
      • (optional) Language Identification
    • Removing large and non-text documents.
    • Removing documents that are out-of-scope for your case.
  2. Create a saved search using the documents you identified.

    Note: Return only extracted text in the search.

  3. Create an Analytics classification index using this saved search as the data source.
  4. Create a review field with two choices, and add it to the coding layout used by the reviewer group accessing the Active Learning project.
  5. Pre-code documents in the data source on the review field by taking a richness sample. Aim for a balance of documents coded with the positive designation and the negative designation. Pre-coding documents by estimating richness helps speed up the Active Learning process and gives you a baseline metric for gauging the progress of your Active Learning project later.
  6. Create a new Active Learning project.
  7. Turn on the Prioritized Review queue or Coverage Review queue, depending on your workflow. Have reviewers begin coding documents in the review queue.
  8. Monitor the review queue.
  9. Run an Elusion Test to validate the results of your project.
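The richness sample in step 5 can be reasoned about with simple proportion math: richness is the fraction of sampled documents coded with the positive designation, with a margin of error that shrinks as the sample grows. The sketch below uses made-up numbers and the standard normal-approximation confidence interval, purely for illustration.

```python
import math

# Hypothetical richness estimate from a random pre-coding sample.
sample_size = 400
positives_in_sample = 60        # documents coded with the positive designation

richness = positives_in_sample / sample_size   # 0.15 -> ~15% of the data source

# 95% confidence interval via the normal approximation:
margin = 1.96 * math.sqrt(richness * (1 - richness) / sample_size)
low, high = richness - margin, richness + margin   # roughly 11.5% to 18.5%
```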

Note: For a description of useful Active Learning dashboards, see Active Learning Useful Dashboards.