Project monitoring

Once the Active Learning model completes its first build, the model rebuilds every twenty minutes to include coding decisions not in the most recent build. However, if Active Learning detects reviewer inactivity of at least five minutes, a build will take place before the twenty-minute threshold is reached. Admins have a number of ways to monitor the progress of an Active Learning project.

Document rank distribution

The Document Rank Distribution is one of the monitoring charts on the Active Learning project homepage. It ranks each document in the model, including manually-selected documents, based on how it relates to the overall project. A relevance rank near zero indicates the model believes the document is more likely to be coded on the negative review field choice, while a rank closer to 100 means the model believes the document is more likely to be coded on the positive review field choice. In the early stages of an Active Learning project, most documents will have a relevance rank score between 40 and 60 until the model begins training.

The review state of the documents is also overlaid on this distribution. Note that it is possible for a document coded on the positive choice to have a lower relevance ranking; this is because the rank is simply the model's prediction.

The dashboard reports documents reviewed from the Prioritized Review or Coverage Review queue, as well as documents coded outside of the queue. Admins will see the following colors on the chart:

  • Blue (Coded Positive Choice) - a document was coded on the positive choice review field.
  • Yellow (Coded Negative Choice) - a document was coded on the negative choice review field.
  • Purple (Not Reviewed) - the documents are within the project's scope but have not yet been coded; their positions reflect Relativity's predictions.
  • Green (Skipped) - a document was skipped.
  • Red (Suppressed Duplicate) - the documents are suppressed because textually similar documents already provide their training value to the model.

    Note: When a full population is performed, all previously identified suppressed documents are marked as "Not Reviewed" in the Document Rank Distribution chart.

You can interact with the Document Rank Distribution chart to hide categories of documents and focus on the ones that remain. For example, to hide the Not Reviewed documents, click Not Reviewed. The bar chart then rescales for the remaining documents.
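
To make the bucketing behind this chart concrete, here is a minimal sketch of how ranks and review states could be tallied into such a distribution. The data, function name, and bin width are illustrative assumptions; this is not a Relativity API.

```python
from collections import Counter

def rank_distribution(documents, bin_width=10):
    """Tally documents into rank bins (0-100) per review-state category.

    `documents` is a list of (rank, state) pairs -- hypothetical data,
    standing in for the model's per-document relevance ranks.
    """
    dist = {}
    for rank, state in documents:
        # Clamp a rank of exactly 100 into the top bin so bins cover [0, 100].
        bin_start = min(int(rank // bin_width) * bin_width, 100 - bin_width)
        dist.setdefault(state, Counter())[bin_start] += 1
    return dist

docs = [
    (97.2, "Coded Positive Choice"),
    (88.0, "Not Reviewed"),
    (52.5, "Not Reviewed"),
    (41.0, "Coded Negative Choice"),
    (3.8, "Coded Negative Choice"),
]
dist = rank_distribution(docs)
```

Hiding a category, as described above, corresponds to dropping that key from the tally before charting.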

Monitoring document rank distribution

Use the rank distribution chart to understand the following:

  • The number of predicted, relevant documents that remain for review.
  • The agreement between reviewers and the Active Learning model.
  • The number of documents the model does not understand well.

As the model learns throughout the project life cycle, the rank distribution is expected to gravitate toward 0 or 100, depending on whether documents are coded on the positive or negative choice. Each time an admin accesses this page - whether via a page refresh or from a different page - the Project Home display reflects the latest data.

Note: If a coding decision is updated on a document reviewed in the queue, the document will not be reclassified as a manually-selected document.

Prioritized review progress

The Prioritized Review Progress chart shows how effectively the prioritized review queue locates relevant documents by measuring the relevance rate. More specifically, the relevance rate is the percentage of documents predicted to be relevant that reviewers then confirmed as relevant through their coding decisions.

  • This chart only updates when documents are coded in the Prioritized Review queue.
  • Documents included in the Active Learning queue for index health are excluded from the relevance rate calculation.
  • If only manually-selected documents are coded, the Prioritized Review Progress chart won't display relevance rate data.
  • This measurement is not cumulative with regard to the entire document set.

The x-axis charts documents reviewed in the Prioritized Review queue in groups of 200. Data won't appear on the chart until at least 200 documents are coded in prioritized review, and a new data point appears each time another 200 documents are coded.

The y-axis charts the relevance rate. This is the percentage of the highly ranked documents which are confirmed by the reviewer to be relevant. The Prioritized Review queue serves up a mixture of documents: 10% of documents are randomly selected; 20% of documents are chosen for scores "in the middle" of the review (in the 40 to 60 range) for index health; and the final 70% are the highest-ranking uncoded documents remaining in the project.

This means for every 200 documents, around 140 (70%) are chosen for being highly ranked. If you wanted to calculate the relevance rate manually, you would take Number of Highest Ranked Coded [Positive Choice] / Number of Highest Ranked. You can view these values in the Prioritized Review queue review summary on the Review Statistics tab. For more information, see Review Statistics.
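
The manual calculation above can be sketched in code. This is an illustrative approximation with made-up names, not a Relativity API: for each 200-document batch, only the roughly 140 highest-ranked documents count toward the rate, since the randomly selected (10%) and index-health (20%) documents are excluded.

```python
def relevance_rate(highest_ranked_coded_positive, highest_ranked_total):
    """Relevance rate (as a percentage) for one 200-document batch.

    Both arguments refer only to the highest-ranked ~70% of the batch;
    random and index-health documents are excluded, as described above.
    """
    if highest_ranked_total == 0:
        return 0.0
    return 100.0 * highest_ranked_coded_positive / highest_ranked_total

# Example: of the ~140 highest-ranked documents in a batch, 119 were
# coded on the positive choice.
rate = relevance_rate(119, 140)  # 85.0
```

The two inputs correspond to the Number of Highest Ranked Coded [Positive Choice] and Number of Highest Ranked values shown on the Review Statistics tab.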

Monitoring prioritized review progress

At the beginning of the project, the relevance rate may be low while the model learns what responsive means. As reviewers code documents and the model learns, this rate improves because the model becomes better at locating relevant documents. Documents with Active Learning ranks in the 80-100 range typically correspond to very high rates of responsiveness, so it's not unusual to see 100% reported in the graph. As the review progresses, however, the highest-ranking documents that haven't yet been coded will have lower and lower rank scores.

Note: You may see a spike in the relevance rate if a large number of new documents are added to the project, or if the definition of relevance changes during the course of the review.

Eventually, the relevance rate will plateau and decline. Declines in relevance rate indicate that the project is near completion since the model is serving up fewer relevant documents to reviewers. This indicates that you can move to the Elusion Test to validate completion. For more information, see Elusion Test.

Running a search on a classification index

You can run a search against a classification index to quickly return documents of a certain rank or within a range of ranks instead of having to run Update Ranks.

  • If you try running a search for the first time while the classification index is populating and building, the search will attempt to complete for up to five minutes. If the index hasn't finished building within this time, the search will fail, but you can re-run it after the index finishes.
  • If you've previously run the search, these results are cached. If you try re-running the search while the classification index is building, you'll see these old, cached results. Once the index build completes, the results are refreshed with the latest index build results.

To run the search:

  1. Navigate to the Documents tab.
  2. From the search bar, select the classification index associated with your Active Learning project.

    Note: The index you select must be associated with an Active Learning project that has been built (at least five documents coded with the positive designation and five coded with the negative designation).

  3. Using the next drop-down, select whether to search for Greater than or equal to, Less than or equal to, Between, or Is the rank value you enter.

  4. Click Search.

The Rank column displays rank results relevant to your search for the most recent index build. This differs from the CSR- <Project Name> Cat. Set::Category Rank field generated by Update Ranks, which retains old results until you manually re-run Update Ranks.

The rank scores are rounded to two decimal places. Note that these results are temporary, and you can't run the mass operations Sum, Tally, and Average on them.

Quality control

If you want to run a quality control check on reviewers' coding decisions, run a search for documents Greater than or equal to your rank cutoff. Then, filter on the Designation field to return documents within this rank range that were coded on the negative designation.
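
This quality-control search amounts to a simple filter: keep documents whose rank meets the cutoff but whose coding disagrees with the model's high prediction. A minimal sketch with hypothetical document records and field names (not a Relativity API):

```python
def qc_candidates(documents, rank_cutoff, negative_choice="Not Responsive"):
    """Return documents ranked at or above the cutoff but coded negative.

    `documents` is a list of dicts with hypothetical "rank" and
    "designation" keys; the returned disagreements are worth a second look.
    """
    return [
        d for d in documents
        if d["rank"] >= rank_cutoff and d["designation"] == negative_choice
    ]

docs = [
    {"id": 1, "rank": 92.1, "designation": "Responsive"},
    {"id": 2, "rank": 88.4, "designation": "Not Responsive"},  # flagged for QC
    {"id": 3, "rank": 35.0, "designation": "Not Responsive"},
]
flagged = qc_candidates(docs, rank_cutoff=80)
```

In the product, the rank comparison is handled by the classification index search and the designation filter is applied in the document list.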