Project Validation statistics

Active Learning's Project Validation feature provides four metrics for evaluating your review coverage. Together, these metrics can help you determine the state of your Active Learning project. Once you have insight into the accuracy and completeness of your relevant document set, you can make an educated decision about whether to stop the Active Learning workflow or continue review.

For instructions on how to run Project Validation, see Project Validation and Elusion Testing.

Defining elusion, recall, precision, and richness

Project Validation centers around four statistics, which are defined as follows:

  • Elusion rate - the percentage of documents coded relevant in the uncoded, low-ranking portion of the sample. The elusion rate results are displayed as a range that applies the margin of error to the sample elusion rate, which is an estimate of the discard pile elusion rate. The rate is rounded to the nearest hundredth of a percent.
  • Richness - the percentage of relevant documents across the whole sample, calculated by dividing the number of positive-coded documents in the sample by the total number of documents in the sample. This sample richness is used to predict a richness range for the whole project.
  • Recall - the percentage of truly positive documents that were found by the Active Learning process. A document has been "found" if it was previously coded positive, or if it is uncoded with a rank at or above the cutoff.
  • Precision - the percentage of found documents that are truly positive, using the same definition of "found" as recall. Documents that were predicted positive but coded negative during validation count against precision.
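As a rough illustration, these four statistics can be computed from counts in a validation sample. The sketch below is a simplification under assumed names (`ValidationSample` and its fields are hypothetical, not part of the product): it treats every "not found" document as part of the low-ranking, uncoded portion, and it ignores the margin of error that Project Validation applies to report ranges.

```python
from dataclasses import dataclass

@dataclass
class ValidationSample:
    """Hypothetical counts from a validation sample (illustrative names)."""
    found_positive: int    # "found" documents coded relevant
    found_negative: int    # "found" documents coded not relevant
    missed_positive: int   # low-ranking, uncoded documents coded relevant
    missed_negative: int   # low-ranking, uncoded documents coded not relevant

def statistics(s: ValidationSample) -> dict:
    total = s.found_positive + s.found_negative + s.missed_positive + s.missed_negative
    low_ranking = s.missed_positive + s.missed_negative
    return {
        # Elusion: relevant documents among the uncoded, low-ranking portion
        "elusion": s.missed_positive / low_ranking,
        # Richness: relevant documents across the whole sample
        "richness": (s.found_positive + s.missed_positive) / total,
        # Recall: truly positive documents that the process "found"
        "recall": s.found_positive / (s.found_positive + s.missed_positive),
        # Precision: "found" documents that are truly positive
        "precision": s.found_positive / (s.found_positive + s.found_negative),
    }
```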

Project Validation arrives at these values by classifying documents as true positives, false positives, and false negatives:

  • True Positive = coded relevant & predicted relevant by the system
  • False Positive = coded not relevant but predicted relevant by the system
  • False Negative = coded relevant but predicted not relevant by the system
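The classification above, together with the standard formulas it implies, can be sketched as follows (function names are illustrative only):

```python
def classify(coded_relevant: bool, predicted_relevant: bool) -> str:
    """Label a validation document per the definitions above."""
    if coded_relevant and predicted_relevant:
        return "true positive"
    if not coded_relevant and predicted_relevant:
        return "false positive"
    if coded_relevant and not predicted_relevant:
        return "false negative"
    return "true negative"  # the remaining case: coded and predicted not relevant

def recall(tp: int, fn: int) -> float:
    # Truly positive documents that were found = TP / (TP + FN)
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    # Found documents that are truly positive = TP / (TP + FP)
    return tp / (tp + fp)
```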

Note: Project Validation does not check for human error. We recommend that you conduct your own quality checks to make sure reviewers are coding consistently. For more information, see Quality checks and checking for conflicts.

How Project Validation handles skipped documents

We strongly recommend coding every document in the Project Validation queue. Skipping documents undermines the randomness of the sample, which introduces bias into the validation statistics. To counter this bias, Active Learning produces conservative estimates: each validation statistic counts a skipped document as the least favorable result.

Skipped documents negatively affect each statistic as follows:

  • Low-ranking skipped document:
      • Elusion - increases the elusion rate (counts as relevant)
      • Recall - lowers the recall rate (counts as non-relevant)
      • Richness - raises the richness estimate (counts as relevant)
      • Precision - no effect
  • High-ranking skipped document:
      • Elusion - no effect
      • Recall - lowers the recall rate slightly (counts as if it weren't present)
      • Richness - raises the richness estimate (counts as relevant)
      • Precision - lowers the precision rate (counts as non-relevant)
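These rules can be summarized in a small lookup, sketched here in Python. This is only an illustration of the documented behavior, not Relativity's internal code:

```python
# How each statistic treats a skipped document, keyed by (rank tier, statistic).
# "relevant" / "non-relevant" = the skipped document is counted as if coded that
# way; "excluded" = counted as if it weren't present; None = no effect.
SKIPPED_TREATMENT = {
    ("low-ranking", "elusion"):    "relevant",      # increases elusion rate
    ("low-ranking", "recall"):     "non-relevant",  # lowers recall rate
    ("low-ranking", "richness"):   "relevant",      # raises richness estimate
    ("low-ranking", "precision"):  None,            # no effect
    ("high-ranking", "elusion"):   None,            # no effect
    ("high-ranking", "recall"):    "excluded",      # lowers recall rate slightly
    ("high-ranking", "richness"):  "relevant",      # raises richness estimate
    ("high-ranking", "precision"): "non-relevant",  # lowers precision rate
}
```

Note that every non-`None` entry moves the statistic in the pessimistic direction, which is what makes the resulting estimates conservative.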

Documents skipped during Prioritized or Coverage Review

If a document was skipped during Prioritized Review or Coverage Review and is then served and coded during Project Validation, it will be designated as a coded document rather than a skipped document in that queue's Review Statistics tab.