Elusion Test

An Elusion Test is used to validate the accuracy of an Active Learning project. The goal of the Elusion Test is to estimate how many low-ranked documents are actually highly relevant documents that you would leave behind if you stopped the project at that point. We recommend running the Elusion Test near the end of the project when you believe the project has stabilized and the low-ranking documents have an acceptably low relevance rate. However, you can run an Elusion Test at any point during the project.

When you run the Elusion Test, you specify a rank cutoff and either a fixed number of documents in the sample or a confidence level and margin of error. The elusion sample is taken from the not coded documents below the specified rank cutoff. The not coded documents include documents that were never reviewed and documents that were skipped. Reviewers then code these documents on the same project review field to see what relevant documents remain, which ultimately results in the elusion calculations.

The following are helpful definitions to better understand elusion calculations: 

  • Discard pile – the set of not coded documents with ranks below the rank cutoff. Reviewers in the Elusion Test are served a sample of documents from the discard pile.
  • Discard-pile elusion rate – the percentage of documents in the discard pile that are relevant. It’s not possible to calculate this number precisely (with zero error) without coding every document in the discard pile. Therefore, we use sampling to estimate the discard-pile elusion rate. Sampling results in a sample elusion rate along with a margin of error and confidence level, which capture the amount of uncertainty in the estimate (a brief sketch of this estimation follows these definitions). To calculate a more precise margin of error after a completed Elusion Test, see Calculating a post-test margin of error.
  • Sample elusion rate – the percentage of documents in the Elusion Test's sample that are relevant.
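
The sketch below illustrates how these quantities relate, assuming the standard normal-approximation margin of error for a sampled proportion. The function name and inputs are illustrative only; the application's internal calculation is not documented here and may differ.

```python
# Illustrative only: estimate the discard-pile elusion rate from an elusion
# sample, using the normal-approximation margin of error for a proportion.
# This is an assumption for explanation; the application's exact math may differ.
import math

def estimate_discard_pile_elusion(sample_relevant: int, sample_size: int,
                                  z: float = 1.96) -> dict:
    """z = 1.96 corresponds to a 95% confidence level."""
    sample_elusion_rate = sample_relevant / sample_size
    margin_of_error = z * math.sqrt(sample_elusion_rate * (1 - sample_elusion_rate) / sample_size)
    return {
        "sample_elusion_rate": sample_elusion_rate,
        "margin_of_error": margin_of_error,
        # The discard-pile elusion rate is estimated to fall within this range.
        "estimated_range": (max(0.0, sample_elusion_rate - margin_of_error),
                            min(1.0, sample_elusion_rate + margin_of_error)),
    }

# Example: 6 relevant documents found in a 1,500-document elusion sample.
print(estimate_discard_pile_elusion(sample_relevant=6, sample_size=1500))
```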

Starting the Elusion Test

The Elusion Test appears along with the other review queues after a new project is created. Starting an Elusion Test disables all other active queues in the project and suspends model updates until the Elusion Test is completed.

To run an Elusion Test, complete the following:

  1. Click Add Reviewers on the Elusion Test and confirm you want to start an Elusion Test.

    Note: We recommend no more than 150 concurrent reviewers per project. Concurrent reviewers are defined as reviewers making coding decisions in an Active Learning queue. There is no limit to how many reviewers you can add to a queue as long as the number of concurrent reviewers remains at 150 or fewer.

  2. Wait for the system to set up the test. Once the queue reads Click to setup the Elusion Test, click the queue.
  3. On the Elusion Test setup window, complete the following fields:
    • Responsive Cutoff - the rank below which the Elusion Test will sample not coded, predicted not relevant documents (not reviewed, skipped, suppressed duplicates).
        Notes:
      • When you update the responsive cutoff value, the value is updated in all three places where it’s used in the application: Elusion Test, Update Ranks, and Project Settings.

      • Manually coded documents are not sampled for Elusion Tests because they are, by definition, coded.
    • Sample Type
      • Fixed - creates a random sample of a fixed number of documents.
      • Statistical - creates a random sample set of a size that is based on a given Confidence and Margin of Error (see the sample-size sketch after these steps).
    • Confidence (%) - the probability that the sample elusion rate is a good estimate of the discard-pile elusion rate (i.e., within the margin of error). Selecting a higher confidence level requires a larger sample size.
    • Margin of Error (%) - the maximum difference between the sample elusion rate and the discard-pile elusion rate. Selecting a lower margin of error requires a larger sample size. Margin of error can change if documents were skipped in the Elusion Test.

      Note: The actual margin of error will often be lower than what's reported by the Elusion Test. To calculate a more precise margin of error after a completed Elusion Test, see Calculating a post-test margin of error.

    • Reviewers - the users that will review documents in the Elusion Test.
  4. Click the green check mark to accept your settings.
  5. Click Start Review.
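
For the Statistical sample type, the sketch below shows one way a sample size can be derived from a confidence level and margin of error, using the standard formula for a sampled proportion with a finite-population correction. It is an illustration under stated assumptions, not the queue's documented sizing logic.

```python
# A sketch of statistical sample sizing from confidence and margin of error,
# using the standard proportion formula with a finite-population correction.
# Assumption: p = 0.5, which gives the most conservative (largest) sample size.
import math

Z = {90: 1.64, 95: 1.96, 99: 2.57}  # z constants, matching the values listed on this page

def statistical_sample_size(confidence_pct: int, margin_of_error: float,
                            discard_pile_size: int, p: float = 0.5) -> int:
    z = Z[confidence_pct]
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / discard_pile_size)          # finite-population correction
    return math.ceil(n)

# Example: 95% confidence, 2.5% margin of error, 200,000 documents below the cutoff.
print(statistical_sample_size(95, 0.025, 200_000))  # -> 1525 with these inputs
```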

Running an Elusion Test

Elusion Test statistics are reported in Review Statistics and updated during an Elusion Test. You can cancel an Elusion Test at any time. You can also pause a review by clicking the Pause Review button.

Reviewers access the queue from the document list like all other queues. Reviewers code documents from the sample until all documents have been served, at which point a message appears indicating that the queue is complete.

    Notes:
  • For best results, we strongly recommend coding every document in the Elusion Test and not skipping any. Skipped documents are counted as relevant in Elusion Test results.
  • If a document was skipped during Prioritized Review or Coverage Review and is then served during the Elusion Test, the Review Statistics for that queue are also updated.

When a reviewer saves a document in the Elusion Test, the document is tagged in the <Project Name> Elusion Test multi-choice field.

Reviewing Elusion Test results

Once reviewers code all documents in the sample, you can access Elusion Test results by clicking View Elusion Test Results.

Based on the coding of the elusion test sample, the results display the following:

  • Elusion Rate - the percentage of documents coded relevant in the elusion sample. The elusion rate results are displayed as a range that applies the margin of error to the sample elusion rate, which is an estimate of the discard pile elusion rate. The rate is rounded to the nearest tenth of a percent.

    Note: Documents that are skipped during the Elusion Test queue are treated as relevant documents. Therefore, coding all of the documents in the elusion sample guarantees the statistical validity of the calculated elusion rate as an estimate of the entire discard-pile elusion rate.

  • Eluded Documents - the estimated number of eluded documents, calculated by multiplying the sample elusion rate by the number of documents in the discard pile (see the sketch following this list). This number is subject to the final confidence and margin of error, which can be found in Review Statistics.
  • Pending Documents - the number of documents that have not been submitted to the model, including documents in the elusion test sample and manually-selected documents coded while the elusion test was taking place.
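
The sketch below shows the Eluded Documents arithmetic described in this list, including the treatment of skipped sample documents as relevant. The rounding mode is an assumption for illustration; the application's exact rounding is not documented here.

```python
# Eluded Documents sketch: skipped sample documents count as relevant, the
# sample elusion rate is multiplied by the discard-pile size, and the result
# is rounded (rounding mode assumed for illustration).
def eluded_documents(coded_relevant: int, skipped: int, sample_size: int,
                     discard_pile_size: int) -> int:
    relevant_in_sample = coded_relevant + skipped          # skipped docs count as relevant
    sample_elusion_rate = relevant_in_sample / sample_size
    return round(sample_elusion_rate * discard_pile_size)

# Example: 5 documents coded relevant and 1 skipped in a 1,500-document sample
# drawn from a 200,000-document discard pile.
print(eluded_documents(coded_relevant=5, skipped=1, sample_size=1500,
                       discard_pile_size=200_000))  # -> 800
```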

If documents were skipped during the Elusion Test, a warning appears on the modal. You can review these skipped documents, and they'll be reflected in the results as if they were coded during the test. If these documents are coded after you click Complete Project, only the Pending Documents count is updated.

If you find the results of the Elusion Test acceptable, select whether to Update ranks upon completion, and then click Complete Project to close the project. Once the project is complete, the model remains frozen. Any coding decisions that occurred after the Elusion Test was administered will not be used to train the (now frozen) model.

Note: Updating ranks upon accepting Elusion Test results will use the Elusion Test Rank Cutoff.

If you don't find the results of the Elusion Test acceptable, click Resume Project, and then click again to re-open the Active Learning project. This unlocks the model, and allows it to rebuild. Any documents coded since the Elusion Test began, including those from the Elusion Test queue itself, are included in the model build.

Elusion Test statistics are reported in Review Statistics and persist after an Elusion Test is finished. This data is located under the Elusion Test tab.

See Review Statistics for more information.

Calculating a post-test margin of error

In most situations, the margin of error estimated by the Elusion Test is more conservative (higher) than necessary. More detail can be found in this article on the Community site. After the Elusion Test is completed, you can re-calculate the margin of error using a more precise formula.

This new "margin of error" may not be symmetric, and for that reason, it would be more accurate to call it a confidence interval, which provides a range on the elusion rate rather than a single percentage to be added and subtracted.

Assumptions

Except in rare cases, this new confidence interval will not be wider than the original margin of error. The following are five sets of assumptions (conditions) that ensure the post-test confidence interval is narrower than the pre-test confidence interval:

Pre-test margin of error | Pre-test confidence level | Documents in discard pile | Population (discard pile) elusion rate
Greater than or equal to 0.01 | Less than or equal to 0.99 | 1 million | No restrictions
Greater than or equal to 0.01 | Less than or equal to 0.99 | Greater than or equal to 100,000 | Less than 0.31
Greater than or equal to 0.01 | Less than or equal to 0.99 | Greater than or equal to 10,000 | Less than 0.10
Greater than or equal to 0.01 | Less than or equal to 0.95 | Greater than or equal to 10,000 | Less than 0.15
Greater than or equal to 0.01 | Less than or equal to 0.90 | Greater than or equal to 10,000 | Less than 0.18

Calculating the confidence interval

To calculate the confidence interval, use the equation below or a tool such as this one from Epitools. A code sketch applying these values follows the definitions below.

Where:

  • n = the number of documents coded and sampled in the Elusion Test.
  • p = the elusion rate from the completed Elusion Test
  • C = the desired confidence level
  • z = a statistical constant:
    • 1.64 if confidence level = 90%

    • 1.96 if confidence level = 95%

    • 2.57 if confidence level = 99%
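
The document's original equation is not reproduced above. As a stand-in, the sketch below uses the Wilson score interval, a common z-based formula that produces the kind of asymmetric interval described earlier. Treat it as an assumption rather than the authoritative formula, and use a calculator such as the Epitools tool if exact agreement matters.

```python
# A sketch of a post-test confidence interval for the elusion rate, using the
# Wilson score interval (an assumption; the page's original equation is not
# reproduced here). p = elusion rate from the completed test, n = sample size,
# z = 1.64 (90%), 1.96 (95%), or 2.57 (99%).
import math

def wilson_interval(p: float, n: int, z: float = 1.96) -> tuple:
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = (z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Example: an elusion rate of 0.4% observed in a 1,500-document sample, 95% confidence.
low, high = wilson_interval(p=0.004, n=1500)
print(f"{low:.4f} to {high:.4f}")  # roughly 0.0018 to 0.0087
```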

Estimating recall

You can use either the Elusion Test margin of error or the confidence interval calculation to estimate recall using this recipe. If using the confidence interval, follow these steps to calculate the high and low ends of eluded documents needed in the recipe (a code sketch follows these steps):

  • To calculate the high end: Let x be the high end of the post-test confidence interval. This should be a number between 0 and 1, and it should be larger than your point estimate for the elusion rate. Multiply x by the number of documents in the discard pile and round the result up to the nearest integer. This rounded integer is the high end of the eluded documents (TOTAL_RESP_HIGH in the recipe).
  • To calculate the low end: Let y be the low end of the post-test confidence interval. This should be a number between 0 and 1, and it should be smaller than your point estimate for the elusion rate. Multiply y by the number of documents in the discard pile and round the result down to the nearest integer. This rounded integer is the low end of the eluded documents (TOTAL_RESP_LOW in the recipe).
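
The helper below implements the two rounding steps above. TOTAL_RESP_HIGH and TOTAL_RESP_LOW are the names used by the recipe; everything else is illustrative.

```python
# Convert the post-test confidence interval into the eluded-document bounds the
# recall recipe expects: the low end is rounded down, the high end rounded up.
import math

def eluded_document_bounds(ci_low: float, ci_high: float, discard_pile_size: int):
    total_resp_low = math.floor(ci_low * discard_pile_size)    # TOTAL_RESP_LOW
    total_resp_high = math.ceil(ci_high * discard_pile_size)   # TOTAL_RESP_HIGH
    return total_resp_low, total_resp_high

# Example: a confidence interval of 0.0018 to 0.0087 over a 200,000-document discard pile.
print(eluded_document_bounds(0.0018, 0.0087, 200_000))  # -> (360, 1740)
```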