aiR for Review prompt criteria validation
Prompt criteria validation gathers metrics to check whether the prompt criteria are effective and defensible before using them on a larger data set. Using aiR for Review and Review Center in tandem, you can set up a smaller document sample, oversee reviewers, and compare aiR's relevance predictions to actual coding results.
This functionality is currently only enabled for the Relevance and Relevance & Key analysis types.
Prerequisite
To run validation with aiR for Review, the Review Center application must be installed, and users must have the appropriate permissions in each application.
How validation fits into aiR for Review
aiR for Review uses the prompt criteria across a three-phase workflow:
- Develop—users write and iterate on the prompt criteria, testing them on a small document set until aiR’s recommendations align sufficiently with the expected relevance and issue classifications.
- Validate—users leverage the integration between aiR for Review and Review Center to compare results and validate the prompt criteria.
- Apply—users apply the validated prompt criteria to much larger sets of documents.
The prompt criteria validation process covers phase 2, Validate.
High-level prompt criteria validation workflow
The diagram details the steps (shown in orange) for prompt criteria validation, which occur after the Develop phase.
The Validate and Apply phases involve several steps that cross between aiR for Review and Review Center:
| Process Flow | Application Used |
| --- | --- |
| 1. Identify target review set & choose validation sample settings.<br>Set up the validation sample by choosing the sample size, the desired margin of error for the validation statistics, and other settings (see the sample-size sketch after this table). See Setting up aiR for Review prompt criteria validation for details. | aiR for Review |
| 2. Run aiR for Review on the sample to receive predictions.<br>When a validation sample is created, a corresponding Review Center queue is automatically created for reviewing it. Run the sample documents through aiR for Review to obtain the relevance predictions that will be compared against the manual human coding in Review Center. See Applying the prompt criteria for details. | aiR for Review |
| 3. Manually review and code the documents in the validation sample.<br>Human reviewers code the documents in the sample for comparison to the aiR predictions. For validation purposes, the human coding decisions are treated as correct. The Review Center dashboard tracks reviewers’ progress and compares their choices to aiR’s predictions. See Prompt Criteria validation in Review Center for details. | Review Center |
| 4. Evaluate built-in statistical results of the validation.<br>After human reviewers finish the validation sample, final validation statistics display that compare their results with aiR for Review’s predictions. Evaluate these results (see the illustrative calculation after this table). See Reviewing validation statistics and Prompt Criteria validation statistics for details. | Review Center |
| 5. Accept or reject the validation results.<br>After reviewing the results, decide whether to accept or reject them. If you accept the results, the validated prompt criteria can be used for all documents in the target data source, and the process moves to aiR for Review to apply the criteria to the larger data set. If you reject them, return to the Develop phase and adjust the prompt criteria on additional document samples. See Accepting or rejecting results for details. | Review Center |
| 6. Optionally apply or improve the prompt criteria.<br>Return to aiR for Review to either run the validated prompt criteria on larger sets of documents, or to improve the prompt criteria and validate again. See Applying the validated results or developing different prompt criteria for details. | aiR for Review |
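
To give a concrete sense of how the sample size and margin of error settings relate (step 1), the sketch below uses the standard normal-approximation formula for a sample proportion with a finite population correction. It is an illustration only; the exact calculation Review Center performs when sizing a validation sample is not described here, and the numbers are hypothetical.

```python
import math

def sample_size(margin_of_error: float, population: int,
                confidence_z: float = 1.96, proportion: float = 0.5) -> int:
    """Estimate a simple random sample size for a proportion.

    Standard normal-approximation formula with a finite population
    correction. proportion=0.5 gives the most conservative (largest)
    sample. Illustrative only; not the exact calculation Review Center uses.
    """
    n0 = (confidence_z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

# Hypothetical example: +/-5% margin of error at 95% confidence
# over a 20,000-document target review set.
print(sample_size(0.05, 20_000))  # ~377 documents
```

As the example suggests, a tighter margin of error requires a larger sample, so the sample size setting is a trade-off between review effort and the precision of the validation statistics.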
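The sketch below illustrates the kind of comparison that underlies the validation statistics (step 4): each sampled document pairs aiR's prediction with the human coding decision, and summary rates are computed from that pairing. The metric names, field values, and helper function here are assumptions for illustration; refer to Prompt Criteria validation statistics for the metrics Review Center actually reports and how they are defined.

```python
def validation_stats(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Compare aiR predictions against human coding on a validation sample.

    Each pair is (air_prediction, human_coding), where both values are
    "Relevant" or "Not Relevant". The human decision is treated as
    correct. Field values and metric definitions are illustrative.
    """
    tp = fp = fn = tn = 0
    for predicted, actual in pairs:
        if predicted == "Relevant" and actual == "Relevant":
            tp += 1  # aiR and the reviewer agree the document is relevant
        elif predicted == "Relevant":
            fp += 1  # aiR predicted relevant; reviewer coded not relevant
        elif actual == "Relevant":
            fn += 1  # aiR predicted not relevant; reviewer coded relevant
        else:
            tn += 1  # both agree the document is not relevant
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of aiR's relevant calls that are correct
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of relevant documents aiR found
        "elusion": fn / (fn + tn) if fn + tn else 0.0,    # relevant documents in the predicted not-relevant group
    }

# Hypothetical coded sample: (aiR prediction, human coding)
sample = [
    ("Relevant", "Relevant"),
    ("Relevant", "Not Relevant"),
    ("Not Relevant", "Relevant"),
    ("Not Relevant", "Not Relevant"),
]
print(validation_stats(sample))  # {'precision': 0.5, 'recall': 0.5, 'elusion': 0.5}
```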