For Relativity to automatically measure the stability of the project (precision, recall, and F1 score), you must set aside a control set of documents that is not used to train the system. Control sets (also known as truth sets) are random, unbiased, statistically significant samples drawn from the entire project universe. They can be thought of as a miniature model of an Assisted Review project.
It is recommended that you create a control set round as early as possible in the project (ideally as the very first round). The later you run one, the fewer eligible documents you will have. The documents in a control set sample are not used as examples, but reviewers still code them. These specially treated documents allow an admin to track an Assisted Review project’s status by observing how accurately the system categorizes the documents in the control set, which in turn offers insight into the accuracy of the entire project.
The Control Set Statistics report tracks a project’s accuracy trends from round to round. The report displays precision, recall, and F1 scores for each round in a single line chart.
- Precision is the proportion of the documents that the system categorized as positive (e.g., Responsive designation) that reviewers also coded as positive.
- Recall is the proportion of all documents coded as positive (e.g., Responsive designation) that the system also categorized as positive.
- The F1 score is the harmonic mean of precision and recall, a single measure that balances the two.
Ideally, these scores should increase from round to round, and get to as close to 100% as is practical or possible for the project.
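As an illustration of how the three scores relate, the following minimal sketch computes them from reviewer coding and system categorization using the standard definitions. The function name, the designation strings, and the sample data are hypothetical, and this is not a description of how Relativity computes the report internally.

```python
def control_set_metrics(coded, categorized, positive="Responsive"):
    """Return (precision, recall, f1) for one round.

    coded       -- reviewer designations for the control set documents
    categorized -- system categorizations for the same documents, same order
    positive    -- the designation treated as the positive result (assumed name)
    """
    pairs = list(zip(coded, categorized))
    # True positives: coded positive and categorized positive.
    tp = sum(1 for c, s in pairs if c == positive and s == positive)
    # False positives: categorized positive but coded otherwise.
    fp = sum(1 for c, s in pairs if c != positive and s == positive)
    # False negatives: coded positive but categorized otherwise.
    fn = sum(1 for c, s in pairs if c == positive and s != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical control set of five documents.
coded = ["Responsive", "Responsive", "Not Responsive", "Responsive", "Not Responsive"]
categorized = ["Responsive", "Not Responsive", "Responsive", "Responsive", "Not Responsive"]

precision, recall, f1 = control_set_metrics(coded, categorized)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up sample, two of the three documents categorized as Responsive were also coded Responsive, and two of the three documents coded Responsive were found by the system, so all three scores come out to about 0.67.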
This page contains the following sections:
- Executing a control set round
- Reviewing documents for a control set round
- Finishing a control set round
- Reviewing Assisted Review reports after a control set round
- If you add new documents to the project, you invalidate the current control set. You must create a new one for Relativity to accurately determine precision and recall.
- If you have a large data set with low richness (e.g., very few responsive documents), you may need a larger control set.
- Documents categorized prior to being put in the control set can't create overturns.
- Previously coded documents are not eligible to be added to a control set.
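One way to see why low richness can call for a larger control set: recall is measured only against the control set documents that reviewers code as positive, so a low-richness sample leaves few documents to measure it on. The sketch below is simple arithmetic with hypothetical sizes and richness rates. It is not a Relativity calculation.

```python
def expected_positives(control_set_size, richness):
    """Expected number of positive (e.g., Responsive) documents in a random sample.

    richness is the assumed fraction of positive documents in the project
    universe, expressed as a value between 0 and 1.
    """
    return control_set_size * richness


# Hypothetical control set sizes and richness rates, for comparison only.
for size in (500, 2000):
    for richness in (0.30, 0.02):
        count = expected_positives(size, richness)
        print(f"size={size} richness={richness:.0%} expected positives={count:.0f}")
```

With these made-up numbers, a 500-document control set at 2% richness is expected to contain only about ten positive documents, which is a thin basis for a recall estimate.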
To execute a control set round:
- Click Start Round on the console.
- Select Control set as the Round Type.
Note that if you create an additional control set later in the project:
- The new set replaces the previous one.
- Any documents from the inactive control set that are coded are eligible to be used in a Pre-coded seed round.
- Documents from the inactive control set that are not coded are eligible for a Training or QC round or they may be included in your new Control set round.
- Save a copy of the previously generated Control Set Statistics report before you start the new control set round if you wish to compare the two.
- For the Saved search for sampling, it is recommended that you select the saved search used in the Documents to be categorized field on the project. This ensures that you get a random sample of documents from your entire project. It's recommended that uncategorizable documents and unclustered documents not appear in this search.
- Specify your desired Sampling Methodology settings.
Note: The fields in the Sampling Methodology section are defaulted to the values on the project settings.
- Statistical sampling - creates a sample set based on statistical sample calculations, which determine how many documents your reviewers need to code in order to get results that reflect the project universe as precisely as needed. Selecting this option makes the Margin of error field required.
- Confidence level - the probability that the rate in the sample is a good measure of the rate in the project universe. This is used in the round to calculate the overturn range as well as the sample size, if you use statistical sampling.
- Margin of error - the predicted difference between the observed rate in the sample and the true rate in the project universe. This is used in the round to calculate the overturn range as well as the sample size, if you use statistical sampling.
- Percentage - creates a sample set based on a specific percentage of documents from the project universe. Selecting this option makes the Sampling percentage field required.
- Sampling percentage - the percentage of the eligible sample population used to create the sample size.
- Fixed sample size - creates a sample set based on a specific number of documents from the project universe. Selecting this option makes the second Fixed sample size field required.
- Fixed sample size - the number of documents you want to include in your sample size.
Clicking Calculate sample displays the number of documents in the saved search selected for the round and the number of documents in the sample. If the values for the sample and/or saved search are unexpected, you can change any setting in the Start Round layout and re-calculate before clicking Go. You can't calculate the sample if you don't have access to the saved search selected for the round.
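To give a feel for how the three sampling methodologies translate into a sample size, here is a rough sketch. This page does not state the formula Relativity uses, so the statistical calculation below is an assumption: it uses the common normal-approximation formula with a finite population correction and a worst-case proportion of 0.5. All function names and the population figure are hypothetical.

```python
import math
from statistics import NormalDist


def statistical_sample_size(population, confidence_level, margin_of_error, proportion=0.5):
    """Sample size from a confidence level and margin of error (both as fractions).

    Assumed formula: normal approximation with finite population correction.
    proportion=0.5 is the conservative choice that maximizes the sample size.
    """
    # Two-sided z-score for the confidence level (about 1.96 for 0.95).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink the sample when the population is small relative to n0.
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def percentage_sample_size(population, sampling_percentage):
    """Sample size as a percentage (0-100) of the eligible population."""
    return math.ceil(population * sampling_percentage / 100)


def fixed_sample_size(population, fixed_size):
    """A fixed number of documents, capped at the eligible population."""
    return min(population, fixed_size)


population = 100_000  # hypothetical number of documents in the saved search
print("statistical:", statistical_sample_size(population, 0.95, 0.05))
print("percentage: ", percentage_sample_size(population, 10))
print("fixed:      ", fixed_sample_size(population, 1500))
```

Under these assumptions, tightening the margin of error or raising the confidence level increases the statistical sample size, while the population size has comparatively little effect once it is large.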
- Automatically create batches - determines whether or not a batch set and batches are automatically created for this round's sample set. By default, this field is set to whatever value was specified in the project settings. Once the sample set has been created, you can view and edit the corresponding batch set in the Batch Sets tab.
- Maximum batch size - the maximum number of documents that the automatically-created batches will contain. This is required if the Automatically create batches field above is set to Yes. This value must be greater than zero or an error appears when you attempt to save the project. The batch set and batches created from this project are editable after you create the project. By default, this field is set to whatever value was specified in the project settings.
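The effect of the Maximum batch size setting can be pictured with a short sketch that splits a sample set into consecutive batches. How Relativity actually distributes documents across batches is not described here, so the chunking order, the function name, and the document IDs are assumptions for illustration only.

```python
def split_into_batches(document_ids, maximum_batch_size):
    """Split a sample set into batches of at most maximum_batch_size documents."""
    if maximum_batch_size <= 0:
        # Mirrors the rule that the value must be greater than zero.
        raise ValueError("Maximum batch size must be greater than zero")
    return [
        document_ids[start:start + maximum_batch_size]
        for start in range(0, len(document_ids), maximum_batch_size)
    ]


# Hypothetical sample set of 383 documents with a maximum batch size of 50.
sample_set = [f"DOC-{n:04d}" for n in range(1, 384)]
batches = split_into_batches(sample_set, 50)
print(len(batches), "batches; last batch holds", len(batches[-1]), "documents")
```

In this sketch every batch is full except possibly the last one, which holds whatever remains.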
When reviewing documents for a control set round, you are only considering the responsiveness of each document, not setting documents as good examples. If you do not code documents included in the control set sample, those documents are eligible to be included in a subsequent control set round's sample set. See Sample-Based Learning document review for more information on the protocol for assigning out and reviewing documents during a round.
Note: If you're done using a project, it's better for workspace performance if you finish the round rather than leaving it in a status of either Review in Progress or Review complete.
Once all of the documents in the sample set have been coded, you should finish the round. You also have the option of finishing a round before all of the sample set documents have been coded.
Note: When you finish a control set round, if any documents are excerpted, marked as examples or un-coded, a warning appears on the Finish Round layout. If you continue to finish the round without manually removing the examples or excerpted text or coding the remaining documents, these documents are removed from the control set for reporting purposes.
If you need to find documents that were removed from a control set round, you can filter on or search for documents where the RAR Sample Set is the control set round name and Use as Example is set to True or the Designation excerpt field contains text. You can re-include the documents in the control set at a later time by switching the Use as Example field to No or removing the text excerpt.
Also, any documents in an active control set which are not coded are NOT eligible for sampling in subsequent rounds (except for a new control set round).
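The search for documents removed from a control set round, described above, amounts to a simple condition: the document belongs to the round's sample set, and it is either marked as an example or has excerpted text. The sketch below expresses that logic over plain dictionaries. Representing documents as dictionaries keyed by field name is an assumption made for illustration, and the round name and document data are hypothetical. In practice you would build this as a filter or saved search in Relativity.

```python
def removed_from_control_set(documents, round_name):
    """Return documents in the given round that are examples or carry an excerpt."""
    removed = []
    for doc in documents:
        in_round = doc.get("RAR Sample Set") == round_name
        is_example = doc.get("Use as Example") is True
        # Treat a missing or whitespace-only excerpt as "no text".
        has_excerpt = bool((doc.get("Designation excerpt") or "").strip())
        if in_round and (is_example or has_excerpt):
            removed.append(doc)
    return removed


# Hypothetical documents and round name.
documents = [
    {"id": 1, "RAR Sample Set": "Control Round 1", "Use as Example": True, "Designation excerpt": ""},
    {"id": 2, "RAR Sample Set": "Control Round 1", "Use as Example": False, "Designation excerpt": "key clause"},
    {"id": 3, "RAR Sample Set": "Control Round 1", "Use as Example": False, "Designation excerpt": ""},
    {"id": 4, "RAR Sample Set": "Training Round 2", "Use as Example": True, "Designation excerpt": ""},
]

removed = removed_from_control_set(documents, "Control Round 1")
print([doc["id"] for doc in removed])
```

Here documents 1 and 2 match: both are in the control set round, one is marked as an example and the other has excerpted text. Document 4 is an example but belongs to a different round, so it is not returned.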
To finish a control set round:
- Click Finish Round on the console.
The Finish Round pop-up displays.
- Click Go on the Finish Round pop-up to finish the round.
Note: If the control set is the first round of the project, reports aren't available when you finish that round; reports are only available after you finish your first non-control set round.
If you run a control set round, it should ideally be the first round of the project.
The following reports should be reviewed after a control set round:
- Control Set Statistics report - provides the richness rate (the percentage of control set documents coded responsive) and the number of documents coded responsive and not responsive. See Control Set Statistics report.