Document rank distribution
The Document Rank Distribution is one of the monitoring charts on the Active Learning project homepage. The chart ranks each document in the model, including manually-selected documents, based on how it relates to the overall project. A relevance rank near zero indicates the model believes the document is more likely to be coded on the negative review field choice. A rank closer to 100 means the model believes the document is more likely to be coded on the positive review field choice. In the early stages of an Active Learning project, most documents will have a relevance rank between 40 and 60 until the model begins training.
The review state of the documents is also overlaid on this distribution. Note that a document coded on the positive choice can still have a low relevance rank, because the rank is only the model's prediction.
The dashboard reports documents reviewed from the Prioritized Review queue, as well as documents coded outside of the queue. Admins will see the following colors on the chart:
- Blue (Coded Positive Choice) - a document was coded on the positive choice review field.
- Yellow (Coded Negative Choice) - a document was coded on the negative choice review field.
- Purple (Not Reviewed) - the documents are within the project's scope but have not yet been coded; their position on the chart is based on Relativity's predictions.
- Green (Skipped) - a document was skipped.
- Red (Suppressed Duplicate) - the documents are suppressed because their learning is taken care of by other textually similar documents.
Note: When a full population is performed, all previously identified suppressed documents are marked as "Not Reviewed" in the Document Rank Distribution chart.
Document rank distribution chart
You can interact with the Document Rank Distribution chart to hide categories of documents, which makes it easier to view the categories that remain. For example, to hide the Not Reviewed documents, click the purple box to the right of Not Reviewed. The bar chart then rescales for the remaining documents.
Monitoring document rank distribution
Use the rank distribution chart to understand the following:
- The number of predicted, relevant documents that remain for review.
- The agreement between reviewers and the Active Learning model.
- The number of documents the model does not understand well.
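The three quantities above can be approximated from a list of ranks and coding decisions. The following is a minimal sketch, not Relativity's implementation: the `Doc` class, the `summarize` function, the cut-off of 50, and the use of the 40 to 60 band as "not well understood" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Doc:
    rank: float            # relevance rank from 0 to 100
    coded: Optional[str]   # "positive", "negative", or None if not reviewed

def summarize(docs, cutoff=50.0, band=(40.0, 60.0)):
    """Approximate the three monitoring quantities (hypothetical helper).

    cutoff and band are illustrative assumptions, not product defaults.
    """
    not_reviewed = [d for d in docs if d.coded is None]
    reviewed = [d for d in docs if d.coded in ("positive", "negative")]

    # Unreviewed documents the model predicts to be relevant.
    remaining = sum(1 for d in not_reviewed if d.rank >= cutoff)

    # Share of coded documents where the prediction matches the coding.
    agree = sum(
        1 for d in reviewed if (d.rank >= cutoff) == (d.coded == "positive")
    )
    agreement = agree / len(reviewed) if reviewed else None

    # Unreviewed documents whose rank sits in the uncertain middle band.
    uncertain = sum(1 for d in not_reviewed if band[0] <= d.rank <= band[1])

    return {
        "remaining_predicted_relevant": remaining,
        "agreement": agreement,
        "uncertain": uncertain,
    }
```

In practice, the ranks and coding values would come from an export of the project's documents; the chart itself remains the authoritative view.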
As the model learns throughout the project life cycle, the Rank Distribution is expected to gravitate toward 0 or 100 depending on how documents are coded on the positive choice or negative choice. If a coding decision is updated on a Prioritized Review document, the document does not change to a manually-selected document. Each time an admin accesses this page, whether through a page refresh or from a different page, the latest data is reflected in the Project Home display.
Prioritized review progress
The Prioritized Review Progress chart shows how effectively the Prioritized Review queue locates relevant documents by measuring the relevance rate. More specifically, the relevance rate is the percentage of documents predicted to be relevant that reviewers' coding decisions then confirmed as relevant.
Relevance rate is calculated every 200 documents for frequent feedback. Once 200 documents are coded in prioritized review, relevance rate data appears on the chart.
- This chart only updates when documents are coded in the Prioritized Review queue.
- Documents included in the Active Learning queue for index health are excluded from the relevance rate calculation.
- If only manually-selected documents are coded, the Prioritized Review Progress chart won't display relevance rate data.
Note: This measurement is not cumulative with regard to the entire document set.
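The per-batch, non-cumulative nature of this calculation can be sketched as follows. This is a hedged illustration only: the function name `relevance_rates`, the input shape, and the choice to count the 200-document interval over eligible documents (after index health documents are excluded) are assumptions, not documented product behavior.

```python
def relevance_rates(codings, batch_size=200):
    """Return one relevance rate (in percent) per complete batch.

    codings: iterable of (coded_positive, index_health) boolean pairs,
    in the order documents were coded in the Prioritized Review queue.
    Assumption: index health documents are dropped before batching.
    """
    # Keep only the coding decisions that count toward the rate.
    eligible = [positive for positive, index_health in codings if not index_health]

    rates = []
    # Each batch is scored on its own; earlier batches are not carried over.
    for start in range(0, len(eligible) - batch_size + 1, batch_size):
        batch = eligible[start:start + batch_size]
        rates.append(100.0 * sum(batch) / batch_size)
    return rates
```

With fewer than 200 eligible coding decisions, the function returns an empty list, which mirrors the chart showing no relevance rate data until the first 200 documents are coded.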
Monitoring prioritized review progress
At the beginning of the project, the relevance rate may be low while the model learns what responsive means. As reviewers code documents and the model learns, the rate improves because the model becomes better at locating relevant documents. You may see a spike in the relevance rate if a large number of new documents are added to the project, or if the definition of relevance changes during the course of the review. Eventually, the relevance rate will plateau and then decline. A declining relevance rate indicates that the project is near completion, since the model is serving up fewer relevant documents to reviewers. At that point, you can move to the Elusion Test to validate completion. For more information, see Elusion Test.
Note: Documents used for Index health are not included in the Relevance Rate calculation.
If you want to run a quality control check on reviewers' coding decisions, complete the following:
- Update ranks via the Update Ranks button on the Project Home page.
- Create a saved search where the following conditions are met:
  - Documents are coded on the positive choice.
  - CSR - <Project Name> Cat. Set::Category Rank is less than the chosen cut-off rank.
- Sample documents from the saved search.
- Inspect the sampled documents for incorrect coding decisions.
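The filter and sampling logic behind this check can be sketched outside of Relativity, for example against an exported list of documents. The `qc_sample` function and the `coded` and `rank` dictionary keys are hypothetical names chosen for this illustration; inside the product, the saved search and sampling features perform these steps.

```python
import random

def qc_sample(docs, cutoff_rank, sample_size, seed=0):
    """Pick a random sample of positively coded, low-ranked documents.

    docs: list of dicts with hypothetical keys "coded" and "rank".
    These are the documents where reviewers and the model disagree most.
    """
    # Same conditions as the saved search: coded positive, rank below cut-off.
    candidates = [
        d for d in docs
        if d["coded"] == "positive" and d["rank"] < cutoff_rank
    ]
    # A fixed seed keeps the sample reproducible for a second reviewer.
    rng = random.Random(seed)
    k = min(sample_size, len(candidates))
    return rng.sample(candidates, k)
```

A reviewer would then inspect each returned document and decide whether the positive coding decision or the model's low rank is the more likely error.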
You can also take a random sample of documents to determine the richness rate.