This page contains the following information:
Document rank distribution
The Document Rank Distribution is one of the monitoring charts on the Active Learning project homepage. It ranks every document in the model, including manually selected documents, based on how the document relates to the overall project. A relevance rank near zero indicates the model believes the document is more likely to be coded on the negative review field choice; a rank closer to 100 means the model believes the document is more likely to be coded on the positive review field choice. In the early stages of an Active Learning project, most documents have a relevance rank between 40 and 60 until the model begins training.
The review state of the documents is also overlaid on this distribution. Note that a document coded on the positive choice can still have a low relevance rank; the rank is simply the model's prediction.
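To make the binning concrete, here is a minimal sketch of how a rank distribution like this chart's could be computed. The `(rank, review_state)` tuples and the `rank_distribution` helper are hypothetical illustrations, not Relativity's implementation:

```python
from collections import Counter

def rank_distribution(documents, bin_size=10):
    """Bucket document relevance ranks (0-100) into histogram bins.

    `documents` is a list of (rank, review_state) tuples -- a hypothetical
    representation of the chart's inputs. Each bin is keyed by its starting
    rank and the document's review state, mirroring the chart's overlay.
    """
    histogram = Counter()
    for rank, state in documents:
        # A rank of exactly 100 falls into the top bin rather than a new one.
        bin_start = min(int(rank // bin_size) * bin_size, 100 - bin_size)
        histogram[(bin_start, state)] += 1
    return histogram

# Early in a project, most ranks cluster between 40 and 60:
docs = [(52.3, "Not Reviewed"), (47.8, "Not Reviewed"), (95.1, "Coded Positive Choice")]
dist = rank_distribution(docs)
```

Clicking a legend entry in the real chart is equivalent to dropping one review state from this histogram and rescaling the bars.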
The dashboard reports documents reviewed from the Prioritized Review queue, as well as documents coded outside of the queue. Admins will see the following colors on the chart:
- Blue (Coded Positive Choice) - a document was coded on the positive choice review field.
- Yellow (Coded Negative Choice) - a document was coded on the negative choice review field.
- Purple (Not Reviewed) - the documents are within the project's scope but have not yet been coded; their position on the chart reflects Relativity's predictions.
- Green (Skipped) - a document was skipped.
- Red (Suppressed Duplicate) - the documents are suppressed because the model learns from other, textually similar documents instead.
Note: When a full population is performed, all previously identified suppressed documents are marked as "Not Reviewed" in the Document Rank Distribution chart.
Document rank distribution chart
You can interact with the Document Rank Distribution chart to hide categories of documents and focus on the categories that remain. For example, to hide the Not Reviewed documents, click Not Reviewed. The bar chart then rescales to the remaining documents.
Monitoring document rank distribution
Use the rank distribution chart to understand the following:
- The number of predicted relevant documents that remain for review.
- The agreement between reviewers and the Active Learning model.
- The number of documents the model does not understand well.
As the model learns throughout the project life cycle, the rank distribution is expected to gravitate toward 0 or 100, depending on whether documents are coded on the positive or negative choice. If a coding decision on a Prioritized Review document is updated, the document does not change to a manually selected document. Each time an admin accesses this page, whether through a page refresh or by navigating from a different page, the Project Home display reflects the latest data.
Prioritized review progress
The Prioritized Review Progress chart measures how effectively the Prioritized Review queue locates relevant documents by tracking the relevance rate. The relevance rate is the percentage of documents predicted to be relevant that reviewers then confirmed as relevant through their coding decisions.
Relevance rate is calculated every 200 documents to provide frequent feedback. Once 200 documents have been coded in prioritized review, relevance rate data appears on the chart.
- This chart only updates when documents are coded in the Prioritized Review queue.
- Documents included in the Active Learning queue for index health are excluded from the relevance rate calculation.
- If only manually-selected documents are coded, the Prioritized Review Progress chart won't display relevance rate data.
Note: This measurement is not cumulative with regard to the entire document set.
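The per-batch calculation described above can be sketched as follows. The list of boolean coding decisions and the `relevance_rates` helper are hypothetical; they illustrate the every-200-documents, non-cumulative cadence rather than Relativity's actual implementation:

```python
def relevance_rates(coding_decisions, batch_size=200):
    """Compute the relevance rate for each batch of prioritized-review decisions.

    `coding_decisions` is a hypothetical list of booleans, True when the
    reviewer coded the served document on the positive choice. Only complete
    batches produce a data point, and each batch is scored independently
    (the measurement is not cumulative across the document set).
    """
    rates = []
    for start in range(0, len(coding_decisions) - batch_size + 1, batch_size):
        batch = coding_decisions[start:start + batch_size]
        rates.append(sum(batch) / batch_size)  # fraction coded positive
    return rates

# 400 decisions: the first batch is 60% positive, the second 75% positive.
decisions = [True] * 120 + [False] * 80 + [True] * 150 + [False] * 50
rates = relevance_rates(decisions)
```

A rising sequence of batch rates indicates the model is getting better at serving relevant documents; a sustained decline suggests the queue is running out of relevant documents to serve.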
Monitoring prioritized review progress
At the beginning of the project, the relevance rate may be low while the model learns what responsive means. As reviewers code documents and the model learns, the rate improves because the model becomes better at locating relevant documents. You may see a spike in the relevance rate if a large number of new documents are added to the project, or if the definition of relevance changes during the course of the review. Eventually, the relevance rate plateaus and then declines. A declining relevance rate indicates that the project is near completion, since the model is serving fewer relevant documents to reviewers, and that you can move to the Elusion Test to validate completion. For more information, see Elusion Test.
Note: Documents used for Index health are not included in the Relevance Rate calculation.
Running a search on a classification index
You can run a search against a classification index to quickly return documents of a certain rank or within a range of ranks instead of having to run Update Ranks.
- If you run a search for the first time while the classification index is populating and building, the search attempts to complete for up to five minutes. If the index hasn't finished building within that time, the search fails. You can re-run the search after the index finishes building.
- If you've previously run the search, these results are cached. If you try re-running the search while the classification index is building, you'll see these old, cached results. Once the index build completes, the results are refreshed with the latest index build results.
To run the search:
- Navigate to the Documents tab.
- From the search bar, select the classification index associated with your Active Learning project.
Note: The index you select must be associated with an Active Learning project that has been built (at least five documents coded with the positive designation and five coded with the negative designation).
- Using the next drop-down, select whether to search for Greater than or equal to, Less than or equal to, Between, or Is the rank value you enter.
- Click Search.
The Rank column displays rank results from the most recent index build that are relevant to your search. This differs from the CSR- <Project Name> Cat. Set::Category Rank field generated by Update Ranks, which retains old results until you manually re-run Update Ranks.
The rank scores are rounded to two decimal places. Note that these results are temporary, and you can't run the mass operations Sum, Tally, and Average on them.
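The four rank operators behave like simple numeric comparisons. The sketch below is a hypothetical illustration of that behavior (the `search_by_rank` helper and the sample ranks are not part of Relativity):

```python
# Hypothetical mapping of the search bar's rank operators to comparisons.
RANK_OPERATORS = {
    "Greater than or equal to": lambda rank, value: rank >= value,
    "Less than or equal to": lambda rank, value: rank <= value,
    "Is": lambda rank, value: round(rank, 2) == value,  # ranks round to 2 places
}

def search_by_rank(ranks, op_name, value, upper=None):
    """Filter a list of rank scores the way the rank operators behave.

    'Between' takes a lower and an upper bound; the other operators
    take a single rank value.
    """
    if op_name == "Between":
        return [r for r in ranks if value <= r <= upper]
    compare = RANK_OPERATORS[op_name]
    return [r for r in ranks if compare(r, value)]

ranks = [12.34, 55.0, 87.65, 92.1]
```

For example, `search_by_rank(ranks, "Between", 50, upper=90)` keeps only the middle two scores.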
To run a quality control check on reviewers' coding decisions, run a search for documents Greater than or equal to your rank cutoff. Then, filter on the Designation field to return documents within this rank range that were coded on the negative designation.
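The QC check described above can be expressed as a simple filter. The dictionaries, field names, and `qc_negative_coded_high_rank` helper are hypothetical stand-ins for the rank search plus Designation filter you would run in the document list:

```python
def qc_negative_coded_high_rank(documents, cutoff):
    """Return documents ranked at or above `cutoff` but coded negative.

    `documents` is a hypothetical list of dicts with 'rank' and 'designation'
    keys; in Relativity you would run the rank search and then filter on the
    Designation field instead.
    """
    return [
        doc for doc in documents
        if doc["rank"] >= cutoff and doc["designation"] == "Negative"
    ]

docs = [
    {"id": 1, "rank": 91.2, "designation": "Negative"},   # flagged for QC
    {"id": 2, "rank": 88.4, "designation": "Positive"},   # agrees with model
    {"id": 3, "rank": 35.0, "designation": "Negative"},   # below the cutoff
]
flagged = qc_negative_coded_high_rank(docs, cutoff=80)
```

Documents returned by this filter are those where the reviewer disagreed with a confident model prediction, which makes them good candidates for a second look.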