Training round

A training round teaches Sample-Based Learning how to interpret uncategorized documents so that it can categorize the remaining documents with a high level of confidence and accuracy.

This page contains the following sections:

  • Executing a training round
  • Training round document review
  • Finishing a training round
  • Reviewing Sample-Based Learning reports after a training round

Executing a training round

To execute a training round:

  1. Click Start Round on the console.
  2. Select Training as the Round Type. When you select this round type, the Sampling type field defaults to Stratified sampling.
    Training round display
  3. Enter a Round description.
  4. For the Saved search for sampling, select the automatically created <Project Saved Search> search. At this point it is the only search available, and it contains only uncategorized documents because you haven't yet started the project. See Viewing categorized and uncategorized documents for your Sample-Based Learning project for more information. After the initial training round has been completed and categorization has run, subsequent training round saved searches should typically return only uncategorized documents.
  5. Specify your desired Sampling Methodology settings. The sample set is the randomly selected group of documents used for manual review as a means of training the system. Stratified sampling is selected by default.
    Note: The fields in the Sampling Methodology section default to the values specified in the project settings; however, selecting Training as the round type overrides those default values.

    • Stratified sampling - groups the round saved search documents into subgroups based on the documents' concepts, then returns documents until the vast majority of the conceptual space is covered or until the Maximum sample size or Minimum seed influence limit is met. This type of sampling allows RAR to train effectively with as few documents as possible. Selecting this type makes the Maximum sample size and Minimum seed influence fields available and disables the Calculate sample button. The Stratified sampling option is only available when you select the Training round type.
      • Notes:
      • You can increase your categorization coverage of your conceptual space by running multiple stratified rounds.
      • If all documents in the sample were used as examples, they would categorize 80% of the documents.
      • You can limit the sample by Maximum sample size or Minimum seed influence; the first sketch after these steps illustrates how the two limits interact.
      • If you decrease the coherence value on the Designation Categorization Set associated with your RAR project, RAR will return fewer documents for a stratified sample and categorize more documents with each example.
      • You should still follow best practices for excerpting and selecting documents to use as examples when reviewing documents from a stratified sample.
      • Maximum sample size - the maximum number of documents you want returned in a stratified sample. For example, if you set the value at 500, the round will contain the 500 strongest documents (according to their seed influence). The stratified sample is created once this or the Minimum seed influence value is met. Leaving this field blank means that the round will include all documents returned by the stratified sample, unless you've entered a Minimum seed influence.
      • Minimum seed influence - the minimum number of documents that each example document returned in a stratified sample must categorize. For example, if you leave this at its default value of 25, every document returned in the sample will categorize at least 25 other documents if it is designated as an example. This field is only available for training rounds with a stratified sampling type. The stratified sample is created once this or the Maximum sample size value is met.
    • Statistical sampling - creates a sample set based on statistical sample calculations, which determine how many documents your reviewers need to code in order to get results that reflect the project universe as precisely as needed. Selecting this option makes the Margin of error field required. The second sketch after these steps shows the standard form of this calculation.
      • Confidence level - the probability that the rate in the sample is a good measure of the rate in the project universe. This is used in the round to calculate the overturn range as well as the sample size, if you use statistical sampling.
      • Margin of error - the predicted difference between the observed rate in the sample and the true rate in the project universe. This is used in the round to calculate the overturn range as well as the sample size, if you use statistical sampling.
    • Percentage - creates a sample set based on a specific percentage of documents from the project universe. Selecting this option makes the Sampling percentage field required.
      • Sampling percentage - the percentage of the eligible sample population used to create the sample size.
    • Fixed sample size - creates a sample set based on a specific number of documents from the project universe. Selecting this option makes the second Fixed sample size field required.
      • Fixed sample size - the number of documents you want to include in your sample size.
  6. Click Calculate sample to display a calculation of the sample based on the documents eligible for sampling and the values specified in the Sampling Methodology section.
    Clicking Calculate sample displays the number of documents in the saved search selected for the round and the number of documents in the sample. If the values for the sample and/or saved search are unexpected, you can change any setting in the Start Round layout and re-calculate before clicking Go. You can't calculate the sample if you don't have access to the saved search selected for the round. This button is disabled if you've selected Stratified sampling as the sampling type.
    Sample calculator display

  7. Specify how to batch documents out for review.
    • Automatically create batches - determines whether or not a batch set and batches are automatically created for this round's sample set. By default, this field is set to whatever value was specified in the project settings. Once the sample set has been created, you can view and edit the corresponding batch set in the Batch Sets tab.
    • Maximum batch size - the maximum number of documents that the automatically-created batches will contain. This is required if the Automatically create batches field above is set to Yes. This value must be greater than zero or an error appears when you attempt to save the project. The batch set and batches created from this project are editable after you create the project. By default, this field is set to whatever value was specified in the project settings.
  8. Click Go.
  9. Proceed to Training round document review.
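
The two stratified sampling limits act as stop conditions: document selection stops as soon as either one is reached. The sketch below illustrates that stopping logic. It is a minimal, hypothetical illustration that assumes candidates are pre-ranked by seed influence; the function name and data shapes are illustrative, not Relativity's actual implementation.

```python
# Hypothetical sketch of the stratified sampling stop conditions -- not
# Relativity's actual implementation. Candidates are assumed to be
# pre-ranked by seed influence (the number of documents each would
# categorize if designated as an example).

def build_stratified_sample(candidates, max_sample_size=None, min_seed_influence=25):
    """Select documents until either limit is reached.

    candidates: (doc_id, seed_influence) pairs, strongest first.
    max_sample_size: cap on the number of sampled documents; None means no cap.
    min_seed_influence: stop once a candidate would categorize fewer
        documents than this threshold.
    """
    sample = []
    for doc_id, influence in candidates:
        if max_sample_size is not None and len(sample) >= max_sample_size:
            break  # Maximum sample size reached first
        if influence < min_seed_influence:
            break  # Remaining candidates fall below the influence floor
        sample.append(doc_id)
    return sample

# Example: with the default influence floor of 25, DOC-3 is excluded.
docs = [("DOC-1", 310), ("DOC-2", 120), ("DOC-3", 24)]
print(build_stratified_sample(docs, max_sample_size=500))  # ['DOC-1', 'DOC-2']
```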
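
Relativity's exact statistical sampling formula isn't published here, but the standard normal-approximation sample size calculation with a finite population correction, assuming a worst-case proportion of p = 0.5, shows how the Confidence level and Margin of error fields drive the sample size:

```python
import math

# Hypothetical sketch of a standard statistical sample size calculation.
# This uses the common normal-approximation formula with a finite
# population correction; it is an assumption, not Relativity's documented math.

Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}  # confidence level -> z-score

def sample_size(population, confidence=0.95, margin_of_error=0.02, p=0.5):
    z = Z_SCORES[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite population correction
    return math.ceil(n)

# Example: 95% confidence, 2% margin of error over 100,000 documents.
print(sample_size(100_000))  # 2345
```

Because the required sample grows with the inverse square of the margin of error, halving the margin roughly quadruples the number of documents reviewers must code.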

Note: When the round is created, the field specified as the Use as an example field is set to Yes by default for documents included in the round. If you delete a round, Sample-Based Learning reverts the Use as an example field value to Not Set (null).

Training round document review

Sample-Based Learning is trained as documents are reviewed and assigned to categories. See Sample-Based Learning document review for more information on protocol for assigning documents out and reviewing documents during a round.

Note: If you're done using a project, it's better for workspace performance to finish the round rather than leave it in a status of either Review in Progress or Review Complete.

Finishing a training round

Once all of the documents in the sample set have been coded, you should finish the round. You also have the option of finishing a round before all of the sample set documents have been coded.

To finish a training round:

  1. Click Finish Round on the console.
    Finish round button
  2. Specify whether you want to categorize documents when you finish the round. You may have two options, depending on your project:

    • Categorize for designation - categorize all documents in the project based on their designation coding.
    • Categorize for issues - categorize all documents in the project based on their issue coding. This is only available if you have added a key issue field to the project and a reviewer has issue-coded at least one document in the sample set.

  3. Specify whether you want to save categorization results from the previous round when you finish the current round. You may have two options depending on your project:
    • Save designation results - save the results of designation coding from the previous categorization. This is useful because when categorization runs, the previous results are cleared in order to apply the new category values. You can't save designation results if you did not categorize designations in a previous round.
    • Save issue results - save the results of issue coding from the previous categorization. This is only available if you have added a key issue field to the project. You can only save issue results if you categorized issues in a previous round.

    Note: You shouldn't save results at the end of every round. Saving results, especially for larger cases, can add several hours to the time it takes to finish the round.

  4. Enter a name and description for your categorization results.
    • Categorization results set name - the name of the categorization results set. By default, this is the name of the previous round. This is only available for editing if you are saving designation and/or issue results.
    • Categorization results set description - a description of the categorization results. This is only available for editing if you are saving designation and/or issue results.

    Finish round layout

  5. Click Go. If you choose to both categorize and save results, the results are saved first, and then categorization runs.

Reviewing Sample-Based Learning reports after a training round

The following reports should be reviewed after finishing a training round:

  • Round Summary Report – useful after categorization because it shows the changes in categorization percentage from round to round. See Round Summary report.
  • Control Set Statistics – tracks the progress of precision, recall, and F1 (see the sketch after this list). See Control Set Statistics report.
  • Rank Distribution – shows the level of conceptual similarity between human-coded documents and the overall categorized documents. See Rank Distribution report.
  • Project Summary – tracks overall project health. You can see a snapshot of overturn and categorization results as well as control set statistics in one place. See Project Summary report.
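
The precision, recall, and F1 values in the Control Set Statistics report follow the standard information retrieval definitions. The following is a minimal sketch with hypothetical control set counts, not values from any real project:

```python
# Standard precision / recall / F1 definitions, with illustrative counts.
# tp: responsive documents correctly categorized as responsive
# fp: non-responsive documents categorized as responsive
# fn: responsive documents categorized as non-responsive

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)                          # correctness of positive calls
    recall = tp / (tp + fn)                             # coverage of true positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Example with hypothetical control set counts:
p, r, f1 = precision_recall_f1(tp=180, fp=20, fn=45)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")  # precision=0.90 recall=0.80 F1=0.85
```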

Note: If issues are also being categorized by Assisted Review, you can review the Issue Reports as well.