

Using categorization, you can create a set of example documents that Analytics uses as the basis for identifying and grouping other conceptually similar documents. Categorization is useful early in a review project when you understand key concepts of a case and can identify documents that are representative examples of these concepts. As you review documents in the Relativity viewer, you can designate examples and add them to various categories. You can then use these examples to apply categories to the rest of the documents in your workspace.
Unlike clustering, categorization can be used to place documents into multiple categories if a document is a conceptual match with more than one category. Many documents deal with more than one concept or subject, so forcing a document to be classified according to its predominant topic may obscure other important conceptual content within it. When running categorization, you can designate how many categories a single document can belong to (maximum of five). If a document is placed into multiple categories, it is assigned a unique rank for each.
When documents are categorized, Analytics maps the submitted examples to the concept space as if each were a document query and pulls in any documents that fall within the set threshold. When a category has multiple examples, its categorized documents are the combined hits from all of those queries. Each result is returned with a rank that represents how conceptually similar the document is to the category.
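The behavior described above can be illustrated with a small sketch. This is not Relativity's implementation or API: the vector representation, the cosine similarity measure, the `categorize` function, its default of one category per document, and the threshold value are all assumptions chosen for illustration. Only the ideas of a threshold, combined hits across examples, a per-category rank, and the five-category maximum come from the text.

```python
import math

MAX_CATEGORIES_PER_DOCUMENT = 5  # upper limit stated in the text above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def categorize(doc_vec, examples_by_category, threshold, max_categories=1):
    """Return up to max_categories (category, rank) pairs, best match first.

    Hypothetical model: each example acts as a query, and a document is a hit
    for a category if it is close enough to ANY of that category's examples.
    """
    if not 1 <= max_categories <= MAX_CATEGORIES_PER_DOCUMENT:
        raise ValueError("max_categories must be between 1 and 5")
    ranked = []
    for category, example_vecs in examples_by_category.items():
        # Combined hits: the best score over all examples in the category.
        best = max((cosine(doc_vec, v) for v in example_vecs), default=0.0)
        if best >= threshold:
            ranked.append((category, best))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:max_categories]
```

In this sketch, a document that clears the threshold for no category gets an empty result, and a document that clears it for several categories receives a separate rank for each one it is placed in.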
Categorization is most effective for classifying documents under the following conditions:
Using Analytics categorization sets
You're a system admin at a law firm, and one of your clients, a construction company, has just become involved in litigation over materials it purchased from a major supplier without being informed that they were potentially environmentally damaging.
The case started with over 10 million documents. Using keywords, you get the document set down to around 3 million files. You decide that you have a thorough enough understanding of the key concepts involved that you can provide Relativity Analytics with a set of example documents that it can use to identify and group other conceptually similar files.
To begin, you will create a categorization set so that you can get files into categories and assign them conceptual rank.
You call your categorization set "Hazardous Materials", because the goal of the set is to group files based on the four building materials most prevalent in the case. You've already created a saved search that includes all the documents you were left with after applying keywords to the original data set. You select this saved search for the Documents To Be Categorized field. You've also created an Analytics index specifically for this set, and you select this for the Analytics Index field.
Additionally, you need to specify the categories and example documents against which you'll run the set. While researching and applying keywords to the data set, you identified four commonly referenced substances that might be present in the building materials your client purchased. You want to make these into categories, under which Analytics will place all the files it deems relevant to each substance. You create a Categories field that has the following choices:
To create examples, you identify at least one document that mentions each substance, and you use the Categories field you created to mark each document for the appropriate substance. Finally, you select the Categories field as the Categories and Examples Source for the categorization set. You leave all the other fields at their default values and save the set.
Now you're ready to Categorize All Documents through the console. When the categorization finishes, you can view your results in the field tree.
Each example document conceptually defines a category, so you need to know what your categories are before you can find the most appropriate example documents. Keep in mind that a category doesn't have to be focused around a single concept. For example, a category might deal with fraud, but different example documents for the category might reflect different aspects of fraud, such as fraudulent marketing claims, fraudulent accounting, and fraudulent corporate communications.
Example documents define the concepts that characterize a category, so properly defining example documents is one of the most important steps in categorization. In general, example documents should be:
Note: You must have an Analytics conceptual index set up before you can create a categorization set.
To create a categorization set:
The following fields are included on the Analytics Categorization Set Layout.
Note: When Auto-Synchronize on Categorize All is set to Yes, all existing categories and examples are cleared and the ones specified for the Categories and Examples Source field are automatically created when you click Categorize All on the console.
The following information is displayed in the Job Information section of the Analytics Categorization Set Layout:
If you don't populate the Categories and Examples Source field on the set, and you haven't linked any categories or example objects to the set, no buttons on the console are enabled. Console buttons only become enabled after you add at least one category and one example object to the set. See Adding new categories and examples through the layout.
If you choose not to make a selection for the Categories and Examples Source field on the categorization set, you can manually add new categories and assign example documents to a set using the Analytics Categorization Set layout. There are no limits to the number of categories you can add to a categorization set.
To add a new category from the layout, perform the following steps:
You can add an entire document or a chunk of text as an example. To add a new example from the layout, perform the following steps:
Note: If both the Document and Text fields in the example are populated, Text will override Document. Therefore, if you intend to select a document from the ellipsis to use in your category, do not supplement it with information in the Text field because only the text is considered.
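The override rule in the note above can be expressed as a short sketch. The function name and parameters are hypothetical and do not correspond to any Relativity API. Treating an empty string as "not populated" is also an assumption.

```python
def effective_example_content(document_text, text_field):
    """Return the content that would define the example.

    Mirrors the rule in the note: when both values are populated,
    the Text value wins and the document is ignored.
    """
    if text_field:  # assumption: None or "" counts as not populated
        return text_field
    return document_text
```

For instance, `effective_example_content("full document body", "short excerpt")` returns `"short excerpt"`, so nothing from the document body contributes to the category.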
Note: We recommend a minimum of 5 to 20 examples per category to provide good coverage of the topic. In a workspace of several million documents, it's not unusual to need a couple thousand examples.
Furthermore, we strongly recommend that you limit each category to 15,000 examples. The system doesn't limit how many examples you can have, but the more examples you have, the longer categorization takes to run.
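If you track examples outside Relativity, for instance in an exported list, a quick check against these recommendations might look like the sketch below. The `review_example_counts` helper and the `(document_id, category)` pair format are hypothetical. The bounds of 5 and 15,000 come from the recommendations above.

```python
from collections import Counter

MIN_RECOMMENDED = 5        # low end of the recommended 5 to 20 examples
MAX_RECOMMENDED = 15_000   # strongly recommended upper bound per category


def review_example_counts(examples, categories):
    """Label each category by its example count.

    examples: iterable of (document_id, category) pairs (hypothetical format).
    categories: every category in the set, so empty ones are also reported.
    """
    counts = Counter(category for _, category in examples)
    report = {}
    for category in categories:
        n = counts.get(category, 0)
        if n < MIN_RECOMMENDED:
            report[category] = "too few"
        elif n > MAX_RECOMMENDED:
            report[category] = "too many"
        else:
            report[category] = "ok"
    return report
```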
If you haven't manually created any categories or examples, but you have populated the Categories and Examples Source field on the categorization set, the Create Categories and Examples button is enabled on the console. You can use this button to automatically add new categories and examples to your categorization set.
Note: When you click Create Categories and Examples, Relativity clears all existing categories and examples and generates new ones. A category is created for each choice in the Categories and Examples Source field. If an Example Indicator Field is selected on the categorization set, examples are created for every document with a designation of Yes for the Example Indicator Field. The category is assigned to the example document based on the value of the Categories and Examples Source field. If an Example Indicator Field is not selected on the categorization set, examples are created for every document with a value in the Categories and Examples Source field. The category is assigned to the example document based on the choice selected in the Categories and Examples Source field.
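The rules in the note above can be summarized in a short sketch. It models documents as plain dictionaries and is not Relativity's implementation. The function name, the field names in the usage example, and the category names are hypothetical. The sketch also assumes a single-choice source field and assumes that a document marked Yes with no source value produces no example, which the text does not specify.

```python
def create_categories_and_examples(documents, choices, source_field,
                                   indicator_field=None):
    """Rebuild categories and examples from scratch.

    Returns (categories, examples), where examples is a list of
    (document_id, category) pairs. Starting from empty lists models
    the clearing of all existing categories and examples.
    """
    categories = list(choices)  # one category per choice in the source field
    examples = []
    for doc in documents:
        category = doc.get(source_field)
        if category is None:
            continue  # assumption: no source value means no example
        if indicator_field is not None and doc.get(indicator_field) is not True:
            continue  # indicator selected: only documents marked Yes qualify
        examples.append((doc["id"], category))
    return categories, examples
```

With an indicator field selected, a coded document that is not marked Yes contributes no example. Without one, every document that has a value in the source field becomes an example for that choice.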
During creation, the Create Categories and Examples button changes to Stop Creation, which you can click to stop the process.
Once category and example creation is complete, the Analytics Category and Analytics Example associative object lists reflect the results.
When you have assigned categories and examples to your categorization set, the Categorize All Documents button becomes enabled on the Categorization Set console.
Clicking this button kicks off a categorization job based on the settings specified when you created the set. When you run a new categorization job, all results of the previous categorization job are deleted.
Note: If the Auto-Synchronize on Categorize All field under Categorization Setup is set to Yes, all existing categories and examples are cleared and the ones specified for the Categories and Examples Source field are automatically created when you click Categorize All on the console.
To begin categorizing, click Categorize All Documents. When the confirmation message appears, asking you if you want to run categorization, click OK.
Note: We recommend running only two categorization sets at once for optimal performance.
Once the categorization has been kicked off, the following options are enabled in the Categorization Set console:
After the initial categorization process is complete, or after you have clicked Stop Categorization, the following button is enabled:
When you run a categorization set, the system creates the Categories - <name of categorization set> and Category Rank fields. Use Categories - <name of categorization set> to view the search results. Use Category Rank to see how closely related documents are to the category.
Note: The Pivot On and Group By settings are Yes by default for all Categories - <name of categorization set> and Category Rank fields. For Categories - <name of categorization set>, you can change Pivot On and Group By to No; however, you can't change them to No for the Category Rank fields. When you run a categorization set, Pivot On and Group By change back to Yes on all previously created Category Rank fields.
After a categorization job is completed, you can view the results in the field tree. All category set names are appended with the word "Categories" in the format Categories - <name of categorization set>. Click + to display a list of categories in the set.
Note: Documents that appear in the [Not Set] tag in the field tree were either not close enough to an example to get categorized, not in the data source of the conceptual index, or not submitted for categorization.
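When you troubleshoot documents under [Not Set], it can help to work through the three causes in the note one at a time. The sketch below shows one way to order those checks. The function, its parameters, and the idea of having a best similarity score on hand are hypothetical, and this is not a Relativity API.

```python
def not_set_reason(doc_id, submitted_ids, indexed_ids, best_score, threshold):
    """Return why a document would land in [Not Set], or None if it would not.

    submitted_ids: documents submitted for categorization.
    indexed_ids: documents in the data source of the conceptual index.
    best_score: best similarity to any example, or None if never scored.
    """
    if doc_id not in submitted_ids:
        return "not submitted for categorization"
    if doc_id not in indexed_ids:
        return "not in the data source of the conceptual index"
    if best_score is None or best_score < threshold:
        return "not close enough to an example"
    return None
```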
The fields created by your categorization set are available as conditions when you create a saved search. You can search on them and review the results.
To create a saved search to see your categorization results, perform the following steps: