

To run a structured analytics set within your workspace, you must first use the Structured Analytics Set console to create a new set and select which operations will be included. After the set has been completed and run, you’ll be able to view summary reports for each of the operation types you chose.
Before you begin working with structured analytics sets, make sure that your user group has the following permissions:
Object Security | Tab Visibility | Other Settings |
---|---|---|
|
|
|
For more information about setting permissions, see Workspace security.
To create a new structured analytics set:
The console appears, and you can now run your structured analytics set. See Structured Analytics Set console.
Note: When creating a new structured analytics set in a large workspace, the document table may become locked while the results fields are being created. We recommend creating new sets off-hours to prevent any disruption to review.
The Structured Analytics Set layout contains the following fields:
Note: You can change the structured analytics set operations after you’ve run a set. Once you successfully run an operation and want to run another, return to your set and deselect the operation you previously ran and select the new operation. Then, save and run your structured analytics set.
For best results, avoid nested saved search conditions and exclude relational fields. Use a field tag where possible, and do not apply a sort order. The fields returned in the search do not matter.
Note: You can access documents that are automatically removed from the set in the Field Tree. Each completed Structured Analytics set contains an ‘Included’ and ‘Excluded’ tag within the Field Tree. You can find documents excluded from the set under the ‘Excluded’ tag for that set. For more information, see Running structured data analytics.
This field supports linking one repeated content filter. The filter type must be Regular Expression.
The filter only applies to the field being analyzed.
We recommend not using this field when running operations for the first time.
Select field to analyze - the field being analyzed during the structured analytics operations. For most users, we recommend leaving this on the default value of Extracted Text. However, if you have a custom workflow that puts an extracted text equivalent into another field, choose that field here. The chosen field must be either a long text field or a fixed-length text field, and it must contain text in order for a document to be analyzed.
If you change the value for this setting on an existing structured analytics set, it does not affect the results until you re-run structured analytics.
Running the set with Update Only New Documents enabled will analyze the new field for newly added documents, but not old ones. Old analysis results from the previous field will remain.
Running the set with Update Only New Documents disabled will analyze the new field for all documents. Old analysis results from the previous field will be overwritten.
Enable additional domain filtering – populates additional fields with extracted email domains during name normalization. These fields have enhanced filtering options for sorting and searching. For a list of field names and more information, see Using enhanced domain filtering.
Select Yes to use fields with enhanced filtering options for email domains. Select No to use fields with simple text filtering for email domains.
See the following considerations for each operation.
To run the email threading operation, you must select values for the following fields:
Note: You can access documents that are automatically removed from the set in the Field Tree. Each completed Structured Analytics set contains an ‘Included’ and ‘Excluded’ tag within the Field Tree. You can find documents excluded from the set under the ‘Excluded’ tag for that set. For more information, see Running structured data analytics.
Note: Refer to the Analytics Email Threading - Handling Excluded Large Attachments knowledgebase article for more information on handling any excluded large attachments.
If you select Yes, you must ensure all email fields are properly mapped on the Analytics profile.
Select No if your document set doesn't include email metadata. When set to No, email threading relies on extracted text, and the Parent Document ID and Attachment Name fields.
Note: Selecting extra languages may impact performance. Only select if you know you have non-English headers to analyze.
Note: We recommend creating a new relational fixed length text field for every set to take advantage of grouping functionality for documents in a list view. The length of this field must be greater than or equal to 10.
Note: We recommend creating a new relational fixed length text field for every set to take advantage of grouping functionality for documents in a list view.
If your email threading results appear to have errors, applying a regular expression filter to remove text such as dates or URLs can improve results. See Creating a repeated content filter for steps to create a regular expression type filter.
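As a rough illustration of the kind of pattern such a filter might use, the sketch below strips URLs and slash-formatted dates from a text sample with Python's `re` module. The two patterns and the `strip_noise` helper are hypothetical, and the regular expression dialect accepted by the Analytics engine may differ from Python's, so treat this as a way to prototype a pattern, not as a filter definition.

```python
import re

# Hypothetical patterns for illustration; adjust them to your own data.
URL_PATTERN = re.compile(r"https?://\S+")
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")  # e.g. 03/14/2019


def strip_noise(text: str) -> str:
    """Remove URLs and slash-formatted dates, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = DATE_PATTERN.sub(" ", text)
    return " ".join(text.split())


sample = "Sent 03/14/2019 see https://example.com/report for details"
print(strip_noise(sample))
```

Testing a pattern against a few representative documents this way can show whether it removes too much text before you link the filter to a set.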
To run the name normalization operation, you must select values for the following fields:
Note: You can access documents that are automatically removed from the set in the Field Tree. Each completed Structured Analytics set contains an ‘Included’ and ‘Excluded’ tag within the Field Tree. You can find documents excluded from the set under the ‘Excluded’ tag for that set. For more information, see Running structured data analytics.
Note: Attachments are not included in name normalization. Aliases on an email that is an attachment are not parsed and added to the Alias table.
Select No if your document set doesn't include email metadata. When set to No, name normalization relies on extracted text, the Parent Document ID field, and the Attachment Name field.
Note: Selecting extra languages may impact performance. Only select if you know you have non-English headers to analyze.
Enable additional domain filtering – populates additional fields with extracted email domains during name normalization. These fields have enhanced filtering options for sorting and searching. For a list of field names and more information, see Using enhanced domain filtering.
Select Yes to use fields with enhanced filtering options for email domains. Select No to use fields with simple text filtering for email domains.
To run the textual near duplicate identification operation, you must select values for the following fields:
Note: We recommend creating a new relational Fixed Length Text field for every set to take advantage of grouping functionality for documents in a list view.
See the following table for examples:
Example | Ignored when Ignore numbers is set to Yes? |
---|---|
123 | Yes |
123Number | Yes |
number123 | No |
n123 | No |
.123 | Yes |
$12 | Yes |
#12 | Yes |
$%123 | No |
Note: Setting the value of this field to "No" causes the structured analytics set to take much longer to run. In addition, the "Numbers Only" Textual Near Duplicate Group is not created, because documents that contain only numbers are then included in the analysis.
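The eight examples in the table are consistent with a simple rule: a token is ignored when it starts with a digit, or with exactly one non-alphanumeric character followed by a digit. That rule is an inference from the table, not a documented behavior of the Analytics engine, and the `is_ignored_token` helper below is hypothetical. The sketch only reproduces the table.

```python
import re

# Assumed rule, inferred from the examples table only:
# an optional single non-alphanumeric character, then a digit.
_IGNORED = re.compile(r"[^A-Za-z0-9]?\d")


def is_ignored_token(token: str) -> bool:
    """Return True if the token fits the inferred 'ignored number' rule."""
    return _IGNORED.match(token) is not None


# The examples from the table, with the documented outcome for each.
table = {
    "123": True,
    "123Number": True,
    "number123": False,
    "n123": False,
    ".123": True,
    "$12": True,
    "#12": True,
    "$%123": False,
}

for token, expected in table.items():
    print(token, is_ignored_token(token), expected)
```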
The language identification operation does not use any additional settings.
The repeated content operation includes settings that allow you to adjust the granularity of analysis.
Select values for the following fields:
We recommend using the default values for these settings the first time you run the repeated content operation on a set of documents. If necessary, you can adjust these settings based on your initial results. For example, you may increase the minimum number of occurrences if you receive too many results or increase the maximum number of words if you’re only identifying partial repeated phrases.
If you’re still not satisfied with the results, advanced users may want to adjust these settings:
Note: Each setting has an upper bound value. You can't save a structured analytics set with a value that exceeds a setting's upper bound value. This prevents you from using settings that may crash the server.
Build or update a structured analytics set with the available run commands on the Structured Analytics Set console. After saving a new structured analytics set, the console automatically loads. To access the console for another structured analytics set, click the set name listed on the Structured Analytics Set tab.
The Structured Analytics Set console contains the following options:
The Run button starts the operations you have chosen for the structured analytics set.
When you click Run, a modal opens with the following options:
Select this option if any of the following is true: document text in your data set has changed and needs to be updated within the Analytics engine; regular expression filters need to be applied, removed, or updated; or fields on the Analytics profile have changed.
To run a full analysis on your set without having to resubmit all of your documents to the Analytics engine, disable both Update Only New Documents and Repopulate Text.
Click Run to start the build operation.
Note: If a previously run operation remains selected on subsequent runs, that operation is skipped if no new documents have been added to the saved search and no changes were made to that operation's settings. To force a re-run of an operation in a scenario like this, enable the Repopulate Text option.
After the build is running, you can click Cancel Operation to cancel the run. This stops the analysis process and puts the structured analytics set in a state that allows you to re-run the analysis.
After you have clicked Cancel Operation, you must wait for the cancellation process to complete before you can take any actions on the structured analytics set.
The Retry Errors button appears when the set has encountered one or more errors. Click it to retry the analysis of any documents that encountered errors.
For more information on errors, see Error Handling.
When you enable Update Only New Documents, it affects each structured analytics operation as follows:
If the newly added documents match with existing groups, the documents are incorporated into existing Email Thread Groups.
This analyzes newly added documents for new aliases. Aliases that exist in Relativity are never deleted, renamed, or adjusted in any way on subsequent runs.
If the newly added documents match with existing textual near duplicate groups, the new documents are incorporated into those groups. You may encounter the following scenarios:
Scenario 1: A newly added document matches a preexisting textual near duplicate group, and the newly added document is larger than or equal to all of the documents currently in the textual near duplicate group.
Result: The preexisting Principal will never change. The newly added document will not be added to a preexisting group. It will become a singleton or "orphan" document.
Scenario 2: A newly added document matches a preexisting document that was not in a textual near duplicate group. The preexisting document is larger than the new document.
Result: The preexisting document is marked Principal. It is updated to have a textual near duplicate group, along with the newly added document.
Scenario 3: A newly added document matches a preexisting document that was not in a textual near duplicate group. The new document is larger than the preexisting document.
Result: The newly added document is marked Principal. It is updated to have a textual near duplicate group. The preexisting document is not updated at all and is essentially orphaned.
Note: This is a current limitation in the software and is not an ideal situation. If this occurs, you will see a newly added document marked Principal in a group all by itself. You can check for this scenario by running a Mass Tally on the Textual Near Duplicate Group field. A group of one document should not normally exist – if it does, then this situation has occurred.
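The same check can be sketched outside Relativity on exported data. The example below assumes you have exported each document's Textual Near Duplicate Group value into a mapping of document identifier to group value. The `singleton_groups` helper and the sample identifiers are hypothetical, and the sketch is not a replacement for Mass Tally.

```python
from collections import Counter


def singleton_groups(group_by_doc: dict[str, str]) -> list[str]:
    """Return the group values that contain exactly one document."""
    # Documents with no group value (empty string) are skipped.
    counts = Counter(group for group in group_by_doc.values() if group)
    return sorted(group for group, n in counts.items() if n == 1)


# Hypothetical export: document identifier -> near duplicate group value.
exported = {
    "DOC-001": "GRP-A",
    "DOC-002": "GRP-A",
    "DOC-003": "GRP-B",  # alone in its group
    "DOC-004": "",       # not in any group
}

print(singleton_groups(exported))
```

Any group value this returns corresponds to a group of one document, which is the situation described in the note above.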
This incorporates newly added documents and re-analyzes all documents in the set to identify their languages.
This incorporates newly added documents and compares all documents in the same way as a full analysis, which could result in duplicate repeated content filters being created. This is because repeated content identification analyzes a collection of documents rather than single documents.
A structured analytics set may have any combination of operations selected. We recommend running email threading and near duplicate identification in the same set. Note the following:
We generally recommend that you run name normalization in its own structured analytics set for maximum flexibility. While it is faster to run multiple structured analytics operations together in one set, you may find that you are ultimately constrained if you want to make modifications to the document set or the settings.
The following links to reports are available:
Note: Click the Toggle Conditions On/Off button followed by the Clear button to remove the search condition from the Repeated Content Filters tab.
The Error Handling section of the console appears when the set has encountered one or more errors. It contains the following options:
The following table describes common errors:
Document status | Description | Next steps |
---|---|---|
Data warning | The operation encountered errors during text extraction due to special characters, such as emojis, that the Analytics engine can't process. You can find more information in this article on the Community site. | Review the document for any special characters contained in the extracted text and review any results from the selected operation(s). |
Removed from set | Errors were encountered during text extraction due to no text, encrypted text, or corrupted text. | Review the extracted text of the document to verify it contains text and that the text is not corrupted or encrypted. |
Data warning | The metadata for the item exceeded 500 characters. | None for email threading and name normalization. You can still run this document through other structured analytics operations, like language identification and textual near duplicate identification. |
Data warning | Invalid Unicode characters were included in the metadata for the document. | None for email threading and name normalization. You can still run this document through other structured analytics operations, like language identification and textual near duplicate identification. |
Data warning | Invalid Unicode characters were found in the text of the document and replaced. | Review the document for any special characters contained in the extracted text and review any results from the selected operation(s). |
Data warning | The email contains too many segments, which prevents the Analytics engine from processing the item. The maximum number of segments an email can contain is 2000. | None for email threading and name normalization. You can still run this document through other structured analytics operations, like language identification and textual near duplicate identification. |
Data warning | The email parsing process took too long, resulting in a timeout. | For next steps, refer to this article on the Community site. |
Removed from set | The document text for the requested document(s) could not be accessed or found in the DataGrid file share. | Review the extracted text to verify that it exists and is accessible by Relativity. |
Data warning | The number of characters in the recipient fields - To, CC, BCC - for a segment exceeds the maximum of 50,000. | Review the extracted text to identify the problematic segment and reach out to support to temporarily increase the character limit. |
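Two of the limits above are numeric, so they can be checked ahead of time if you already have emails split into segments. The sketch below is a minimal pre-check under several assumptions: the segment structure (a list of dictionaries with `to`, `cc`, and `bcc` strings) and the `email_limit_warnings` helper are hypothetical, and the 50,000-character limit is assumed to apply to the combined length of the three recipient fields in a segment. How the Analytics engine actually segments emails and counts characters is not described here.

```python
MAX_SEGMENTS = 2000           # maximum segments per email, from the table
MAX_RECIPIENT_CHARS = 50_000  # maximum recipient characters per segment


def email_limit_warnings(segments: list[dict]) -> list[str]:
    """Return a warning for each documented limit the email exceeds."""
    warnings = []
    if len(segments) > MAX_SEGMENTS:
        warnings.append(f"too many segments: {len(segments)}")
    for index, segment in enumerate(segments):
        # Assumption: To, CC, and BCC lengths are counted together.
        total = sum(len(segment.get(key, "")) for key in ("to", "cc", "bcc"))
        if total > MAX_RECIPIENT_CHARS:
            warnings.append(f"segment {index}: recipient fields total {total} characters")
    return warnings


print(email_limit_warnings([{"to": "a@example.com", "cc": "b@example.com"}]))
```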
If an error occurs during a delete, it creates an error in the Errors tab. When an error occurs, you must manually clean up the Analytics server and the population tables on your server.
Note: When you delete a structured analytics set, the field values for Language Identification (Docs_Languages:Language, Docs_Languages, Docs_Languages::Percentage), and the Repeated Content Filter objects and fields (Name, Type, Configuration, Number of Occurrences, Word Count, Ready to Index), remain populated. There is no need to clear the fields, because future runs will overwrite their values. If you want to clear them manually, contact Relativity Support.
Structured analytics sets may have the following statuses before, during, and after running operations:
Structured analytics set status | Appears when |
---|---|
Please run full analysis | The structured analytics set has been created, but no operations have run with it. |
Setting up analysis | The structured analytics job is initializing. |
Syncing document set | Update Only New Documents has been set to No or Repopulate Text has been set to Yes. |
Calculating file sizes | File sizes are being calculated for all documents in the saved search. |
Exporting documents | Documents are being exported from Relativity to Analytics engine for analysis. |
Completed exporting documents | Documents have been exported from Relativity to Analytics engine for analysis. |
Running structured analytics operations | Analytics engine has started running the structured analytics operations. |
Importing results into Relativity | Structured analytics results are being imported into Relativity from Analytics engine. |
Importing entities and aliases into Relativity | Name Normalization results are being imported into Relativity from Analytics engine. |
Completed structured analytics operations | Structured analytics results have been imported into Relativity from Analytics engine. |
Error while running analysis | Structured analytics job failed with errors reported. |
Attempting to retry errors | An error retry is in progress. |
Canceling analysis | The Cancel Operation button was just clicked. |
Canceled analysis | The cancel action has completed. |
Copying results to legacy document fields | Copy to Legacy Fields process is running. |
When you first run your structured analytics set, the Structured Analytics Sets multiple choice field is created on the Document object. For each document in the set, the field is populated with the name of the structured analytics set and whether the document was included in or excluded from that set. This field is populated every time the set is run. You can use this field as a condition in a saved search to return only documents included in the set. You can also view the documents that were excluded from the set. These could be empty documents, number-only documents, or documents greater than 30 MB.
The Structured Analytics Set field also displays in the Field Tree browser to make it easy to view the documents that were included and excluded from the set. You can also view documents that are not included in a structured analytics set by clicking [Not Set].
After running an analysis, you can review the results for each selected operation. For guidelines on assessing the validity of the results and making sense of the analysis, see the following
Upon upgrade to Relativity 9.5.196.102 and above, email threading and textual near duplicate results are written to new results fields that are only created upon saving a Structured Analytics Set. The Copy to Legacy Fields button gives you the option of copying the contents of the newly created fields back to the existing document fields. This ensures that anything referencing the legacy fielded data, such as views and saved searches, continues to work with the new results.
Please note:
The Copy to Legacy Document Fields button is only available on the Structured Analytics Set console if the following conditions are met:
Note: This button may show up on multiple Structured Analytics sets. However, if you run the operation on multiple sets, you will overwrite the field information.
To run, click the Copy to Legacy Fields button. The progress is displayed in the status section. You can cancel the operation while it is running, but you cannot roll back the results, and the job will be left incomplete.
Upon completion, the audit tells you the total number of fields updated. If the operation fails, you can retry the operation.
Note: Name normalization results are never purged. In order to completely re-run name normalization results, you must remove all previously identified entities and aliases from the workspace. For more information, see Deleting all data to re-run.