Structured analytics operations analyze text to identify the similarities and differences between the documents in a set.
Using structured analytics, you can quickly assess and organize a large, unfamiliar set of documents. On the Structured Analytics Set tab, you can run structured data operations to shorten your review time, improve coding consistency, optimize batch set creation, and improve your Analytics indexes.
See these related pages:
- Running structured analytics
- Email threading
- Email threading results
- Email thread visualization
- Textual near duplicate identification
- Textual near duplicate identification results
- Language identification
- Language identification results
- Evaluating repeated content identification results
- Name normalization
As a system admin, you're tasked with organizing and assessing one of the largest data sets you've worked with for a pending lawsuit against your client. A substantial portion of the data set consists of emails and email attachments. To save time, you create and run a new structured analytics set using the email threading operation to do the following:
- Organize all emails into conversation threads in order to visualize the email data in your document list.
- Reduce the number of emails requiring review and focus only on relevant documents in email threads by identifying all inclusive emails—emails containing the most complete information in a thread.
After running your structured analytics set with the email threading operation, you first review the summary report to assess your results at a high level. You then create a new email threading document view to analyze the results and identify non-duplicate inclusive emails for review.
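To make "non-duplicate inclusive emails" concrete, here is a minimal sketch of the idea. It is not how Relativity computes threading. It assumes each email has already been broken into message segments (its own text plus any quoted earlier messages); the function name, the email IDs, and the segment labels are all hypothetical.

```python
def non_duplicate_inclusives(thread: dict[str, frozenset[str]]) -> list[str]:
    """Return IDs of emails whose content is not fully contained in another email.

    `thread` maps a hypothetical email ID to the set of message segments
    (its own text plus any quoted earlier messages) found in that email.
    """
    inclusive = []
    seen = set()
    for email_id, segments in thread.items():
        # Not inclusive if some other email holds everything here and more.
        if any(segments < other for other in thread.values()):
            continue
        # Among emails with identical content, keep only the first one.
        if segments in seen:
            continue
        seen.add(segments)
        inclusive.append(email_id)
    return inclusive


thread = {
    "E1": frozenset({"m1"}),              # original message
    "E2": frozenset({"m1", "m2"}),        # reply quoting E1
    "E3": frozenset({"m1", "m2", "m3"}),  # reply quoting E2
    "E4": frozenset({"m1", "m4"}),        # separate branch off E1
    "E5": frozenset({"m1", "m4"}),        # duplicate copy of E4
}

print(non_duplicate_inclusives(thread))  # ['E3', 'E4']
```

In this toy thread, reviewing E3 and E4 covers every message, which is the reduction in review volume the scenario above describes.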
Beginning in Relativity 184.108.40.206, you can easily store the results for multiple structured analytics sets and set up views that capture the email threading or repeated content identification results of those operations. Before upgrading, review the following upgrade considerations for multiple structured analytics sets.
Note: For a complete list of Analytics upgrade considerations, see Upgrade considerations for Relativity 9.6.
- We recommend setting new relational fields (e.g., Destination Email Thread Group, Destination Email Duplicate ID, and Destination Textual Near Duplicate Group) when creating new structured analytics sets for email threading or textual near duplicate identification. This lets you easily set up views that use a relational field for each of these sets.
- Upon upgrade, email threading and textual near duplicate results are written to new results fields that are only created upon saving a Structured Analytics Set. These fields can't be manually created before running the set. This means that it's not possible to create any views, searches, layouts, etc. that reference these fields prior to saving a set.
It may be helpful to note the following differences between structured analytics and conceptual analytics, as one method may be better suited for your present needs than the other.
|Structured analytics||Conceptual analytics|
|Takes word order into consideration||Leverages Latent Semantic Indexing (LSI), a mathematical approach to indexing documents|
|Doesn’t require an index (requires a set)||Requires an Analytics Index|
|Enables the grouping of documents that are not necessarily conceptually similar, but that have similar content||Uses co-occurrences of words and semantic relationships between concepts|
|Takes into account the placement of words and looks to see if new changes or words were added to a document||Doesn't use word order|
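The word-order distinction in the table can be illustrated with a short sketch. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for an order-sensitive comparison and a simple word count as a stand-in for an order-insensitive one. Neither is the algorithm Relativity uses, and the function names and sample sentences are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def bag_of_words_equal(a: str, b: str) -> bool:
    """True when both texts use the same words the same number of times."""
    return Counter(a.lower().split()) == Counter(b.lower().split())


def ordered_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] computed over word sequences, so order matters."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


doc_a = "the buyer will pay the seller"
doc_b = "the seller will pay the buyer"

# Same words, so an order-insensitive comparison sees no difference.
print(bag_of_words_equal(doc_a, doc_b))

# An order-sensitive comparison scores identical text at 1.0
# and the reordered text lower.
print(ordered_similarity(doc_a, doc_a))
print(ordered_similarity(doc_a, doc_b))
```

The two sentences mean opposite things but contain identical words, which is why a comparison that respects word order can separate documents that a purely word-based view treats as the same.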
Structured analytics includes the following distinct operations:
- Email threading performs the following tasks:
  - Determines the relationship between email messages by grouping related email items together.
  - Identifies inclusive emails (which contain the most complete prior message content) and can bypass redundant content.
  - Applies email visualization (including reply, forward, reply all, and file type icons). Visualization helps you track the progression of an email chain, so you can easily identify where the chain begins and ends.
  Note: The results of email threading decrease in accuracy if email messages contain non-English headers.
- Name normalization performs the following tasks:
  - Identifies aliases (proper names, email addresses, etc.) within email headers.
  - Groups aliases into entities (people, distribution groups, etc.).
- Textual near duplicate identification performs the following tasks:
  - Identifies records that are textual near-duplicates (those in which most of the text appears in other records in the group and in the same order).
  - Returns a percentage value indicating the level of similarity between documents.
- Language identification performs the following tasks:
  - Identifies the primary and secondary languages (if any) present in each record.
  - Provides the percentage of the message text that appears in each detected language.
See the Supported languages matrix for a complete list of languages that the language identification operation can detect.
- Repeated content identification analyzes extracted text to identify repeated content at the bottom of documents, such as email footers, that satisfies the minimum repeated words and minimum document occurrences settings. It returns a repeated content filter, which you can apply to an Analytics index to improve Analytics search results.
Note: Repeated content filters are applied to the Analytics index and are no longer linked to the Analytics profile.
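The interaction of the two repeated content settings can be sketched as follows. This is a simplified illustration, not the product's algorithm: it only looks at the final non-empty line of each document, and the parameter names `min_words` and `min_docs` are hypothetical stand-ins for the minimum repeated words and minimum document occurrences settings.

```python
from collections import Counter


def repeated_footers(docs: list[str], min_words: int, min_docs: int) -> list[str]:
    """Return final lines that appear in enough documents and are long enough."""
    counts = Counter()
    for text in docs:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if lines:
            counts[lines[-1]] += 1  # count one final line per document
    return [
        footer
        for footer, n in counts.items()
        if n >= min_docs and len(footer.split()) >= min_words
    ]


footer = "This message is confidential and intended only for the named recipient."
docs = [
    "Please see the attached draft.\n" + footer,
    "Lunch at noon?\n" + footer,
    "Quarterly numbers are ready.\n" + footer,
    "Thanks\nBob",
]

# Only the confidentiality footer meets both thresholds.
print(repeated_footers(docs, min_words=6, min_docs=3))
```

Raising either threshold makes the result more conservative: short sign-offs such as a name fail the word minimum, and text that appears in only a few documents fails the occurrence minimum.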
The following table summarizes the primary benefits of each operation.
|Operation||Optimizes batch set creation||Improves coding consistency||Optimizes quality of Analytics indexes||Speeds up review|
|Textual near duplicate identification||√||√||√|
|Repeated content identification||√|
Note: Starting in 9.5.370.136, you can change the structured analytics set operations after you’ve run a set. Once you successfully run an operation and want to run another, return to your set and deselect the operation you previously ran and select the new operation. Then, save and run your structured analytics set.
Note: If you are a current RelativityOne user, and you want to install or upgrade this application, you must contact the
To run structured data analytics operations, you must add the Analytics application to the workspace. Installing the application creates the Indexing & Analytics tab, along with several fields that allow structured analytics to run. Because installation adds several relational fields, we recommend installing the application during a period of low activity via the Applications Library admin tab.
Once you've installed the application to at least one workspace, you must also add the Structured Analytics Manager and Structured Analytics Worker agents to your environment.