Structured Analytics

Structured analytics operations analyze text to identify the similarities and differences between the documents in a set.

Using structured analytics, you can quickly assess and organize a large, unfamiliar set of documents. On the Structured Analytics Set tab, you can run structured data operations to shorten your review time, improve coding consistency, optimize batch set creation, and improve your Analytics indexes.

Upgrade considerations

Beginning in Relativity, you can easily store the results for multiple structured analytics sets and set up views that capture the email threading or repeated content identification results of those operations. Before you upgrade, review the following considerations for multiple structured analytics sets.

Note: For a complete list of Analytics upgrade considerations, see Upgrade considerations for Relativity 9.6.

  • When you create new structured analytics sets for email threading or textual near duplicate identification, we recommend setting new relational fields (e.g., Destination Email Thread Group, Destination Email Duplicate ID, and Destination Textual Near Duplicate Group). Doing so lets you easily set up views that use a separate relational field for each set.
  • Upon upgrade, email threading and textual near duplicate results are written to new results fields that are created only when you save a structured analytics set. You can't create these fields manually before running the set, so you can't create views, searches, layouts, or other items that reference them until you've saved a set.

Structured analytics vs. conceptual analytics

It may be helpful to note the following differences between structured analytics and conceptual analytics, as one method may be better suited for your present needs than the other.

Structured analytics | Conceptual analytics
Takes word order into consideration | Leverages Latent Semantic Indexing (LSI), a mathematical approach to indexing documents
Doesn't require an index (requires a set) | Requires an Analytics Index
Enables the grouping of documents that are not necessarily conceptually similar, but that have similar content | Uses co-occurrences of words and semantic relationships between concepts
Takes into account the placement of words and looks to see if new changes or words were added to a document | Doesn't use word order
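
The word-order distinction in the comparison above can be illustrated with a small sketch. It is not Relativity's implementation: the function names are hypothetical, the bag-of-words cosine score only stands in for an order-insensitive comparison (it is not LSI), and the three-word shingle overlap only stands in for an order-sensitive one.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def bag_of_words_cosine(a, b):
    """Order-insensitive score: cosine between word-count vectors."""
    counts_a, counts_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shingles(text, size=3):
    """Return the set of consecutive word runs of the given size."""
    words = tokens(text)
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def shingle_jaccard(a, b, size=3):
    """Order-sensitive score: shared shingles divided by all distinct shingles."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical sample text: same words, different order.
original = "the buyer shall pay the seller within thirty days"
reordered = "the seller shall pay the buyer within thirty days"

print(f"order-insensitive: {bag_of_words_cosine(original, reordered):.2f}")
print(f"order-sensitive:   {shingle_jaccard(original, reordered):.2f}")
```

Both sentences contain exactly the same words, so the order-insensitive score is about 1.0. Only 2 of the 12 distinct three-word shingles are shared, so the order-sensitive score is about 0.17, which reflects that the parties were swapped.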

Structured analytics operations

Structured analytics includes the following distinct operations:

  • Email threading performs the following tasks:
    • Determines the relationship between email messages by grouping related email items together.
    • Identifies inclusive emails (which contain the most complete prior message content) and can bypass redundant content.
    • Applies email visualization (including reply, forward, reply all, and file type icons). Visualization helps you track the progression of an email chain, so you can easily identify where the chain begins and ends.

      Note: The results of email threading decrease in accuracy if email messages contain non-English headers.

  • Name normalization performs the following tasks:
    • Identifies aliases (proper names, email addresses, etc.) within email headers.
    • Groups aliases into entities (people, distribution groups, etc.).
  • Textual near duplicate identification performs the following tasks:
    • Identifies records that are textual near-duplicates (those in which most of the text appears in other records in the group and in the same order).
    • Returns a percentage value indicating the level of similarity between documents.
  • Language identification performs the following tasks:
    • Identifies the primary and secondary languages (if any) present in each record.
    • Provides the percentage of the message text that appears in each detected language.

    See the Supported languages matrix for a complete list of languages that the language identification operation can detect.

  • Repeated content identification analyzes extracted text to identify repeated content at the bottom of documents, such as email footers, that satisfies the minimum repeated words and minimum document occurrences settings. It returns a repeated content filter, which you can apply to an Analytics index to improve Analytics search results.

    Note: Repeated content filters are applied to the Analytics index. They are no longer linked to the Analytics profile.
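
As a rough illustration of how the two thresholds mentioned for repeated content identification interact, the following sketch looks for trailing blocks of lines shared by several documents. It is a simplified assumption about the idea, not Relativity's algorithm: the function `find_repeated_footers`, its parameter names, the line-based matching, and the `max_lines` cap are all hypothetical.

```python
from collections import defaultdict


def find_repeated_footers(documents, min_repeated_words=6,
                          min_document_occurrences=2, max_lines=10):
    """Return trailing text blocks that enough documents end with."""
    # Map each trailing block (tuple of lines) to the documents ending with it.
    occurrences = defaultdict(set)
    for index, text in enumerate(documents):
        lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
        for count in range(1, min(len(lines), max_lines) + 1):
            occurrences[tuple(lines[-count:])].add(index)

    # Keep blocks that meet both the word and the document thresholds.
    qualifying = {
        block: docs
        for block, docs in occurrences.items()
        if len(docs) >= min_document_occurrences
        and sum(len(line.split()) for line in block) >= min_repeated_words
    }

    # Drop a block when a longer qualifying block in the same documents ends with it.
    maximal = [
        block for block, docs in qualifying.items()
        if not any(
            len(other) > len(block)
            and other[-len(block):] == block
            and other_docs == docs
            for other, other_docs in qualifying.items()
        )
    ]
    return ["\n".join(block) for block in maximal]


# Hypothetical sample documents; the first two share a two-line footer.
docs = [
    "Hi Ana,\nPlease see the attached draft.\n"
    "This message is confidential.\nIf you received it in error, delete it.",
    "Team, lunch is at noon.\n"
    "This message is confidential.\nIf you received it in error, delete it.",
    "Quarterly numbers are final.\nRegards, Lee",
]

for footer in find_repeated_footers(docs):
    print(footer)
    print("---")
```

With the default thresholds, the sketch reports the two-line confidentiality footer shared by the first two documents. Raising `min_document_occurrences` to 3 returns nothing, because no trailing block appears in all three documents.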

The following table summarizes the primary benefits of each operation.

Operation | Optimizes batch set creation | Improves coding consistency | Optimizes quality of Analytics indexes | Speeds up review
Email threading  
Name normalization  
Textual near duplicate identification  
Language identification    
Repeated content identification      

Note: Starting in 9.5.370.136, you can change the operations of a structured analytics set after you've run the set. To run a different operation after a successful run, return to your set, deselect the operation you previously ran, and select the new operation. Then, save and run your structured analytics set.

Setting up your environment

Note: If you are a current RelativityOne user, and you want to install or upgrade this application, you must contact the Client Services team.

To run structured data analytics operations, you must add the Analytics application to the workspace. Installing the application creates the Indexing & Analytics tab, along with several fields that allow structured analytics to run. Due to the addition of several relational fields, we recommend installing the application during a low activity time via the Applications Library admin tab.

Once you've installed the application to at least one workspace, you must also add the Structured Analytics Manager and Structured Analytics Worker agents to your environment. For steps to add agents, see Adding and editing agents. Additionally, the workspace's resource pool must have at least one Analytics server with the Analytics operation Structured Data Analytics enabled. See Servers for more information.