Structured analytics

Structured analytics operations analyze text to identify the similarities and differences among the documents in a set.

Use structured analytics to quickly assess and organize a large, unfamiliar set of documents. On the Structured Analytics Set tab, you can run structured data operations to shorten review time, improve coding consistency, optimize batch set creation, and improve Analytics indexes.

Structured analytics operations

Structured analytics consists of several operations that group documents based on their content, analyze that content, or create tools to more effectively filter content. You can run any or all of these operations on the same set of documents.

The operations are:

Email threading:
- Determines the relationships among email messages by grouping related email items together.
- Identifies inclusive emails, which contain the most complete prior message content, so that you can bypass redundant content.
- Applies Email thread visualization to show replies, forwards, file types, and more. Visualization makes it easier to find the beginning and end of an email chain and track its progression.
Name normalization:
- Identifies aliases within email headers. These include proper names, email addresses, and so on.
- Groups together aliases that refer to the same person, distribution group, and so on. These groups become entities.
Textual near duplicate identification:
- Identifies documents that are textual near duplicates, meaning that most of their text appears in other documents in the group and in the same order.
- Returns a percentage value showing the level of similarity between documents.
Language identification:
- Identifies the primary and secondary languages in each document. See the Supported languages matrix for a complete list of languages it can detect.
- Provides the percentage of the message text that appears in each detected language.
Repeated content identification:
- Analyzes the linked text field to identify repeated content at the bottom of documents, such as email footers and signatures.
- Returns a repeated content filter, which you can apply to an Analytics index to improve Analytics search results.
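
To make the idea of an order-sensitive similarity percentage concrete, the following is a minimal sketch of how a near-duplicate score could be computed. It is not the algorithm Relativity uses: it assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher`, and the function name `similarity_percent` and the sample texts are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Return an order-sensitive similarity between two texts, from 0 to 100.

    Illustrative only: this is a stand-in for the idea of a near-duplicate
    percentage, not the product's actual scoring method.
    """
    # Compare word sequences rather than characters, so word order matters.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return round(100 * matcher.ratio(), 1)


# Hypothetical sample documents.
principal = "please review the attached contract before the meeting on friday"
candidate = "please review the attached contract before the call on friday"
unrelated = "lunch menu for the cafeteria this week"

print(similarity_percent(principal, candidate))  # one word differs out of ten
print(similarity_percent(principal, unrelated))  # almost no shared sequence
```

In this sketch, the two contract messages score far higher than the unrelated one because they share long runs of words in the same order.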

These operations have several benefits:

Operation	Optimizes batch set creation	Improves coding consistency	Optimizes quality of Analytics indexes	Speeds up review
Email threading	√	√		√
Name normalization	√	√		√
Textual near duplicate identification	√	√		√
Language identification	√			√
Repeated content identification			√	√

Structured analytics versus conceptual analytics

Structured analytics and conceptual analytics differ in several ways. Depending on your needs, one or the other may work better for you.

Structured analytics	Conceptual analytics
Groups documents that have similar content, but may or may not have similar concepts	Groups documents that have similar concepts, even if the words are different
Takes word order into consideration	Does not consider word order
Takes into account the placement of words and looks to see if new changes or words were added to a document	Uses Latent Semantic Indexing (LSI), which focuses more on concepts than on specific wording changes
Uses a structured analytics set, not an index	Uses an Analytics index
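
The word-order distinction in the table can be illustrated with a small sketch. It assumes a simple bag-of-words overlap as a stand-in for a comparison that ignores word order; it does not implement Latent Semantic Indexing, and neither function reflects how Relativity scores documents. The names `ordered_score` and `unordered_score` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def ordered_score(text_a: str, text_b: str) -> float:
    """Similarity in [0, 1] that depends on word order."""
    return SequenceMatcher(
        None, text_a.split(), text_b.split(), autojunk=False
    ).ratio()


def unordered_score(text_a: str, text_b: str) -> float:
    """Similarity in [0, 1] that only counts shared words, ignoring order."""
    counts_a, counts_b = Counter(text_a.split()), Counter(text_b.split())
    shared = sum((counts_a & counts_b).values())  # multiset intersection
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 2 * shared / total if total else 1.0


# Hypothetical documents: identical words, different order and meaning.
doc_a = "the buyer pays the seller"
doc_b = "the seller pays the buyer"

print(ordered_score(doc_a, doc_b))    # lower, because the order differs
print(unordered_score(doc_a, doc_b))  # maximal, because the words are the same
```

An order-sensitive comparison treats these two sentences as different, while an order-insensitive one treats them as equivalent. That is the kind of contrast the table describes.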

Setting up your environment

If you are a current RelativityOne user, and you want to install or upgrade this application, you must contact the Customer Support team.

To use structured analytics within RelativityOne, you must have the Analytics application installed in your workspace. Installing the application creates an Indexing & Analytics tab, along with several new fields.

Because installing the application adds relational fields to the workspace, we recommend installing it during a period of low activity, using the Applications Library admin tab. For more information, see Installing applications.

Relativity template workspaces already have the Analytics application installed by default.

Archiving and restoring workspaces with structured analytics sets

Workspaces that use structured analytics sets can be archived and restored using the ARM application. However, legacy archives from older Server versions might not retain data about which documents belong in an incremental run. For more detailed information, see Analytics considerations.