Structured analytics

Structured analytics operations analyze text to identify the similarities and differences among the documents in a set.

Use structured analytics to quickly assess and organize a large, unfamiliar set of documents. On the Structured Analytics Set tab, you can run structured data operations to shorten review time, improve coding consistency, optimize batch set creation, and improve Analytics indexes.

See these related pages:

Structured analytics operations

Structured analytics consists of several operations that group documents based on their content, analyze that content, or create tools to more effectively filter content. You can run any or all of these operations on the same set of documents.

The operations are:

Email threading:
- Determines the relationships among email messages by grouping related email items together.
- Identifies inclusive emails, which contain the most complete prior message content, and can bypass redundant content.
- Applies Email thread visualization to visually show replies, forwards, file types, and more. Visualization makes it easier to find the beginning and end of an email chain and track its progression.
Name normalization:
- Identifies aliases within email headers. These include proper names, email addresses, and so on.
- Groups together aliases that refer to the same person, distribution group, and so on. These groups become entities.
Textual near duplicate identification:
- Identifies documents that are textual near duplicates, meaning that most of their text appears in other documents in the group and in the same order.
- Returns a percentage value showing the level of similarity between documents.
Language identification:
- Identifies the primary and secondary languages in each document. See the Supported languages matrix for a complete list of languages it can detect.
- Provides the percentage of the message text that appears in each detected language.
Repeated content identification:
- Analyzes the linked text field to identify repeated content at the bottom of documents, such as email footers and signatures.
- Returns a repeated content filter, which you can apply to an Analytics index to improve Analytics search results.

These operations have several benefits:

Operation	Optimizes batch set creation	Improves coding consistency	Optimizes quality of Analytics indexes	Speeds up review
Email threading	√	√		√
Name normalization	√	√		√
Textual near duplicate identification	√	√		√
Language identification	√			√
Repeated content identification			√	√

Structured analytics versus conceptual analytics

Structured analytics and conceptual analytics are different from each other in several ways. Depending on your needs, one or the other may work better for you.

Structured analytics	Conceptual analytics
Groups documents that have similar content, but may or may not have similar concepts	Groups documents that have similar concepts, even if the words are different
Takes word order into consideration	Does not consider word order
Takes into account the placement of words and looks to see if new changes or words were added to a document	Uses Latent Semantic Indexing (LSI), which focuses more on concepts than on specific wording changes
Uses a structured analytics set, not an index	Uses an Analytics index

Setting up your environment

To use structured analytics within Relativity, you must have the Analytics application installed in your workspace. Installing the application will create an Indexing & Analytics tab, along with several new fields.

Because this adds some relational fields, we recommend installing the application during a low activity time via the Applications Library admin tab. For more information, see Installing applications.

After you have installed the application to at least one workspace, you must also add the Structured Analytics Manager and Structured Analytics Worker agents to your environment. For steps to add agents, see Adding and editing agents. Additionally, the workspace's resource pool must have at least one Analytics server with the Analytics operation Structured Data Analytics enabled. See Servers for more information.

Relativity template workspaces already have the Analytics application installed by default.