Structured analytics
Structured analytics operations analyze text to identify the similarities and differences among the documents in a set.
Use structured analytics to quickly assess and organize a large, unfamiliar set of documents. On the Structured Analytics Set tab, you can run structured data operations to shorten review time, improve coding consistency, optimize batch set creation, and improve Analytics indexes.
Read a structured analytics scenario
As a system admin, you are tasked with organizing and assessing one of the largest data sets you've worked with, for a pending lawsuit against your client. A substantial portion of the data set consists of emails and email attachments. To save time, you create and run a new structured analytics set using the email threading operation to do the following:
- Organize all emails into conversation threads in order to visualize the email data in your document list.
- Reduce the number of emails requiring review and focus only on relevant documents in email threads by identifying all inclusive emails, which are the emails containing the most complete information in a thread.
After running your structured analytics set with the email threading operation, you first review the summary report to assess your results at a high level. You then create a new email threading document view to analyze the results and identify the non-duplicate inclusive emails for review.
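The idea of an inclusive email can be illustrated with a minimal sketch. It assumes each email is already broken into the set of messages it contains (its own message plus any quoted earlier messages). The `Email` class, the `segments` field, and the `find_inclusive` function are hypothetical names for illustration only, and this is not how the email threading operation is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    email_id: str
    # Hypothetical representation: IDs of every message this email contains,
    # including its own message and any quoted earlier messages.
    segments: frozenset


def find_inclusive(thread):
    """Return the emails whose content is not fully contained in another email."""
    inclusive = []
    for email in thread:
        # "<" on sets is a proper-subset test: the other email holds
        # everything this one does, plus something more.
        contained_elsewhere = any(email.segments < other.segments for other in thread)
        if not contained_elsewhere:
            inclusive.append(email)
    return inclusive


thread = [
    Email("A", frozenset({"m1"})),              # original message
    Email("B", frozenset({"m1", "m2"})),        # reply quoting A
    Email("C", frozenset({"m1", "m2", "m3"})),  # reply quoting B
    Email("D", frozenset({"m1", "m4"})),        # separate reply to A
]

inclusive_ids = [email.email_id for email in find_inclusive(thread)]
print(inclusive_ids)  # ['C', 'D']
```

In this toy thread, reviewing only C and D covers all four messages, which is the sense in which inclusive emails reduce the number of emails requiring review.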
Structured analytics operations
Structured analytics consists of several operations that group documents based on their content, analyze that content, or create tools to more effectively filter content. You can run any or all of these operations on the same set of documents.
The operations are:
- Email threading:
  - Determines the relationships among email messages by grouping related email items together.
  - Identifies inclusive emails, which contain the most complete prior message content, so that reviewers can bypass redundant content.
  - Applies Email thread visualization to visually show replies, forwards, file types, and more. Visualization makes it easier to find the beginning and end of an email chain and track its progression.
- Name normalization:
  - Identifies aliases within email headers. These include proper names, email addresses, and so on.
  - Groups together aliases that refer to the same person, distribution group, and so on. These groups become entities.
- Textual near duplicate identification:
  - Identifies documents that are textual near duplicates, meaning that most of their text appears in other documents in the group and in the same order.
  - Returns a percentage value showing the level of similarity between documents.
- Language identification:
  - Identifies the primary and secondary languages in each document. See the Supported languages matrix for a complete list of languages it can detect.
  - Provides the percentage of the message text that appears in each detected language.
- Repeated content identification:
  - Analyzes the linked text field to identify repeated content at the bottom of documents, such as email footers and signatures.
  - Returns a repeated content filter, which you can apply to an Analytics index to improve Analytics search results.
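To make the similarity percentage returned by textual near duplicate identification more concrete, the following minimal sketch scores documents against a reference document and groups those above a threshold. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for an order-sensitive comparison. The document IDs, the `MIN_SIMILARITY` threshold, and the scoring method are assumptions for illustration, not the product's actual algorithm or settings.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a, text_b):
    """Order-sensitive, word-level similarity between two texts, as a percentage."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return round(matcher.ratio() * 100, 1)


# Hypothetical sample documents.
documents = {
    "DOC-1": "please review the attached contract before friday",
    "DOC-2": "please review the attached contract before monday",
    "DOC-3": "lunch menu for the company picnic next week",
}

MIN_SIMILARITY = 80.0  # hypothetical threshold, chosen only for this example
reference_id = "DOC-1"

# Score every document against the reference document.
scores = {
    doc_id: similarity_percent(documents[reference_id], text)
    for doc_id, text in documents.items()
}

# Documents at or above the threshold form one near-duplicate group.
group = [doc_id for doc_id, score in scores.items() if score >= MIN_SIMILARITY]

print(scores)
print(group)  # ['DOC-1', 'DOC-2']
```

DOC-2 differs from DOC-1 by a single word, so it lands in the same group, while DOC-3 shares almost no text and is left out.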
These operations have several benefits:
| Operation | Optimizes batch set creation | Improves coding consistency | Optimizes quality of Analytics indexes | Speeds up review |
| --- | --- | --- | --- | --- |
| Email threading | √ | √ | | √ |
| Name normalization | √ | √ | | √ |
| Textual near duplicate identification | √ | √ | | √ |
| Language identification | √ | | | √ |
| Repeated content identification | | | √ | √ |
Structured analytics versus conceptual analytics
Structured analytics and conceptual analytics are different from each other in several ways. Depending on your needs, one or the other may work better for you.
| Structured analytics | Conceptual analytics |
| --- | --- |
| Groups documents that have similar content, but may or may not have similar concepts | Groups documents that have similar concepts, even if the words are different |
| Takes word order into consideration | Does not consider word order |
| Takes into account the placement of words and looks to see if new changes or words were added to a document | Uses Latent Semantic Indexing (LSI), which focuses more on concepts than on specific wording changes |
| Uses a structured analytics set, not an index | Uses an Analytics index |
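The word-order distinction can be illustrated with a small sketch that compares the same two sentences in two ways. The order-sensitive score uses `difflib.SequenceMatcher`. The order-insensitive score simply counts shared words and is only a stand-in to show what ignoring word order means. It is not LSI, and neither function reflects how either type of analytics is actually implemented.

```python
from collections import Counter
from difflib import SequenceMatcher


def ordered_similarity(text_a, text_b):
    """Similarity that depends on the order in which words appear."""
    return SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False).ratio()


def unordered_similarity(text_a, text_b):
    """Similarity based only on which words appear and how often."""
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())
    shared = sum((counts_a & counts_b).values())  # words common to both texts
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 2 * shared / total if total else 1.0


doc = "the buyer sued the seller"
reordered = "the seller sued the buyer"

ordered_score = ordered_similarity(doc, reordered)
unordered_score = unordered_similarity(doc, reordered)

print(ordered_score)    # below 1.0: the word sequences differ
print(unordered_score)  # 1.0: both texts contain exactly the same words
```

The two sentences use identical words but say opposite things. An order-sensitive comparison registers the difference, while a comparison that ignores word order treats them as the same.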
Setting up your environment
To use structured analytics within Relativity, you must have the Analytics application installed in your workspace. Installing the application will create an Indexing & Analytics tab, along with several new fields.
Because installation adds relational fields, we recommend installing the application during a period of low activity via the Applications Library admin tab. For more information, see Installing applications.
After you have installed the application in at least one workspace, you must also add the Structured Analytics Manager and Structured Analytics Worker agents to your environment. For steps to add agents, see Adding and editing agents. Additionally, the workspace's resource pool must have at least one Analytics server with the Analytics operation Structured Data Analytics enabled. See Servers for more information.
Relativity template workspaces already have the Analytics application installed by default.