

When running name normalization, email header formats in the extracted text vary widely and are generally less clean than the top-level headers. For this reason, you may want to run name normalization first on only the top-level headers (To, From, Cc, Bcc) to produce cleaner results. You can then use these results to seed additional runs of name normalization.
When you create a structured analytics set, two settings control whether name normalization runs on the email metadata fields, on the extracted text, or on both. These settings are in the Email Headers section.
The settings are:
Analyze topmost email only—runs name normalization on the most recent email received in an email chain.
Use email header fields—includes the email metadata fields in the analysis.
Email metadata fields include From, To, CC, BCC, Subject, and Date Sent.
You can map the email metadata fields in the Analytics profile. For more information, see Analytics profiles.
To see these settings, select Name Normalization as one of the operations to run.
Depending on how each setting is toggled, name normalization analyzes documents differently.
For most name normalization sets, we recommend leaving both settings On. However, different setups work better for different data sets.
Setting combination | Analyze topmost email only | Use email header fields | Effect |
---|---|---|---|
Setup 1 | On | On | Name normalization runs first on the email metadata fields associated with each document. If a document does not have a From field and at least one other email header field, name normalization analyzes the extracted text from the top segment of the email chain instead. |
Setup 2 | On | Off | Name normalization ignores the email metadata fields associated with each document. Instead, it analyzes the extracted text from the top segment of the email chain. |
Setup 3 | Off | On | Name normalization analyzes both the email metadata fields and the complete extracted text of each document. |
Setup 4 | Off | Off | Name normalization analyzes only the complete extracted text. It does not analyze the email metadata fields. |
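The four setups can be summarized as a small decision function. This is a minimal sketch of the documented behavior, not Relativity's implementation. It assumes Setup 1 has both settings On, Setup 2 has only Analyze topmost email only On, Setup 3 has only Use email header fields On, and Setup 4 has both settings Off, which is how the described effects line up with the two settings. The function names and the source labels ("metadata", "top_segment_text", "full_text") are hypothetical.

```python
# Header names listed as email metadata fields in this article.
OTHER_HEADERS = ("To", "CC", "BCC", "Subject", "Date Sent")


def has_usable_headers(metadata: dict) -> bool:
    """True if a non-empty From field and at least one other header are present."""
    return bool(metadata.get("From")) and any(metadata.get(h) for h in OTHER_HEADERS)


def sources_to_analyze(metadata: dict, topmost_only: bool, use_header_fields: bool) -> list[str]:
    """Return the (hypothetical) sources name normalization reads for one document."""
    if topmost_only and use_header_fields:
        # Setup 1: metadata first; fall back to the top segment of the extracted text.
        if has_usable_headers(metadata):
            return ["metadata"]
        return ["top_segment_text"]
    if topmost_only:
        # Setup 2: ignore metadata; read only the top segment of the extracted text.
        return ["top_segment_text"]
    if use_header_fields:
        # Setup 3: metadata plus the complete extracted text.
        return ["metadata", "full_text"]
    # Setup 4: complete extracted text only.
    return ["full_text"]


complete = {"From": "a@example.com", "To": "b@example.com"}
from_only = {"From": "a@example.com"}
print(sources_to_analyze(complete, True, True))   # ['metadata']
print(sources_to_analyze(from_only, True, True))  # ['top_segment_text']
```

The second call falls back to the extracted text because a From field alone does not satisfy the "From plus at least one other header" condition.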
Regardless of settings, if name normalization cannot find a From header and at least one other email header (such as To, CC, BCC, Subject, or Date Sent) in the content it analyzes, it skips the document. These documents are assumed to be non-emails.
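The skip rule can be illustrated against a block of text. This is a simplified sketch under a loud assumption: it treats a header as a line that starts with a label such as "From:" or "Subject:". Relativity's own header detection is not documented here and handles far more variation, so `looks_like_email` is a hypothetical helper, not the product's logic.

```python
import re

# Header labels named in this article. Matching "Label:" at the start of a
# line is a simplifying assumption; real extracted text varies widely.
HEADER_LINE = re.compile(
    r"^[ \t]*(From|To|CC|BCC|Subject|Date Sent)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)
OTHER_HEADERS = {"to", "cc", "bcc", "subject", "date sent"}


def looks_like_email(text: str) -> bool:
    """Return True if the text has a From header plus at least one other header.

    Under the rule described above, documents that fail this check are skipped
    as non-emails.
    """
    found = {m.group(1).lower() for m in HEADER_LINE.finditer(text)}
    return "from" in found and bool(found & OTHER_HEADERS)


print(looks_like_email("From: a@example.com\nTo: b@example.com\n\nHi"))  # True
print(looks_like_email("From: a@example.com\n\nNo other headers"))        # False
print(looks_like_email("Meeting notes\nSubject: budget"))                 # False
```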
The content of the Participant field is affected by these settings as follows:
If name normalization analyzes only the email metadata fields, then the Participant field populates with the Entities found in the metadata fields.
If name normalization analyzes the complete extracted text, then the Participant field populates with the Entities found in the entire extracted text.
For more information on the Participant field, see Name normalization results.
Usually, if name normalization cannot find a From field and at least one other email header field (such as To, CC, BCC, Subject, or Date Sent) in the email metadata fields, it analyzes the extracted text to find them. If you want to force name normalization to use only the email metadata fields, you can use a regular expression (regex) filter to filter out all of the extracted text.
This workflow assumes you have the following:
To force name normalization to run on email metadata fields only:
Note: You must set the configuration to the seven characters (?s).*+ exactly as they appear here, with no extra spaces.
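The regular expression filter is the key to this solution. The filter (?s).*+ works as follows:
(?s) turns on single-line (dot-all) mode, so the period (.) also matches line breaks.
.* matches any character, zero or more times.
+ makes the preceding quantifier possessive, so the match consumes the entire text without backtracking.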
In other words, this filters out every character of the extracted text, including line breaks, as it is being sent to the Analytics engine.
Note: We highly discourage using this regular expression anywhere else. Only use this regular expression with name normalization. If you apply this regular expression to other operations, such as email threading, the results will be unusable.
After executing this process, you can work with the entities and aliases as-is, or you may later choose to bring the extracted text into consideration. To bring in the extracted text, remove the regular expression filter from the structured analytics set, and then re-run the set with the Repopulate Text setting enabled.
Note: Other regular expressions, such as ^.*$, can achieve the same result, but they are more memory-intensive. We recommend using the (?s).*+ expression for best performance, especially if your document set includes large documents.