Name normalization

Name Normalization analyzes email document headers to identify all aliases (proper names, email addresses, etc.) and the entities (person, distribution group, etc.) those aliases belong to.

Special considerations

Before running the name normalization operation, note the following:

  • If Processing or Legal Hold is installed in your workspace with Analytics, we strongly recommend that you add a Classification value to your existing entities so that you can differentiate between them and the entities created by the name normalization operation. To do this, complete the following:
    1. Create a choice called Custodian on the Classification field on the Entity object.
    2. Select all of your existing Entities, and perform a Mass Edit to add the Custodian classification value to these objects. You should also add the Custodian value to any future Entities created or imported for Processing purposes.
    3. Once completed, you can search or filter on the Classification field to observe specific entities.
  • If you want name normalization to write data to an existing entity, you must add alias values to the entity prior to running name normalization. By adding alias values, you increase the probability that new data is mapped to existing entities rather than duplicated on a new entity. You can add aliases by importing through the RDC or by manually creating aliases and assigning them to entities via the Assign to Entity mass operation.
    We recommend manually adding aliases such as email addresses, unique variations of the entity's name (e.g., John Doe; Doe, John), or any other unique identifiers that this entity may use.

    Note: If you do not add these values prior to running name normalization, you can still use the Merge mass operation to consolidate duplicate entities. For more information, see Entity object.

Name normalization overview

The name normalization process includes the following steps at a high level:

First, the operation parses header data (From, To, Cc, Bcc) from every segment within an email document using the same logic as email threading. Once the header data is parsed, name normalization identifies aliases within each header field, using semicolon delimiters to separate multiple aliases. Each unique alias is stored and matched with an unnamed entity.

Consider the following email segment:

Segment
From: john.doe@example.com
To: jason.smith@example.com; mary.adams@example.com
Cc:
Bcc:
Date: 11/01/2018 10:00AM
Subject: Let's talk about NN

Hey Jason, How's Name Normalization going?
Does your team need any help? Cheers, John

Name normalization identifies the following aliases:

Entity Alias
Entity 1 john.doe@example.com
Entity 2 jason.smith@example.com
Entity 3 mary.adams@example.com
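
The alias identification step above can be sketched in a few lines of Python. This is only an illustration of splitting header fields on semicolons, not the product's actual parser; the function names `split_aliases` and `collect_aliases` and the dictionary layout of the headers are assumptions made for this example.

```python
def split_aliases(header_value: str) -> list[str]:
    """Split one header field on semicolons and drop empty entries."""
    return [part.strip() for part in header_value.split(";") if part.strip()]


def collect_aliases(segment_headers: dict[str, str]) -> list[str]:
    """Return the unique aliases from From, To, Cc, and Bcc in first-seen order."""
    seen: dict[str, None] = {}  # a dict keeps insertion order and removes duplicates
    for field in ("From", "To", "Cc", "Bcc"):
        for alias in split_aliases(segment_headers.get(field, "")):
            seen.setdefault(alias, None)
    return list(seen)


# Header values taken from the example segment above.
headers = {
    "From": "john.doe@example.com",
    "To": "jason.smith@example.com; mary.adams@example.com",
    "Cc": "",
    "Bcc": "",
}
aliases = collect_aliases(headers)
```

Each entry in `aliases` corresponds to one row of the table above. Empty fields such as Cc and Bcc contribute nothing.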

If an alias is in one of the formats below, the full alias is stored along with separate aliases for the description (Doe, John) and the email address (john.doe@example.com). All three aliases are joined to the same entity.

  • "Doe, John" <john.doe@example.com>
  • 'Doe, John' <john.doe@example.com>
  • Doe, John <john.doe@example.com>
  • 'Doe, John' [john.doe@example.com]
  • Doe, John [john.doe@example.com]

For example, if an email segment contains "Doe, John" <john.doe@example.com>, name normalization identifies the following aliases:

Entity Alias
Entity 1
  • "Doe, John" <john.doe@example.com>
  • Doe, John
  • john.doe@example.com
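
As a rough sketch of how one raw alias could be expanded into the three aliases shown above, the following Python example uses a regular expression that covers the five listed formats. The pattern and the `expand_alias` function are assumptions written for this illustration; the operation's real parsing rules are not documented here and may differ (for instance, this pattern does not handle names that contain an apostrophe).

```python
import re

# Optional matching quote pair around the description, followed by the
# address in angle brackets or square brackets. Illustrative pattern only.
ALIAS_PATTERN = re.compile(
    r"""^\s*(?P<quote>["']?)(?P<description>[^"'<\[]+?)(?P=quote)\s*"""
    r"""(?:<(?P<angle>[^<>\s]+)>|\[(?P<square>[^\[\]\s]+)\])\s*$"""
)


def expand_alias(raw: str) -> list[str]:
    """Return [full alias, description, address], or [raw] if no format matches."""
    raw = raw.strip()
    match = ALIAS_PATTERN.match(raw)
    if match is None:
        return [raw]
    description = match.group("description").strip()
    address = match.group("angle") or match.group("square")
    return [raw, description, address]


expanded = expand_alias('"Doe, John" <john.doe@example.com>')
plain = expand_alias("john.doe@example.com")
```

Here `expanded` holds the full alias, the description, and the email address, while `plain` holds only the address because it matches none of the combined formats.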

Note: To limit over-merging, name normalization does not create generic aliases, such as Mom or John.

If a newly identified alias matches an existing alias, it isn't created again. However, name normalization uses logic to match alias siblings to the same entity.

For example, suppose that after identifying "Doe, John" <john.doe@example.com>, as in the example above, name normalization identifies "Doe, John" <jdog99@domain.com>. All of the aliases are linked to the same entity based on the matching "Doe, John" alias:

Entity Alias
Entity 1
  • "Doe, John" <john.doe@example.com>
  • Doe, John
  • john.doe@example.com
  • "Doe, John" <jdog99@domain.com>
  • jdog99@domain.com

Note: Name normalization limits the number of aliases assigned to a single entity to prevent over-merging.
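
The sibling matching described above can be pictured with a small lookup structure. The sketch below is an assumption-laden illustration, not the operation's actual logic: the `EntityIndex` class and its method are hypothetical, and it deliberately leaves out the alias limit and the generic-alias filter mentioned in the notes.

```python
from collections import defaultdict


class EntityIndex:
    """Hypothetical in-memory map between aliases and entity IDs."""

    def __init__(self) -> None:
        self.entity_of: dict[str, int] = {}
        self.aliases_of: dict[int, list[str]] = defaultdict(list)
        self._next_id = 1

    def add_siblings(self, siblings: list[str]) -> int:
        """Attach a group of sibling aliases to one entity and return its ID."""
        known = [self.entity_of[a] for a in siblings if a in self.entity_of]
        if known:
            entity_id = known[0]  # reuse the entity of the first known sibling
        else:
            entity_id = self._next_id
            self._next_id += 1
        for alias in siblings:
            if alias not in self.entity_of:  # existing aliases are not created again
                self.entity_of[alias] = entity_id
                self.aliases_of[entity_id].append(alias)
        return entity_id


index = EntityIndex()
first = index.add_siblings(
    ['"Doe, John" <john.doe@example.com>', "Doe, John", "john.doe@example.com"]
)
second = index.add_siblings(
    ['"Doe, John" <jdog99@domain.com>', "Doe, John", "jdog99@domain.com"]
)
```

Because the second group shares the "Doe, John" alias with the first, both calls return the same entity ID, and that entity ends up with the five aliases listed in the table above.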

To further improve results, name normalization also uses segment matching to infer relationships between different aliases that appear in the email headers. Consider the segments below from two different documents:

Segment 1 (from Document X)
From: Doe, John
To: jason.smith@example.com
Cc:
Bcc:
Date: 11/01/2018 10:00AM
Subject: Let's talk about NN

Hey Jason, How's Name Normalization going?
Does your team need any help? Cheers, John

Segment 2 (from Document Y)
From: johnathan.doe@example.com
To: jason.smith@example.com
Cc:
Bcc:
Date: 11/01/2018 10:55AM
Subject: Let's talk about NN

Hey Jason, How's Name Normalization going?
Does your team need any help? Cheers, John

By analyzing the body text and date sent, name normalization identifies these two segments as matching. It then uses different strategies to determine if the aliases match.
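
To make the idea of segment matching more concrete, here is a minimal Python sketch that compares two segments by body text and sent date. Everything in it is an assumption for illustration: the `Segment` class, the whitespace and case normalization, and especially the one-hour `MAX_DATE_GAP` tolerance are hypothetical, since the source does not say how the body and date are actually compared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y %I:%M%p"    # matches dates such as 11/01/2018 10:00AM
MAX_DATE_GAP = timedelta(hours=1)   # hypothetical tolerance, not a documented value


@dataclass
class Segment:
    sender: str
    date: str
    body: str


def normalize_body(body: str) -> str:
    """Collapse whitespace and ignore case so layout differences do not matter."""
    return " ".join(body.split()).lower()


def segments_match(a: Segment, b: Segment) -> bool:
    """Treat two segments as matching if bodies agree and dates are close."""
    same_body = normalize_body(a.body) == normalize_body(b.body)
    gap = abs(
        datetime.strptime(a.date, DATE_FORMAT) - datetime.strptime(b.date, DATE_FORMAT)
    )
    return same_body and gap <= MAX_DATE_GAP


BODY = "Hey Jason, How's Name Normalization going?\nDoes your team need any help? Cheers, John"
segment_x = Segment("Doe, John", "11/01/2018 10:00AM", BODY)
segment_y = Segment("johnathan.doe@example.com", "11/01/2018 10:55AM", BODY)

# When segments match, their From aliases become candidates for the same entity.
candidates = (segment_x.sender, segment_y.sender) if segments_match(segment_x, segment_y) else None
```

Under these assumptions the two example segments match, so "Doe, John" and johnathan.doe@example.com are paired as candidate aliases; whether they are actually linked depends on the alias-matching strategies mentioned above.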