Entity normalization audit

Entity normalization is a multi-stage process. The Entity Normalization Audit tool provides transparency into the consolidation logic by tracking a record’s journey through each stage of normalization, so you can see how final entities are produced and troubleshoot problems.

To view the Entity Normalization Audit tool:

  1. Click to expand the Quality Control icon.
  2. Select the Entity Normalization Audit icon.

Definitions

The following terms are used throughout this documentation:

Entity: a unique individual identified by their name and their PI, such as SSN, date of birth, and full address. Each entity is assigned a unique entity ID, appears in the Entity Centric Report, and is composed of one or more records.

Record: a pairing of raw name and PI data from documents. Records are evaluated against each other to determine whether they should be consolidated into an entity. A single record becomes an entity on its own if no related records are found.

Entity Cluster: a group of entities based on PI conflicts and name similarity. Entities with conflicting PI or similar names are grouped together to aid in review and potential merging. Entities are not merged if there is no PI match or if a PI conflict exists.

Avenue: a grouping of records that have been merged. An avenue is a partially merged entity that may incorporate more records as each stage of the normalizer runs. Each avenue contains at least one embryo. At the end of the final normalization step, each avenue is converted into an entity.
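
The relationship between these terms can be pictured with a small data model. The following is only an illustrative sketch: the class names, field names, and the `finalize` helper are hypothetical and do not reflect the product's internal schema. In particular, reusing the avenue ID as the entity ID is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    """A pairing of a raw name and PI values taken from documents."""
    record_id: int
    name: str
    pi: Tuple[Tuple[str, str], ...] = ()  # (PI type, PI value) pairs


@dataclass
class Avenue:
    """A partially merged entity that may absorb more records at each stage."""
    avenue_id: int
    records: List[Record] = field(default_factory=list)


@dataclass
class Entity:
    """The final output of normalization: one or more consolidated records."""
    entity_id: int
    records: List[Record]


def finalize(avenues: List[Avenue]) -> List[Entity]:
    """Convert each avenue into an entity, as happens after the final step.

    Assumption: the entity ID is copied from the avenue ID for illustration.
    """
    return [Entity(entity_id=a.avenue_id, records=list(a.records)) for a in avenues]
```

In this sketch, a record with no related records simply ends up alone in its avenue and is still converted into an entity.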

Entity normalization audit table

The Entity Normalization Audit table contains the following information:

| Name | Description |
| --- | --- |
| Record ID | The ID number of the record. |
| Linked PI Type | The PI type of a linked piece of PI. |
| Linked PI Value | The PI value of a linked piece of PI. |
| Document ID | The IDs of the documents that the record appears on. |
| Token Hash | The token hash ID of the name cluster generated during the initial Name Clustering step of normalization. Records in the same name cluster share the same token hash ID. |
| Similar Name Cluster | The ID of the similar name cluster that the record belongs to. This value is generated during the Nickname Matching step. All records must belong to some cluster. Records with similar names are clustered together if their name similarity scores are above the 0.95 threshold. |
| Initial Merge | The avenue ID associated with a record at the initial merging stage. Records within the same avenue share the same avenue ID. The avenue ID shows how records were merged at this stage. |
| Hierarchical Merge | The avenue ID assigned to a record during Hierarchical Merging. These avenue IDs are different from the ones generated during initial merging. Records within the same avenue share the same avenue ID. |
| Entity ID | The ID number of the entity that the record belongs to. |
| Normalized Cluster | The ID number of the conflict cluster that the record belongs to. |
| View PI | A toggle that, when clicked, shows the PI linked to the record. |
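
To illustrate how a similarity threshold such as 0.95 can produce the Similar Name Cluster values, here is a minimal sketch. It rests on several assumptions: `difflib.SequenceMatcher` stands in for the product's actual name-similarity score (which is not documented here), clusters are formed transitively, and the cluster ID is taken to be the smallest record ID in the cluster. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict

THRESHOLD = 0.95  # threshold quoted in the Similar Name Cluster description


def similarity(a: str, b: str) -> float:
    # Stand-in score only; the product's real similarity measure may differ.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_names(names: Dict[int, str]) -> Dict[int, int]:
    """Map each record ID to a cluster ID (smallest record ID in its cluster)."""
    parent = {rid: rid for rid in names}

    def find(x: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = sorted(names)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if similarity(names[a], names[b]) > THRESHOLD:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Keep the smaller ID as the root so it names the cluster.
                    parent[max(ra, rb)] = min(ra, rb)
    return {rid: find(rid) for rid in names}
```

Every record receives a cluster ID in this sketch, including records whose names match nothing else, which mirrors the rule that all records must belong to some cluster.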

Querying records

By default, the Entity Normalization Audit table will be empty. To populate the table with records, run a search using the search bar located at the top of the table.

If no record is found in the database, an error message will appear at the top of the page.

Use cases

You can use the Entity Normalization Audit tool to query information and answer questions that may arise during the normalization process.

Determining which entity a record belongs to

You can find the entity that a record belongs to by querying the record and using the Entity ID column. Click the Entity ID value to open the associated entity.

Troubleshooting

If an entity or PI values associated with it are not appearing on the entity report, use the following workflow to troubleshoot.

Determining why an entity is not showing in the notification report

If an individual has been detected and linked to PI but is not appearing on the entity report, you can determine the cause by following the flow of the record throughout the normalization process:

  1. Query the record using the search bar.
  2. Examine the column values from left to right, starting with the Token Hash column.

The Token Hash column is blank

If a record does not have a Token Hash ID assigned, there may be a problem with the annotation or with link generation during Name Clustering. The product only considers individuals as entities and filters out business names. Verify that the name on the record is not an organization to rule this out as the cause.

Similar Name Cluster, Initial Merge, or Hierarchical Merge columns are blank

Similarly, a missing value for Similar Name Cluster, Initial Merge, or Hierarchical Merge indicates that the record did not make it through name similarity clustering, initial merging, or hierarchical merging, respectively.
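
The left-to-right check described above amounts to finding the first stage column with no value. The sketch below assumes a row from the audit table has been copied into a dictionary keyed by column name; that dictionary shape and the `first_blank_stage` function are hypothetical and are not part of the tool.

```python
from typing import Any, Dict, Optional

# Stage columns in the order they are examined, paired with the stage each reflects.
STAGES = [
    ("Token Hash", "Name Clustering"),
    ("Similar Name Cluster", "name similarity clustering"),
    ("Initial Merge", "initial merging"),
    ("Hierarchical Merge", "hierarchical merging"),
]


def first_blank_stage(row: Dict[str, Any]) -> Optional[str]:
    """Return the first stage whose column is blank, or None if all are filled."""
    for column, stage in STAGES:
        value = row.get(column)
        if value is None or str(value).strip() == "":
            return stage
    return None


# Hypothetical row: the record has values up to Similar Name Cluster only.
row = {
    "Record ID": 7,
    "Token Hash": "a1",
    "Similar Name Cluster": 3,
    "Initial Merge": "",
    "Hierarchical Merge": "",
}
print(first_blank_stage(row))
```

A return value of `None` means every stage column is populated, so the cause lies outside the stages covered by these columns.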

Determining why PI values are not showing in the notification report

If a PI value was linked to an individual but is not appearing on the entity report, you can determine the cause by following the flow of the linked record throughout the normalization process:

  1. Query the record that the PI value is linked to using the search bar.
  2. Select the View PI toggle.
  3. Examine the column values for the PI value from left to right, starting with the Token Hash column.

For example, an SSN for Jane Doe appears in an Excel file but is not in the entity report.

The Initial Merge column is blank

A missing value in the Initial Merge column indicates that the PI value linked to the record did not make it into the avenue at this stage of normalization.