Using EDRM MIH to identify duplicates in emails
An EDRM Message ID Hash (EDRM MIH) is an MD5 hash of an email message's Message ID value, generated according to EDRM guidelines. Relativity calculates the hash during discovery and stores it in the Email/EDRMMessageIdentificationHash metadata field. You can use the EDRM MIH with the Processing Duplicate Workflow script in Relativity to identify potential duplicates among emails from different platforms, including emails that were not processed in Relativity.
How the hash is generated:
Process | Value |
Message-ID header line from email | Message-ID: <CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com> |
Value passed to MIH generator | <CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com> |
Generated MIH | 1de319c276884bd0c9e2f1621ada26cc |
See DupeID > Duplicate Identification Project Overview for more information on EDRM's duplicate identification project.
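The generation steps in the table above can be sketched in a few lines of Python. This is a minimal illustration, not Relativity's implementation: the function name `message_id_hash`, the whitespace trimming, and the UTF-8 encoding are assumptions, and the EDRM guidelines remain the authority on how the value is normalized before hashing.

```python
import hashlib
from email import message_from_string


def message_id_hash(message_id: str) -> str:
    """Return the MD5 hex digest of a Message-ID value, angle brackets included."""
    # Assumption: surrounding whitespace is trimmed and the value is UTF-8 encoded.
    return hashlib.md5(message_id.strip().encode("utf-8")).hexdigest()


# Minimal raw message carrying the Message-ID header from the table above.
raw_email = (
    "Message-ID: <CALckR-a8UDkRjO4xJyjd_s0GPxQWw@mail.gmail.com>\n"
    "Subject: Example\n"
    "\n"
    "Body text\n"
)

message = message_from_string(raw_email)
message_id = message["Message-ID"]  # header value without the "Message-ID:" label
mih = message_id_hash(message_id)
print(message_id, mih)
```

Whether this sketch reproduces a platform's stored hash depends on the normalization that platform applies, so compare its output against known values before relying on it.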
How to use EDRM MIH in Relativity
To use the EDRM MIH, first create a new field in Relativity to store the hash value. For emails processed in Relativity, map the new field to the Email/EDRMMessageIdentificationHash processing field. For load files produced on other platforms, map the field to the load file metadata that contains the EDRM MIH. You can then run the processing deduplication script to identify cross-platform duplicates.
Fields and saved search
The field and saved search names used below are suggestions. You can use any naming convention that suits your organization's needs.
- Fields: create two new fields
  - EDRM MIH
  - EDRM Duplicate Status
- Saved Search: create a new saved search
  - Name: EDRM Saved Search
  - Fields
    - Control Number
    - EDRM MIH
    - EDRM Duplicate Status
- Processing Item Level Script: 03. Update Duplicate Status
  - Saved Search: EDRM Saved Search
  - Duplicate Status Field: EDRM Duplicate Status
  - Duplicate Hash Field: EDRM MIH
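To make the roles of the hash field and the status field concrete, the sketch below groups documents by their EDRM MIH and labels each group. It is a conceptual illustration only and does not reproduce the logic of the Update Duplicate Status script: the function name, the rule that the first document in a group becomes the master, and the "Unique" label are assumptions.

```python
from collections import defaultdict


def assign_duplicate_status(docs):
    """Map control numbers to a status, given (control_number, mih) pairs.

    Assumption: within a group sharing a hash, the first document listed
    is treated as the master and the rest as duplicates.
    """
    groups = defaultdict(list)
    for control_number, mih in docs:
        if mih:  # documents without a hash are left without a status
            groups[mih].append(control_number)

    status = {}
    for members in groups.values():
        if len(members) == 1:
            status[members[0]] = "Unique"  # hypothetical label
            continue
        status[members[0]] = "Master"
        for control_number in members[1:]:
            status[control_number] = "Duplicate"
    return status


# Hypothetical documents; the hash values are shortened placeholders.
documents = [
    ("DOC001", "aaa"),
    ("DOC002", "bbb"),
    ("DOC003", "aaa"),
    ("DOC004", ""),
]
statuses = assign_duplicate_status(documents)
print(statuses)
```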
Viewing EDRM MIH results
The EDRM Saved Search displays the results of potential duplicates.
- Sort the results by EDRM MIH and by EDRM Duplicate Status to view the report with information on the possible duplicates.
- Filter the EDRM Duplicate Status column for Master and Duplicate to see the list of potential duplicate documents.
How to map the EDRM MIH field from a load file
While importing the load file, map the load file field that contains the EDRM MIH to the corresponding EDRM MIH field in the workspace.
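The mapping idea can be illustrated outside Relativity with a short Python sketch that reads a delimited load file and renames its columns to workspace field names. Everything specific here is hypothetical: the comma-delimited layout, the column names `DocID` and `MessageIDHash`, and the lowercasing of the hash so that digests compare consistently. Real load files often use other delimiters and column names.

```python
import csv
import io

# Workspace field name -> load file column name (both hypothetical).
FIELD_MAP = {"Control Number": "DocID", "EDRM MIH": "MessageIDHash"}


def read_load_file(handle, field_map):
    """Return one dict per row, keyed by workspace field name."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    missing = [col for col in field_map.values() if col not in columns]
    if missing:
        raise ValueError(f"load file is missing columns: {missing}")

    records = []
    for row in reader:
        record = {ws: (row[col] or "").strip() for ws, col in field_map.items()}
        # Assumption: hex digests are compared case-insensitively.
        record["EDRM MIH"] = record["EDRM MIH"].lower()
        records.append(record)
    return records


# In-memory stand-in for a load file exported from another platform.
sample = io.StringIO(
    "DocID,MessageIDHash,Custodian\n"
    "EXT-0001,1DE319C276884BD0C9E2F1621ADA26CC,Smith\n"
    "EXT-0002,,Jones\n"
)
records = read_load_file(sample, FIELD_MAP)
print(records)
```

Rows with an empty hash, such as the second one, are kept so that they can be reviewed rather than silently dropped.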
How to export the EDRM MIH values
When running the export job, include the EDRM MIH field so that the exported values can be used for deduplication on other platforms.
Limitations
In the following scenarios, the EDRM MIH will not be calculated:
- If the file does not have a Message ID value
- If the file is not an email (.eml, .msg) file
- If the file was discovered in Relativity before this functionality was enabled. In this scenario, consider re-discovering the email files.
In the following scenarios, the MIH on its own may not be adequate to perform deduplication. In these cases, consider using a combination of the MIH and the email date (for example, Sent Date & Time):
- Draft messages without Message IDs
- SPAM and Fraudulent Messages
- System Generated Emails
- Malformed or Corrupted Message IDs
- Messages with Prepended or Appended Headers, Footers and Signatures
- Messages with Message Group and Alias Addressing
- Messages with BCCs
- Messages with Stripped or Corrupted Attachments
- Messages with Time Anomalies
- Items that are Not Email Messages
See DupeID > Duplicate Identification Project Overview for more information on limitations and use cases.
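Combining the MIH with the email date, as mentioned above, can be sketched as building a composite key. This is one possible approach under stated assumptions, not a method prescribed by EDRM or Relativity: the function name `composite_key`, the `|` separator, the conversion to UTC, and the truncation to whole seconds are all illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def composite_key(mih: str, sent: datetime) -> str:
    """Combine a hash and a sent timestamp into one comparison key."""
    if sent.tzinfo is None:
        raise ValueError("sent must be a timezone-aware datetime")
    # Normalize to UTC and drop sub-second precision so that the same
    # instant recorded in different time zones yields the same key.
    sent_utc = sent.astimezone(timezone.utc).replace(microsecond=0)
    return f"{mih.lower()}|{sent_utc.isoformat()}"


example_hash = "1de319c276884bd0c9e2f1621ada26cc"
sent_utc = datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc)
sent_local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))

key_a = composite_key(example_hash, sent_utc)
key_b = composite_key(example_hash, sent_local)
print(key_a == key_b)
```

Two messages that share a Message ID but were sent at different times produce different keys, which helps separate cases such as system-generated emails that reuse an identifier.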