

There are two types of email messages in Structured Analytics: inclusive and non-inclusive.
Reviewing only inclusive emails and skipping duplicates makes your review process much more efficient. The Analytics engine derives the email threads and determines which subset of each conversation constitutes the minimal inclusive set. Non-inclusive emails are redundant because all of their content is also contained in inclusive emails. The inclusiveness analysis ensures that reviewers do not miss even small changes in content.
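The idea of a minimal inclusive set can be illustrated with a small sketch. It is not the Analytics engine's algorithm. It assumes each email has already been reduced to a set of message segment identifiers (a hypothetical representation), and it treats an email as non-inclusive when another email contains all of its segments plus at least one more. Exact duplicates are not handled here.

```python
def inclusive_flags(emails: dict[str, set[str]]) -> dict[str, bool]:
    """Flag each email as inclusive (True) or non-inclusive (False).

    `emails` maps a hypothetical email ID to the set of message
    segments it contains. An email is non-inclusive when its segments
    are a strict subset of some other email's segments.
    """
    flags = {}
    for email_id, segments in emails.items():
        redundant = any(
            segments < other_segments  # strict subset test
            for other_id, other_segments in emails.items()
            if other_id != email_id
        )
        flags[email_id] = not redundant
    return flags


thread = {
    "original": {"m1"},
    "reply": {"m1", "m2"},
    "reply_to_reply": {"m1", "m2", "m3"},
    "side_branch": {"m1", "m4"},
}
flags = inclusive_flags(thread)
```

In this example only `reply_to_reply` and `side_branch` are flagged inclusive, because every segment of the other two emails appears in one of them.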
When Analytics checks emails for inclusivity, it treats email signatures in a similar way to attachments. If an email has a unique signature, that email will be marked inclusive. However, if that signature is attached to a second or third email, it will not be enough to mark those emails inclusive, even if that exact combination of body text and signature does not exist in other emails. This prevents emails from being erroneously marked as inclusive because of signature blocks that were automatically inserted or reordered by an email carrier.
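The first-occurrence rule described above can be sketched as follows. This is a simplified illustration, not the engine's implementation: it assumes the emails have identical body text, that they are processed in a fixed order, and that each signature is already reduced to a comparable string. The function name and input shape are hypothetical.

```python
def inclusive_by_signature(docs: list[list[str]]) -> list[bool]:
    """Return one flag per document, in processing order.

    A document is flagged True only if it carries at least one
    signature that no earlier document carried. Repeating an
    already-seen signature never makes a later document inclusive.
    """
    seen: set[str] = set()
    flags: list[bool] = []
    for signatures in docs:
        has_new_signature = any(sig not in seen for sig in signatures)
        flags.append(has_new_signature)
        seen.update(signatures)
    return flags


# Hypothetical signatures "A" and "B" on three emails with the same body.
result = inclusive_by_signature([["A"], ["A"], ["A", "B"]])
```

Here the second email repeats signature `A` and is not flagged, while the third introduces `B` and is. How the engine decides which email counts as the first occurrence is not stated in this document, so the ordering here is an assumption.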
When an email is inclusive because of a unique signature block, the Email Threading Summary report will count it as being inclusive because of Message.
Signature block extraction was added in the July 14, 2022 update. Older structured analytics sets that ran email threading analysis before that date do not extract signature blocks from the email body. Instead, they treat them as part of the body text.
For more information on updating structured analytics sets, see Run structured analytics.
The Analytics engine identifies signatures according to the Usenet Signature Convention (RFC 3676). It uses a line consisting of "-- " (two hyphens followed by a space) as the dividing line between the body and the signature of a message. Multiple signatures may be extracted from a single text segment.
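A minimal sketch of delimiter-based extraction is shown below. It is an illustration only: the function name is hypothetical, and treating every delimiter line as the start of a new signature is an assumption about how multiple signatures in one segment are separated.

```python
SIG_DELIMITER = "-- "  # two hyphens and a trailing space, per RFC 3676


def split_signatures(text: str) -> tuple[str, list[str]]:
    """Split one text segment into (body, list of signature blocks)."""
    body_lines: list[str] = []
    signature_blocks: list[list[str]] = []
    current = body_lines
    for line in text.splitlines():
        if line == SIG_DELIMITER:
            # Assumption: each delimiter line opens a new signature block.
            signature_blocks.append([])
            current = signature_blocks[-1]
        else:
            current.append(line)
    body = "\n".join(body_lines)
    signatures = ["\n".join(block) for block in signature_blocks]
    return body, signatures


segment = "Hi Bob,\nSee attached.\n-- \nAlice\nAcme Corp\n-- \nSent from my phone"
body, signatures = split_signatures(segment)
```

Note that a line of two hyphens without the trailing space does not match the delimiter in this sketch, so it stays in the body text.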
When comparing signatures for uniqueness, Analytics uses the MD5 hash of the signature. Some variation in white space and punctuation is tolerated, so signatures that differ only in those respects can still be treated as the same.
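One way to picture this comparison is to normalize the signature before hashing it. The sketch below assumes that all white space and ASCII punctuation are removed before the MD5 hash is computed. The document does not specify the exact normalization, so this rule and the function name are assumptions.

```python
import hashlib
import string

# Assumption: drop every white space and ASCII punctuation character.
_DROP_TABLE = str.maketrans("", "", string.punctuation + string.whitespace)


def signature_fingerprint(signature: str) -> str:
    """Return the MD5 hex digest of a normalized signature."""
    normalized = signature.translate(_DROP_TABLE)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


same = signature_fingerprint("Alice Smith, Acme Corp.") == signature_fingerprint(
    "Alice  Smith\nAcme Corp"
)
different = signature_fingerprint("Alice Smith") != signature_fingerprint("Bob Jones")
```

Under this normalization the first two signatures hash to the same value, while a signature with different wording produces a different hash.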
Factors that affect uniqueness | Factors that do not affect uniqueness |
---|---|
 | |
For these example scenarios, assume that documents 1 and 2 have identical body text and that these signatures do not occur elsewhere in the document set.
If the only differences are in their signatures, they would be marked as follows:
Signatures in Document 1 | Signatures in Document 2 | Inclusivity results |
---|---|---|
 | | Document 1: Inclusive; Document 2: Not inclusive |
 | | Document 1: Inclusive; Document 2: Inclusive |
 | | Document 1: Inclusive; Document 2: Duplicate spare of Document 1 |