Troubleshooting a bad email thread

Sometimes an email conversation is not threaded the way you expect. To understand what might be going wrong, you can run a second structured analytics set on just the affected emails.

Recipe Overview

This recipe shows how to use a second, separate structured analytics set to diagnose an improperly threaded email conversation. Because the new set runs only on the documents in question, it does not interfere with your existing results, and it runs much more quickly than the original set.

Requirements

  • Structured Analytics application
  • Relativity 9.5.196.102 or above

Special Considerations

This recipe does not solve every email threading problem, but it describes an efficient way to begin troubleshooting.

Directions

This process starts after you have already run email threading and you notice an email that does not appear to be threaded correctly. Call this email "E".

Running a new structured analytics set

  1. Find all of the emails that you think should have been threaded with E. Sometimes these are exactly the documents in E's thread group. Sometimes E's thread group contains documents you did not expect. In other cases, you need to track down additional documents from E's expected thread by filtering on email subject, searching for distinctive phrases in E, and so on.
  2. Once you have determined the set of documents that you expect to be threaded in the same group as E, mark only those documents, either by mass editing them (for example, setting a field) or by saving them as a list.
  3. Create a saved search which returns only these documents. Name it Thread to investigate.
  4. Create two fixed-length text fields called Temp1 and Temp2.
  5. Create a structured analytics set with the following settings:
    • Structured Analytics Set Information:
      • Structured Analytics Set name: Thread troubleshooting
      • Set prefix: Z1
      • Select document set to analyze: Thread to investigate (the saved search you created above).
      • Select operations: Email threading
    • Email threading:
      • Select profile for field mappings: use the profile you used previously
      • Use email header fields: use the setting you used previously
      • Email header languages: use the setting you used previously
      • Destination Email Thread Group: Temp1
      • Destination Email Duplicate ID: Temp2

    Note: Although these destination fields are normally relational fields, Relativity also lets you map them to non-relational, fixed-length text fields, which appear below the relational fields in the list. Non-relational fields are preferable for this test.

  6. Click Save.
  7. Click Run Structured Analytics. A pop-up window appears.
  8. Click Run. Because this is a new set, the run always populates all documents.
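Step 1 often comes down to matching subjects that differ only by reply and forward prefixes. The following Python sketch shows one way to do that on an exported list of (control number, subject) pairs. It is not a Relativity feature: the input format, the prefix list, and the function names are assumptions for illustration only.

```python
import re

# Assumption: only English RE:/FW:/FWD: prefixes (optionally "RE[2]:") are handled.
# Real data may use others, such as localized prefixes.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*(\[\d+\])?\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip leading reply/forward prefixes, collapse whitespace, lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def find_candidates(target_subject, documents):
    """Return control numbers whose normalized subject matches the target's.

    `documents` is a hypothetical list of (control_number, subject) pairs,
    for example exported from a saved search.
    """
    key = normalize_subject(target_subject)
    return [cn for cn, subj in documents if normalize_subject(subj) == key]


if __name__ == "__main__":
    docs = [
        ("DOC001", "Budget review"),
        ("DOC002", "RE: Budget review"),
        ("DOC003", "FW: RE: budget  review"),
        ("DOC004", "Lunch on Friday?"),
    ]
    print(find_candidates("Budget review", docs))  # prints ['DOC001', 'DOC002', 'DOC003']
```

Subject matching only narrows the search; emails with changed subjects still have to be found through phrase searches as described in step 1.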

Working with Results

  1. Create a saved search to investigate the results:
    • Name: Thread troubleshooting outputs
    • Conditions: Saved Search > In saved search > Thread to investigate
    • Fields:
      • Control Number
      • Z1::Email Threading Display
      • Z1::Inclusive Reason
      • Z1::Indentation
      • Z1::Email Thread Group
      • Z1::Email Action
      • Z1::Email Threading ID
      • Z1::Email Duplicate ID
      • Z1::Inclusive Email
      • Z1::Email Duplicate Spare

  2. Create a saved search to review the input fields. Consult your analytics profile for field mappings:
    • Name: Thread troubleshooting inputs
    • Conditions: Saved Search > In saved search > Thread to investigate
    • Fields:
      • Control Number
      • All six mapped email header fields
      • Parent document ID field
      • Attachment name field
      • Conversation ID field


  3. Using these saved searches, check whether the new set reproduces the original threading problem. In most cases, the results are the same as, or similar to, those of the original set.
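Comparing the outputs by eye works for a handful of documents; for a larger candidate set, a short script can flag which documents landed outside the main thread group. This Python sketch assumes you have exported Control Number and the Email Thread Group value from the Thread troubleshooting outputs search into (control number, thread group) pairs. The export step, the group values, and the function names are hypothetical.

```python
from collections import defaultdict


def summarize_groups(rows):
    """Group control numbers by thread group value.

    `rows` is a hypothetical list of (control_number, thread_group) pairs.
    Blank or missing thread groups are collected under the key None.
    """
    groups = defaultdict(list)
    for control_number, thread_group in rows:
        value = (thread_group or "").strip()
        groups[value or None].append(control_number)
    return dict(groups)


def stray_documents(rows):
    """Return documents outside the largest non-blank thread group, sorted."""
    groups = summarize_groups(rows)
    named = {k: v for k, v in groups.items() if k is not None}
    if not named:
        # Nothing was grouped at all: every document is worth inspecting.
        return sorted(groups.get(None, []))
    main = max(named, key=lambda k: len(named[k]))
    return sorted(cn for k, members in groups.items() if k != main for cn in members)


if __name__ == "__main__":
    rows = [
        ("DOC001", "group-1"),
        ("DOC002", "group-1"),
        ("DOC003", "group-2"),
        ("DOC004", ""),
    ]
    print(stray_documents(rows))  # prints ['DOC003', 'DOC004']
```

A document outside the largest group is not necessarily misthreaded; it is simply the first place to compare against the input fields.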

Parameters to change

  1. If the problem still appears, you can investigate individual settings. First, create a new analytics profile that is identical to the one you used previously, and select it on the Thread troubleshooting set so that your changes do not affect the original profile.
  2. Change one setting at a time, and after each change run the Thread troubleshooting set again. It's safest to select Repopulate Text so that Analytics reruns the complete analysis from scratch each time.
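Keeping each run interpretable depends on knowing exactly which single setting differs from the baseline. The following Python sketch models that bookkeeping by deriving one variant per change from an unchanged baseline. ThreadingSettings and its field names are hypothetical stand-ins, not Relativity objects; you still apply each variant by hand in Relativity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ThreadingSettings:
    # Hypothetical stand-ins for the profile and set options; not Relativity field names.
    conversation_id_mapped: bool = True
    use_email_header_fields: bool = True
    header_languages: tuple = ("English",)


def single_change_variants(baseline, changes):
    """Yield (label, settings) pairs, each differing from baseline in exactly one field."""
    for field_name, value in changes:
        yield f"{field_name}={value!r}", replace(baseline, **{field_name: value})


if __name__ == "__main__":
    baseline = ThreadingSettings()
    changes = [
        ("conversation_id_mapped", False),
        ("use_email_header_fields", False),
        ("header_languages", ("English", "German")),
    ]
    for label, settings in single_change_variants(baseline, changes):
        print(label, settings)
```

Because every variant starts from the same baseline rather than from the previous run, a change in results can be attributed to a single setting.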

Here are some things to try, listed roughly from the most likely or easiest change to the least:

  • Conversation ID - Mapping or unmapping this field in the analytics profile can dramatically affect results. If the field is correctly populated and in a supported format, it should improve results; however, it can badly harm results if it contains values that Analytics cannot understand or misinterprets.
  • Use email header fields - Turning this off makes Analytics ignore the six mapped email header fields and instead determine the top-most email header from the extracted text.
  • Email header languages - Try selecting different languages to see whether the selection affects the results.
  • Regular expression filter - Listed under Advanced Settings on the structured analytics set, a regular expression filter modifies the extracted text as it is brought in for analysis. It is a powerful but complicated last resort for extracted text that is fundamentally corrupted with extraneous content (often Bates numbers, extra lines, or page numbering) that confuses the parsing mechanisms.
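Before configuring a regular expression filter, it can help to prototype patterns outside Relativity on a copy of the problem text. This Python sketch deletes whole lines that look like Bates stamps or "Page N of M" footers. Both patterns, including the assumed Bates format of two to six capital letters followed by six to ten digits, are hypothetical examples. Python's re module is used only for illustration; the regex dialect and filter behavior in Relativity may differ, so test any pattern on the Thread troubleshooting set first.

```python
import re

# Hypothetical pattern: a Bates stamp such as "ABC0001234" alone on a line.
# The line and its trailing newline are removed together.
BATES_LINE = re.compile(r"^[ \t]*[A-Z]{2,6}[-_]?\d{6,10}[ \t]*(?:\n|\Z)", re.MULTILINE)

# Hypothetical pattern: a footer such as "Page 3 of 12" alone on a line.
PAGE_LINE = re.compile(
    r"^[ \t]*Page \d+ of \d+[ \t]*(?:\n|\Z)", re.MULTILINE | re.IGNORECASE
)


def clean_text(text: str) -> str:
    """Remove Bates-stamp lines and page-footer lines from extracted text."""
    text = BATES_LINE.sub("", text)
    return PAGE_LINE.sub("", text)


if __name__ == "__main__":
    sample = (
        "From: Alice\n"
        "Sent: Monday\n"
        "ABC0001234\n"
        "Subject: Budget\n"
        "\n"
        "Page 1 of 2\n"
        "Hello team"
    )
    print(clean_text(sample))
```

Removing the stray Bates line restores a contiguous header block (From, Sent, Subject), which is the kind of repair the filter is meant for. Overly broad patterns can also delete legitimate content, so check the cleaned text carefully.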

Implementing a fix

If you find the source of the threading anomaly, point the Thread troubleshooting set at a saved search that returns a larger set of documents to confirm the fix holds more broadly. Once you are confident in your findings, apply the same changes to your original structured analytics set.
