Microsoft

This page provides additional information on the Microsoft file types supported for processing. For information on Excel, PowerPoint, and Word, see Supported file types for processing.

Outlook message item (.msg) to MIME encapsulation (.mht) conversion

The following table provides details on the differences between how Relativity handles .msg and .mht file types. This information may be especially useful if you plan on setting the Email Output field on the processing profile to MIME encapsulation.

Category | Field/attribute | Outlook message item (.msg) | MIME encapsulation (.mht)
Metadata fields | Show Time As | This field sometimes appears in the extracted text from .msg files even when it is not explicitly stated in the message file itself. The default for a calendar invite is to show time as busy; the default for a cancellation is to show time as free. | Show Time As does not appear in the extracted text if the default value is populated.
Metadata fields | On behalf of | This field is sometimes present in the text from a message item. In some cases, it is populated with the same value as the From field. | On behalf of does not appear in the extracted text.
Interline spacing | N/A | The expected number of blank lines appears in the extracted text. Line wrapping for long paragraphs is also present. | In some cases, the text in the .mht file format has fewer blank lines than the text from a message item. In addition, there is no built-in line wrapping for long paragraphs.
Intraline spacing | N/A | White-space characters are converted to standard space characters. | White-space characters may remain as non-breaking spaces.
Email addresses | Email | When a message file is converted to .mht, the text is extracted from the .mht file using OutsideIn. This can lead to a loss of data. For example, if joe.smith@acme.com renders as Joe Smith in the .mht file, the email address is not captured in the extracted text.
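
The spacing differences described above can produce false mismatches when extracted text from a .msg file is compared with text from its .mht conversion. The following is a minimal Python sketch of one way to normalize both texts before comparing them. It is not part of Relativity; the function names normalize_intraline and comparison_key are hypothetical, and the set of non-breaking characters handled is an assumption.

```python
# Hypothetical helpers for comparing extracted text; not a Relativity API.

# Assumed set of non-breaking space variants:
# no-break space, figure space, narrow no-break space.
_NBSP_TABLE = {ord(ch): " " for ch in "\u00a0\u2007\u202f"}


def normalize_intraline(text):
    """Replace non-breaking space variants with a standard space."""
    return text.translate(_NBSP_TABLE)


def comparison_key(text):
    """Collapse every run of white space, including blank lines and
    line wraps, into a single space so that only the words are compared."""
    return " ".join(normalize_intraline(text).split())
```

Two texts that differ only in blank lines, line wrapping, or non-breaking spaces produce the same comparison_key value. The key is meant for comparison only, since it discards the original layout.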

Email image extraction support

It is helpful to understand when Relativity treats an image that is attached to an email as an inline, or embedded, image and not as an actual attachment. The following table breaks down when this occurs based on email format and image characteristics:

Email format | Attachments treated as inline (embedded) images
Plain text | None
Rich text | IPicture-based OLE embedded images
HTML | Any of the following:
  • Images with content ID referenced in the HTML body
  • Local, non-internet image references in the HTML that Relativity can match to an attachment
  • .pst/.ost/.msg files containing metadata hints as to whether or not the image is marked hidden or is referenced in the HTML body
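
To illustrate the first HTML rule only (images whose content ID is referenced in the HTML body), here is a rough Python sketch built on the standard library email package. It is not Relativity's implementation: the function name split_inline_images is hypothetical, it works on a raw MIME message rather than a .msg or .pst file, and it ignores local image references and metadata hints.

```python
import re
from email import message_from_string, policy

# Matches cid: references such as <img src="cid:logo@example">.
CID_REF = re.compile(r"""cid:([^"'\s>)]+)""", re.IGNORECASE)


def split_inline_images(raw_message):
    """Return (inline, attachments): names of image parts split by whether
    their Content-ID is referenced from the HTML body (hypothetical helper)."""
    msg = message_from_string(raw_message, policy=policy.default)

    # Locate the HTML body, if the message has one.
    html_part = msg.get_body(preferencelist=("html",))
    html = html_part.get_content() if html_part is not None else ""
    referenced = {cid.lower() for cid in CID_REF.findall(html)}

    inline, attachments = [], []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # Content-ID headers are wrapped in angle brackets.
        cid = str(part.get("Content-ID") or "").strip().strip("<>").lower()
        name = part.get_filename() or cid
        if cid and cid in referenced:
            inline.append(name)
        else:
            attachments.append(name)
    return inline, attachments
```

An image part without a Content-ID header, or with one that the HTML body never references, is reported as a regular attachment in this sketch.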

You can control whether inline images are discovered when you create a processing profile, through the When extracting children, do not extract field.

Inline image identification

Processing defines inline images within emails through the HiddenAttachment field. This field is not mapped by default. See Mapping processing fields for more information.

Microsoft Office child extraction support

Excel files

Due to Excel specifications and limits, when you process a database file with the native text extraction method, the extracted text may be missing data. For example, if a database file contains more than 1,048,576 rows or more than 16,384 columns, the extracted text will not contain text from row 1,048,577 onward or from column 16,385 onward. For more information, see Excel specifications and limits on the Microsoft website.
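
As a quick illustration of the arithmetic, the following Python sketch estimates how many rows and columns of a source table fall beyond the Excel worksheet limits quoted above. The function name truncated_extent is hypothetical, and the sketch only models the limits; it does not inspect a file or call Relativity.

```python
# Worksheet limits quoted in the paragraph above.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def truncated_extent(rows, cols):
    """Return (rows_beyond_limit, cols_beyond_limit) for a table with the
    given dimensions; both values are 0 when the table fits (hypothetical)."""
    return max(0, rows - EXCEL_MAX_ROWS), max(0, cols - EXCEL_MAX_COLS)
```

For example, a table with 1,048,580 rows and 16,384 columns has 4 rows beyond the limit and no excess columns, so text from those last 4 rows would be expected to be absent from the extracted text.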