Microsoft
This page provides additional information on the Microsoft file types that Relativity supports for processing. For information on Excel, PowerPoint, and Word, see Supported file types for processing.
Outlook message item (.msg) to MIME encapsulation (.mht) conversion
The following table provides details on the differences between how Relativity handles .msg and .mht file types. This information may be especially useful if you plan on setting the Email Output field on the processing profile to MIME encapsulation.
Category | Field/attribute | Outlook message item (.msg) | MIME encapsulation (.mht) |
---|---|---|---|
Metadata fields | Show Time As | This field sometimes appears in the extracted text from MSG files when not explicitly stated in the message file itself. The default for a calendar invite is to show time as busy; the default for a cancellation is to show time as free. | Show Time As does not appear in the extracted text if the default value is populated. |
Metadata fields | On behalf of | This field is sometimes present in text from a message item. In some cases, this field is populated with the same value as the From field. | On behalf of does not appear in the extracted text. |
Interline spacing | N/A | The expected number of blank lines appears in the extracted text. Line wrapping for long paragraphs will also be present. | In some cases, the text in the .mht file format has fewer blank lines than the text from a message item. In addition, there is no built-in line wrapping for long paragraphs. |
Intraline spacing | N/A | White-space characters are converted to standard space characters. | White-space characters may remain as non-breaking spaces. |
Email addresses | When a message file is converted to .mht, the text is extracted from the .mht file using OutsideIn. This can lead to a loss of data. | If joe.smith@acme.com renders as Joe Smith in the .mht file, the email address is not captured in the extracted text. |
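
If you need to compare extracted text from a .msg file with text from its .mht counterpart, the spacing differences in the table above can produce false mismatches. The following is a minimal sketch, not a Relativity feature, of one way to normalize both texts before comparing them. The function name `normalize_extracted_text` is hypothetical, and the sketch assumes you already have both texts as Python strings. It does not attempt to undo line wrapping of long paragraphs.

```python
import unicodedata


def normalize_extracted_text(text: str) -> str:
    """Normalize spacing so .msg and .mht extracted text can be compared.

    Hypothetical helper for illustration only; not part of Relativity.
    """
    # Use a single newline convention.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Map every Unicode space separator (including the non-breaking
    # space U+00A0) to a standard space character.
    text = "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )
    # Drop trailing spaces and remove blank lines entirely, because the
    # number of blank lines can differ between the two formats.
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


msg_text = "Hello world\n\n\nRegards"
mht_text = "Hello\u00a0world\n\nRegards"
same = normalize_extracted_text(msg_text) == normalize_extracted_text(mht_text)
```

Removing blank lines rather than collapsing them is a deliberate choice here: the .mht text can have fewer blank lines than the .msg text, so any remaining blank line could still cause a mismatch.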
Email image extraction support
It is helpful to understand when Relativity treats an image that is attached to an email as an inline, or embedded, image and not as an actual attachment. The following table breaks down when this occurs based on email format and image characteristics:
Email format | Attachments treated as inline (embedded) images |
---|---|
Plain text | None |
Rich text | IPicture-based OLE embedded images |
HTML |
|
You can control whether inline images are discovered when you create a processing profile, specifically through the When extracting children, do not extract field.
Inline image identification
Processing defines inline images within emails through the HiddenAttachment field. This field is not mapped by default. See Mapping processing fields for more information.
Microsoft Office child extraction support
The following table displays which Office file extensions will have their embedded objects and images extracted by Relativity and which will not.
- √—Relativity fully extracts the embedded object and image.
- √*—Relativity partially extracts the embedded object or image.
- Empty—Relativity does not extract the embedded object or image.
Office program | File extension | Embedded object extraction | Embedded image extraction |
---|---|---|---|
Excel | .xlsx | √ | √ |
Excel | .xlsm | √ | √ |
Excel | .xlsb | √ | √ |
Excel | .xlam | √ | √ |
Excel | .xltx | √ | √ |
Excel | .xltm | √ | √ |
Excel | .xls | √ | √* |
Excel | .xlt | √ | √* |
Excel | .xla | √ | √* |
Excel | .xlm | √ | √* |
Excel | .xlw | √ | √* |
Excel | .uxdc | ||
Outlook | .msg | √ | √ |
Word | .docx | √ | √ |
Word | .docm | √ | √ |
Word | .dotx | √ | √ |
Word | .dotm | √ | √ |
Word | .doc | √ | √* |
Word | .dot | √ | √* |
Word | .rtf | √ | √ |
Visio | .vsd | ||
Visio | .vdx | ||
Visio | .vss | ||
Visio | .vsx | ||
Visio | .vst | ||
Visio | .vsw | ||
Visio | .vsdx | √ | √ |
Visio | .vsdm | √ | √ |
Project | .mpp | ||
Publisher | .pub | √ | |
PowerPoint | .pptx | √ | √ |
PowerPoint | .pptm | √ | √ |
PowerPoint | .ppsx | √ | √ |
PowerPoint | .ppsm | √ | √ |
PowerPoint | .potx | √ | √ |
PowerPoint | .ppt | √ | √ |
PowerPoint | .pps | √ | √ |
PowerPoint | .pot | √ | √ |
OneNote | .one | √ |
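
When triaging a file set before processing, it can help to flag files whose embedded images will be only partially extracted or not extracted at all. The following sketch transcribes the Embedded image extraction column of the table above into a lookup. The names `IMAGE_EXTRACTION` and `image_extraction_level` are hypothetical, and the .one row is left out of this sketch. Extensions that are not listed return `None`.

```python
from pathlib import PurePath

# Embedded image extraction levels, transcribed from the table above.
# Illustrative only; the .one row is intentionally omitted.
IMAGE_EXTRACTION = {
    "full": {
        ".xlsx", ".xlsm", ".xlsb", ".xlam", ".xltx", ".xltm",
        ".msg", ".docx", ".docm", ".dotx", ".dotm", ".rtf",
        ".vsdx", ".vsdm", ".pptx", ".pptm", ".ppsx", ".ppsm",
        ".potx", ".ppt", ".pps", ".pot",
    },
    "partial": {".xls", ".xlt", ".xla", ".xlm", ".xlw", ".doc", ".dot"},
    "none": {
        ".uxdc", ".vsd", ".vdx", ".vss", ".vsx", ".vst", ".vsw",
        ".mpp", ".pub",
    },
}

# Reverse lookup: extension -> level.
_LEVEL_BY_EXTENSION = {
    ext: level for level, exts in IMAGE_EXTRACTION.items() for ext in exts
}


def image_extraction_level(filename: str):
    """Return "full", "partial", "none", or None for unlisted extensions."""
    return _LEVEL_BY_EXTENSION.get(PurePath(filename).suffix.lower())
```

For example, `image_extraction_level("Budget.XLS")` returns `"partial"`, because the lookup is case-insensitive on the extension.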
Excel files
Due to Excel specifications and limits, when you process a database file with the native text extraction method, the extracted text may be missing data. For example, if a database file contains more than 1,048,576 rows or more than 16,384 columns, the extracted text will not contain text from row 1,048,577 onward or from column 16,385 onward. For more information, see Excel specifications and limits on the Microsoft website.
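
As a rough illustration of the limits described above, the following sketch estimates how many rows and columns would fall outside the extracted text for a file of a given size. The function name `truncated_extent` is hypothetical, and the sketch assumes you already know the row and column counts of the source data.

```python
# Worksheet limits cited above (Excel specifications and limits).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384


def truncated_extent(row_count: int, column_count: int) -> tuple[int, int]:
    """Return (rows_lost, columns_lost) beyond the Excel limits.

    Hypothetical helper for illustration only; not part of Relativity.
    """
    rows_lost = max(0, row_count - EXCEL_MAX_ROWS)
    columns_lost = max(0, column_count - EXCEL_MAX_COLUMNS)
    return rows_lost, columns_lost


# A file with 1,048,580 rows and 100 columns loses its last 4 rows.
rows_lost, columns_lost = truncated_extent(1_048_580, 100)
```

A result of `(0, 0)` means the data fits within both limits, so no text should be dropped for this reason.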