Supported file types for processing

Relativity supports many different file types for processing. There are also a number of file types that are incompatible with the processing engine. Before you begin to process your data, it may be helpful to note which types are supported and unsupported, as well as any caveats involved with processing those types of files.

Supported file types

The following file types and extensions are supported by Relativity for processing.

Note: Renaming a file extension has little effect on how Relativity identifies the file type. When processing a file, Relativity looks at the actual file properties, such as the file signature, regardless of the named extension. Relativity uses the named extension only as a tie-breaker when the actual file properties are consistent with more than one file type.
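The tie-breaking rule in the note above can be modeled with a minimal Python sketch. It assumes a small, hypothetical table of well-known header signatures (`SIGNATURES`) and a hypothetical `identify()` function. Relativity's actual identification logic is far more thorough (a real detector would, for example, look inside a ZIP package to tell DOCX from XLSX), so treat this only as an illustration of "signature first, extension as tie-breaker."

```python
from pathlib import Path

# Hypothetical signature table: leading bytes ("magic numbers") mapped to the
# file types they are consistent with. Illustrative only; this is not
# Relativity's detection table.
SIGNATURES = {
    b"%PDF-": ["pdf"],
    b"\x89PNG\r\n\x1a\n": ["png"],
    # OOXML documents are ZIP packages, so they share the ZIP signature.
    b"PK\x03\x04": ["zip", "docx", "xlsx", "pptx"],
    # OLE2 compound files cover legacy Office formats and Outlook MSG.
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": ["doc", "xls", "ppt", "msg"],
}


def identify(filename: str, header: bytes) -> str:
    """Identify a file from its header bytes, using the extension only as a tie-breaker."""
    for magic, candidates in SIGNATURES.items():
        if not header.startswith(magic):
            continue
        if len(candidates) == 1:
            return candidates[0]  # unambiguous: the named extension is ignored
        ext = Path(filename).suffix.lower().lstrip(".")
        # The extension only decides among types the signature already allows.
        return ext if ext in candidates else candidates[0]
    return "unknown"


print(identify("report.txt", b"%PDF-1.7"))        # pdf
print(identify("budget.xlsx", b"PK\x03\x04..."))  # xlsx
print(identify("budget.txt", b"PK\x03\x04..."))   # zip
```

Renaming a PDF to .txt doesn't change the result, while the extension still matters when several types share one signature.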

File type Extensions
Adobe files

PDF, FM, PS, EPS

  • XFA-based PDFs are unsupported. If you publish one and then open it in the viewer, the extracted text contains the following message: "Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document."
  • Relativity performs OCR on PDF files during processing. Relativity handles a PDF portfolio, which is an integrated PDF unit containing multiple files, by extracting the metadata and associating it with the files contained in the portfolio.
AppleDouble AppleDouble-encoded attachments in emails
CAD files

DXF, DWG, SLDDRW, SLDPRT, 3DXML, SLDASM, PRTDOT, ASMDOT, DRWDOT, STL, EPRT, EASM, EDRW, EPRTX, EDRWX, EASMX

  • For processing and imaging data sets containing CAD files, you can configure the timeout value in the AppSettings table. See AppSettings table. The OCR output for processed CAD files can vary significantly.
Compressed files 7Z, ZIP, TAR, GZ, BZ2, RAR, Z, CAB, ALZIP
Database files

DBF

  • Relativity only supports DBF 3 and DBF 4 files.
  • Relativity doesn't support the following DBF formats:
    • VisualFoxPro
    • VisualFoxPro autoincrement enabled
  • Relativity uses dtSearch to extract text from DBF files. If dtSearch fails, Relativity uses Microsoft Excel.
Email

PST, OST, NSF, MSG, P7M, P7S, ICS, VCF, MBOX, EML, EMLX, TNEF, DBX, Bloomberg XML

  • The processing engine converts any EML files that are part of PST files to MSG files. Loose EML files are processed as EML files.
  • S/MIME-encrypted and digitally signed emails are supported.
  • Even though the EMLX file type is supported, the following partial EMLX file extensions are not supported:
    • .emlxpart
    • .partial.emlx

EnCase versions:

  • 5.5
  • 5.6
  • 6
  • 7
  • 8

E01, Ex01, L01, LX01

  • Processing supports E01 and Ex01 files for the following operating and file systems:
    • Windows - NTFS, FAT, ExFAT
    • Mac - HFS+
    • Linux (Ubuntu) - EXT2, EXT3, EXT4
  • Deleted files that exist on an E01 or Ex01 (disk) image file are skipped during processing, except for Recycle Bin items, which are processed with limited metadata.
  • Encrypted EnCase files aren't supported. You must decrypt EnCase files before processing them.
  • Support for EnCase version 8 was added in Relativity 9.6.202.10.
  • For details on E01 file handling, see Multi-part forensic file considerations.
Excel

XLSX, XLSM, XLSB, XLAM, XLTX, XLTM, XLS, XLT, XLA, XLM, XLW, UXDC

  • Excel 2.0 through 2016 are supported.

Note: If you save a PowerPoint or Excel document in pre-2007 format (e.g., .PPT or .XLS) and the document is read-only, we use the default known password to decrypt the document, regardless of whether the password exists in the Password Bank.

Hangul Word Processor

HWP

  • Password protection isn't currently supported for HWP files. To process or image HWP files, download the Hancom Office Hanword 2014 viewer and install it on a processing worker server. The viewer isn't required for processing in general; you only need it if you intend to process or image HWP files.
HTML

HTML, MHT, HTM, MHTML, XHTM, XHTML

  • Relativity extracts metadata and attachments from MIME file formats such as MHT and EML during processing.
Image files JPG, JPEG, ICO, BMP, GIF, TIFF, TIF, JNG, KOALA, LBM, PBM, IFF, PCD, PCX, PGM, PPM, RAS, TARGA, TGA, WBMP, PSD, CUT, XBM, DDS, FAX, SGI, PNG, EXF, EXIF, WEBP, WDP
JungUm Global GUL
OneNote

ONE

  • Relativity uses Microsoft connectors to extract information from OneNote files at a page level. During ingestion, Relativity extracts embedded items from OneNote files and generates them as PDFs or TIFFs natively.
  • The Password Bank doesn't support OneNote files.
OpenOffice ODC, ODS, ODT, ODP, XPS
PowerPoint

PPTX, PPTM, PPSX, PPSM, POTX, POTM, PPT, PPS, POT

  • PowerPoint 97 through 2016 are supported, including the dual-format 95/97 version.

Publisher PUB
Project MPP
Relativity Collection Container RCC
Text files

TXT, CSV, and others

Note: Relativity Processing supports any file type whose underlying storage is ASCII or Unicode text and thus supports all text file types and extensions.

Vector files SVG, SVGZ, WMF, PLT, EMF, SNP, HPGL, HPG, PLO, PRN, EMZ, WMZ
Visio VSD, VDX, VSS, VSX, VST, VSW, VSDX, VSDM
  • You must have Office 2013 installed to process VSDX and VSDM files.
Word

DOCX, DOCM, DOTX, DOTM, DOC, DOT, RTF

  • Word 2.0 through 2016 are supported, including templates.
WordPerfect WPD, WPS

Note: Relativity currently doesn't support the extraction of embedded images or objects from Visio, Project, or OpenOffice files. In addition, Relativity never extracts any embedded objects or images that were added to any files as links. For a detailed list of the Office file extensions from which Relativity does and does not extract embedded objects and images, see Microsoft Office child extraction support.

Multi-part forensic file considerations

When processing a multi-part forensic image, make sure that the Source location points to the root folder that contains all of the files that make up the image. If you select only the first file of the image (E01, L01, EX01, LX01), inventory and discovery fail with an unrecoverable error.

This is because inventory reads files where they reside in the processing source folder and doesn't copy them to the repository. If only the first file is selected, discovery copies only that file to the repository, and the workers fail when they try to extract from it because the rest of the image isn't available.
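Because a partial image fails only after discovery starts, it can be worth checking the Source location folder beforehand. The following Python sketch is a hypothetical helper, not a Relativity feature: `missing_segments()` is an illustrative name, and it assumes EnCase-style segment names (`<name>.E01`, `<name>.E02`, and so on). It only reports gaps in the numbering and can't tell whether a final segment is missing.

```python
import re


def missing_segments(filenames: list[str], base_name: str) -> list[str]:
    """Report gaps in a numbered E01, E02, ... segment sequence.

    Hypothetical pre-flight check. It only understands the numeric
    extensions .E01 through .E99 and cannot detect a missing final segment.
    """
    pattern = re.compile(rf"^{re.escape(base_name)}\.[Ee](\d{{2}})$")
    present = set()
    for name in filenames:
        match = pattern.match(name)
        if match:
            present.add(int(match.group(1)))
    if not present:
        # Without any segments there is nothing to process.
        return [f"{base_name}.E01"]
    # Every segment number from 1 up to the highest one found should exist.
    return [f"{base_name}.E{n:02d}" for n in range(1, max(present) + 1) if n not in present]


# In practice, pass os.listdir() of the Source location folder.
print(missing_segments(["laptop.E01", "laptop.E02", "laptop.E04"], "laptop"))  # ['laptop.E03']
```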

Tracking inline/embedded images

It may be helpful for you to understand when Relativity treats an image that is attached to an email as an inline, or embedded, image and not as an actual attachment. The following table breaks down when this occurs based on email format and image characteristics:

Email format Attachments that are inline (embedded) images
Plain text None
Rich text IPicture-based OLE embedded images
HTML
  • Images with content ID referenced in the HTML body
  • Local, non-internet image references in the HTML that Relativity can match to an attachment
  • PST/OST/MSG files containing metadata hints as to whether or not the image is marked hidden or is referenced in the HTML body

You can arrange for the discovery of inline images when creating Processing profiles. If you discover inline images, Relativity marks them as hidden attachments in their metadata when it publishes them to your workspace. You can then search for them and take action on them separately.

Native text extraction and OCR

Relativity Processing distinguishes between text and line art in the documents you process. For documents that contain both, processing OCRs only the line art. This means that Relativity doesn't skip OCR on a page just because the page has electronic text.

Accordingly, Relativity performs both native text extraction and OCR on the following file formats:

  • All image formats (TIFF, JPEG, GIF, BMP, PNG, etc.)
  • All vector formats (SVG, CAD files, metafiles [WMF, EMF], PostScript, Encapsulated PostScript)
  • PDF, Visio, Publisher, MS Project, Hancom and JungUm files

Support for password-protected RAR archives

Relativity Processing doesn't decrypt a file that gets its encryption directly from the RAR file that contains it. This means that if you attempt to process a password-protected RAR archive on which the Encrypt file names property is checked, Relativity Processing is unable to extract the files inside that archive.

In addition, note that Relativity Processing can extract a single password-protected file from a RAR archive, but not multiple password-protected files in the same archive.

The following table breaks down Relativity Processing's support of password-protected RAR archives.

  • ✓ - Relativity Processing will decrypt the file.
  • Empty - Relativity Processing won't decrypt the file.
Archive type Single password-protected file Multiple password-protected files Encrypt File Names property
RAR    
Multi-part RAR    

MSG to MHT conversion considerations

The following table provides details on the differences between how Relativity handles MSG and MHT files. This information may be especially useful if you plan on setting the Email Output field on the processing profile to MHT.

Category Field/Attribute MSG MHT
Metadata fields Show Time As

This field sometimes appears in the extracted text from MSG files even when it isn't explicitly stated in the MSG file itself. The default for a calendar invite is to show time as "busy"; the default for a cancellation is to show time as "free."

"Show Time As" won't appear in the extracted text if the default value is populated.
Metadata fields On behalf of

This field is sometimes present in text from MSG files. In some cases, this field is populated with the same value as the From field.

"On behalf of" won't appear in the extracted text.
Interline spacing N/A

The expected number of blank lines will appear in the extracted text. Line wrapping for long paragraphs will also be present.

In some cases, the text in MHT format has fewer blank lines than the text from MSG. In addition, there is no built-in line wrapping for long paragraphs.
Intraline spacing N/A

White-space characters are converted to standard space characters.

White-space characters may remain as non-breaking spaces.
Character differences Smiley character as J

Character sequences like ":)" are maintained in extracted text.

Character sequences like ":)" are replaced by the character "J" in extracted text.

Microsoft Office child extraction support

Notable unsupported file types

Processing doesn't support files created with the following programs and versions:

Product category Product name and version

DOS Word Processors

DEC WPS Plus (DX) Through 4.0

DEC WPS Plus (WPL) Through 4.1

DisplayWrite 2 and 3 (TXT) All versions

DisplayWrite 4 and 5 Through Release 2.0

Enable 3.0, 4.0, and 4.5

First Choice Through 3.0

Framework 3.0

IBM Writing Assistant 1.01

Lotus Manuscript Version 2.0

MASS11 Versions through 8.0

MultiMate Versions through 4.0

Navy DIF All versions

Nota Bene Version 3.0

Office Writer Versions 4.0 through 6.0

PC-File Letter Versions through 5.0

PC-File+ Letter Versions through 3.0

PFS:Write Versions A, B, and C

Professional Write Versions through 2.1

Q&A Version 2.0

Samna Word IV+ Versions through Samna Word

SmartWare II Version 1.02

Sprint Versions through 1.0

Total Word Version 1.2

Volkswriter 3 and 4 Versions through 1.0

Wang PC (IWP) Versions through 2.6

WordMARC Plus Versions through Composer

WordStar Versions through 7.0

WordStar 2000 Versions through 3.0

XyWrite Versions through III Plus

Windows Word Processors

Adobe FrameMaker (MIF) Version 6.0

Hangul Version 97, 2002

JustSystems Ichitaro Versions 5.0, 6.0, 8.0, 13.0, 2004

JustWrite Versions through 3.0

Legacy Versions through 1.1

Lotus AMI/AMI Professional Versions through 3.1

Lotus Word Pro Millennium Versions 96 through Edition 9.6, text only

Novell Perfect Works Version 2.0

Professional Write Plus Version 1.0

Q&A Write Version 3.0

WordStar Version 1.0

Mac Word Processors

MacWrite II Version 1.1

Disk Images

Symantec Ghost

Spreadsheets

Enable Versions 3.0, 4.0, and 4.5

First Choice Versions through 3.0

Framework Version 3.0

Lotus 1-2-3 (DOS and Windows) Versions through 5.0

Lotus 1-2-3 (OS/2) Versions through 2.0

Lotus 1-2-3 Charts (DOS and Windows) Versions through 5.0

Lotus 1-2-3 for SmartSuite Versions 97 and Millennium 9.6

Lotus Symphony Versions 1.0, 1.1, and 2.0

Microsoft MultiPlan Version 4.0

Mosaic Twin Version 2.5

Novell Perfect Works Version 2.0

PFS: Professional Plan Version 1.0

Quattro Pro (DOS) Versions through 5.0

Quattro Pro (Windows) Versions through 12.0, X3

SmartWare II Version 1.02

SuperCalc 5 Version 4.0

VP Planner 3D Version 1.0

In addition, processing doesn't support the following files:

  • Self-extracting RAR files
  • PEM certificate files
  • Apple iWork suite (Pages, Numbers, Keynote)
  • Apple Mail:
    • .emlxpart
    • .partial.emlx

    Note: The .emlxpart and .partial.emlx extensions are distinct from the .emlx file extension, which processing supports.

  • iCloud backup files
  • Microsoft Access
  • Microsoft Works
  • Raw partition files:
    • ISO
    • NTFS
    • HFS

Supported container file types

The following file types can act as containers:

File type Extensions
Bloomberg XML
Cabinet CAB
EnCase E01, L01, LX01
AccessData Logical Image

AD1

  • We support processing both single-part and multi-part non-encrypted AD1 files. For encrypted AD1 files, only single-part files are supported; you must decrypt encrypted multi-part AD1 files before processing them.

iCalendar

ICS

  • For Outlook meeting invites, the email that is sent with the meeting invite (the MSG) will have a sent date that reflects when the sender sent out the meeting request. The resulting calendar file that is then added to the user's Outlook calendar (the ICS) will not include a sent date, as the date doesn’t apply to the calendar file itself.
Lotus Notes Database

NSF

MBOX Email Store

MBOX

  • Because MBOX is a standard format, it doesn't matter whether you're using a Mac folder format or a Unix file format.
Outlook Offline Storage

OST

Outlook Mail Folder

PST

  • Relativity assigns duplicate hash values to calendar invites, as it does with email messages and other documents.
Outlook Express Mail Folder DBX
PDF Portfolio PDF
RAR

RAR

  • You don't need to combine multipart RAR or TAR files before processing them.
Relativity Collection container RCC
TAR (Tape Archive)

TAR

  • You don't need to combine multipart RAR or TAR files before processing them.
Zip

7Z

  • Relativity doesn't handle multipart ZIP files, which are extensions of WinZIP files. When dealing with archives, Relativity Processing has no limit on how many layers deep it goes to extract data; it continues extracting until there is no more data to extract. Inventory, however, only extracts data from first-level documents. For example, if you have a .ZIP within a .ZIP that contains an email with an attached Word document, inventory only extracts up to the email.
Zip ALZIP
Zip BZ2
Zip GZ
Zip ZIP
Zip Z
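The difference between processing's unlimited extraction depth and inventory's first-level behavior, described in the Zip row above, can be modeled with a small tree. This Python sketch is a hypothetical model, not Relativity's code: `Item`, `extract_all()`, and `inventory()` are illustrative names. It assumes that "first-level documents" means inventory opens nested archives but doesn't extract attachments or embedded items from the documents it finds, which matches the .ZIP-within-a-.ZIP example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical model of a file: archives hold members, documents hold attachments."""
    name: str
    is_container: bool = False
    children: list["Item"] = field(default_factory=list)


def extract_all(item: Item, parent: str = "") -> list[str]:
    """Processing model: no depth limit; keep extracting until no children remain."""
    path = f"{parent}/{item.name}" if parent else item.name
    found = [path]
    for child in item.children:
        found += extract_all(child, path)
    return found


def inventory(item: Item, parent: str = "") -> list[str]:
    """Inventory model: open containers, but don't extract children of documents."""
    path = f"{parent}/{item.name}" if parent else item.name
    found = [path]
    if item.is_container:
        for child in item.children:
            found += inventory(child, path)
    return found


# The example from the Zip row: a ZIP within a ZIP holding an email with a Word attachment.
outer = Item("outer.zip", True, [
    Item("inner.zip", True, [
        Item("mail.msg", children=[Item("report.docx")]),
    ]),
])
print(extract_all(outer)[-1])  # outer.zip/inner.zip/mail.msg/report.docx
print(inventory(outer)[-1])    # outer.zip/inner.zip/mail.msg
```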

Lotus Notes considerations

Note the following about how Relativity Processing handles NSF files:

  • Relativity Processing doesn't perform intermediate conversion on NSF files, meaning that we don't convert them to PST or DXL before discovering them. This ensures that we don't miss any document metadata during processing.
  • Relativity Processing preserves the original formatting and attachments of the NSF file. In addition, forms are not applied, since they are designed to hide information.
  • Relativity Processing extracts the contents of NSF files and puts them into individual MSG files using the Lotus Notes C/C++ API directly, because NSF has no file format of its own for individual document entries. All of the original Lotus Notes metadata is embedded in the MSG, meaning that all of the metadata you see when you view a document in an NSF within Lotus Notes is embedded in the MSG. In addition, the original RTF/HTML/plain-text document body is written to the MSG. Relativity handles the conversion from NSF to MSG files itself, and any errors regarding metadata or the inability to translate content are logged to the processing Errors tab. Relativity can process the following NSF items as MSGs:
    • Contacts
    • Distribution lists
    • Calendar items
    • Emails and non-emails

This is an example of an original NSF file before being submitted to the processing engine:

Original NSF file

This is an example of an NSF file that has been converted to an MSG:

Converted MSG file

ICS/VCF file considerations

Relativity deduplicates ICS and VCF files as loose files, based on their SHA256 hash, rather than as emails. Because the system treats them as loose files, Relativity no longer captures the email-specific metadata it previously collected when ICS/VCF files went through the system's email handler.
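Hash-based deduplication of loose files can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the SHA256 hash is computed over each file's complete contents, and the first file seen becomes the primary. `dedupe_loose_files()` is a hypothetical name, and Relativity's actual hashing inputs and deduplication scope are not modeled here.

```python
import hashlib


def dedupe_loose_files(files: dict[str, bytes]) -> tuple[list[str], dict[str, str]]:
    """Deduplicate loose files by the SHA256 of their bytes.

    Returns the primary file names and a map of each duplicate to its primary.
    """
    primary_by_hash: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in primary_by_hash:
            duplicates[name] = primary_by_hash[digest]  # same bytes seen before
        else:
            primary_by_hash[digest] = name  # first occurrence becomes the primary
    return list(primary_by_hash.values()), duplicates


invite = b"BEGIN:VCALENDAR\r\nVERSION:2.0\r\nEND:VCALENDAR\r\n"
primaries, dupes = dedupe_loose_files({
    "meeting.ics": invite,
    "meeting (copy).ics": invite,
    "contact.vcf": b"BEGIN:VCARD\r\nVERSION:3.0\r\nEND:VCARD\r\n",
})
print(primaries)  # ['meeting.ics', 'contact.vcf']
print(dupes)      # {'meeting (copy).ics': 'meeting.ics'}
```

Because only the bytes are compared, two calendar files match only if they are byte-for-byte identical; no email fields such as sender or sent date take part in the comparison.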

The following table breaks down which metadata values the system will populate for ICS files:

The following table breaks down which metadata values the system will populate for VCF files:

Container file types supported for the Password Bank

The following container file types are supported by Relativity for Password Bank in Inventory.

File type Extensions
Lotus Notes Database NSF
PDF Portfolio PDF
PST PST
RAR RAR
Zip 7Z
Zip ALZIP
Zip ZIP
Zip Z
Zip BZ2
Zip GZ

Non-container file types supported for Password Bank in Inventory

The Password Bank also supports the following non-container formats:

  • PDF
  • Excel*
  • Word*
  • PowerPoint*
  • S/MIME
  • P7M

* Except files with DRM or custom encryption