Supported file types for processing

Relativity supports many different file types for processing. There are also a number of file types that are incompatible with the processing engine. Before you begin to process your data, it may be helpful to note which types are supported and unsupported, as well as any caveats involved with processing those types of files.

Supported file types

The following file types and extensions are supported by Relativity for processing.

Note: Renaming a file extension has little effect on how Relativity identifies the file type. When processing a file type, Relativity looks at the actual file properties, such as digital signature, regardless of the named extension. Relativity only uses the named extension as a tie-breaker if the actual file properties indicate multiple extensions.
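The note above can be illustrated with a minimal sketch of signature-first identification in which the named extension only breaks ties. This is not Relativity's implementation: the `SIGNATURES` table and the `identify` function are hypothetical names, and the table covers only three well-known file headers.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, deliberately tiny signature table: leading bytes mapped to the
# extensions they can indicate. A real engine inspects far more than this.
SIGNATURES = {
    b"%PDF-": ["pdf"],
    b"PK\x03\x04": ["zip", "docx", "xlsx", "pptx"],  # ZIP-based formats share a header
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": ["doc", "xls", "ppt", "msg"],  # OLE compound file
}


def identify(file_name: str, header: bytes) -> Optional[str]:
    """Return the detected extension, or None if no known signature matches."""
    named_ext = Path(file_name).suffix.lstrip(".").lower()
    for magic, candidates in SIGNATURES.items():
        if header.startswith(magic):
            if len(candidates) == 1:
                return candidates[0]  # the signature alone is unambiguous
            if named_ext in candidates:
                return named_ext  # the named extension breaks the tie
            return candidates[0]  # assumed fallback: first candidate
    return None
```

In this sketch, a PDF renamed to `report.docx` is still identified as `pdf`, while a ZIP-based file named `report.docx` resolves to `docx` because the signature alone is ambiguous.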

File type Extensions
Adobe files

PDF, FM, PS, EPS

  • XFA-based PDFs are unsupported; if you attempt to load one in the viewer after publishing it, you'll see the following message in the extracted text: "Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document."
  • Relativity performs OCR on PDF files during processing. Relativity handles a PDF portfolio, which is an integrated PDF unit containing multiple files, by extracting the metadata and associating it with the files contained in the portfolio.
AppleDouble AppleDouble-encoded attachments in emails
CAD files

DXF, DWG, SLDDRW, SLDPRT, 3DXML, SLDASM, PRTDOT, ASMDOT, DRWDOT, STL, EPRT, EASM, EDRW, EPRTX, EDRWX, EASMX

  • The OCR output for processed CAD files can vary significantly.
Compressed files 7Z, ZIP, TAR, GZ, BZ2, RAR, Z, CAB, ALZIP
Database files

DBF

  • Relativity only supports DBF 3 and DBF 4 files.
  • Relativity doesn't support the following DBF formats:
    • VisualFoxPro
    • VisualFoxPro autoincrement enabled
  • Relativity uses Microsoft Excel to extract text from DBF files. For details on DBF handling, see Excel file considerations.
Email

PST, OST, NSF, MSG, P7M, P7S, ICS, VCF, MBOX, EML, EMLX, TNEF, DBX, Bloomberg XML

  • The processing engine converts any EML files that are part of PST files to MSGs. Loose EML files are processed as EML files.
  • S/MIME-encrypted and digitally-signed emails are supported.
  • Even though the EMLX file type is supported, the following partial EMLX file extensions are not supported:
    • .emlxpart
    • .partial.emlx

EnCase versions:

  • 5.5
  • 5.6
  • 6
  • 7
  • 8

E01, Ex01, L01, LX01

  • Processing supports E01 and Ex01 files for the following operating and file systems:
    • Windows - NTFS, FAT, ExFAT
    • Mac - HFS+
    • Linux (Ubuntu) - EXT2, EXT3, EXT4
  • Deleted files that exist on an E01 and Ex01 (disk) image file are skipped during processing, with the exception of recycle bin items, which are processed with limited metadata.
  • Encrypted EnCase files aren't supported. You must decrypt EnCase files prior to processing them.
  • For details on E01 file handling, see Multi-part forensic file considerations.
Excel

XLSX, XLSM, XLSB, XLAM, XLTX, XLTM, XLS, XLT, XLA, XLM, XLW, UXDC

Note: If you save a PowerPoint or Excel document in pre-2007 format (e.g., .PPT or .XLS) and the document is read-only, we use the default known password to decrypt the document, regardless of whether or not the password exists in the Password Bank.

HTML

HTML, MHT, HTM, MHTML, XHTM, XHTML

  • Relativity extracts metadata and attachments from MIME file formats such as MHT and EML during processing.
Image files JPG, JPEG, ICO, BMP, GIF, TIFF, TIF, JNG, KOALA, LBM, PBM, IFF, PCD, PCX, PGM, PPM, RAS, TARGA, TGA, WBMP, PSD, CUT, XBM, DDS, FAX, SGI, PNG, EXF, EXIF, WEBP, WDP
JungUm Global GUL
OneNote

ONE

  • Relativity uses Microsoft connectors to extract information from OneNote files at a page level. During ingestion, Relativity extracts embedded items from OneNote files and generates them as PDFs or TIFFs natively.
  • The Password Bank doesn't support OneNote files.
OpenOffice ODC, ODS, ODT, ODP, XPS
PowerPoint

PPTX, PPTM, PPSX, PPSM, POTX, POTM, PPT, PPS, POT

  • PowerPoint 97 through 2016 are supported, including the dual-format 95/97 version.

Note: If you save a PowerPoint or Excel document in pre-2007 format (e.g., .PPT or .XLS) and the document is read-only, we use the default known password to decrypt the document, regardless of whether or not the password exists in the Password Bank.

Publisher PUB
Project MPP, MPT, MPD, MPX

Note: The text extracted from Project files is from the Gantt chart view and will include Task Notes.

Relativity Collection Container RCC
Short message

RSMF

Text files

TXT, CSV, and others

Note: Relativity Processing supports any file type whose underlying storage is ASCII or Unicode text and thus supports all text file types and extensions.

Vector files SVG, SVGZ, WMF, PLT, EMF, SNP, HPGL, HPG, PLO, PRN, EMZ, WMZ
Visio VSD, VDX, VSS, VSX, VST, VSW, VSDX, VSDM
Word

DOCX, DOCM, DOTX, DOTM, DOC, DOT, RTF

  • Word 2.0 through 2016 are supported, including templates.
WordPerfect WPD, WPS

Note: Relativity currently doesn't support the extraction of embedded images or objects from Visio, Project, or OpenOffice files. In addition, Relativity never extracts any embedded objects or images that were added to any files as links. For a detailed list of the Office file extensions from which Relativity does and does not extract embedded objects and images, see Microsoft Office child extraction support.

Excel file considerations

Due to Excel specifications and limits, when you process a database file with the Native text extraction method, the extracted text of the DBF file may be missing data. For example, if a DBF file contains more than 1,048,576 rows or more than 16,384 columns, the extracted text won't contain text from row 1,048,577 onward or from column 16,385 onward. For more information, see Excel specifications and limits on the Microsoft website.
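As a rough pre-check, the limits above can be turned into a small calculation. The sketch below assumes you already know the row and column counts of a DBF file. The function name `lost_to_excel_limits` is hypothetical and is not part of Relativity.

```python
# Excel worksheet limits cited in the paragraph above.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384


def lost_to_excel_limits(row_count: int, column_count: int) -> tuple:
    """Return (rows_lost, columns_lost) that fall beyond the Excel limits."""
    rows_lost = max(0, row_count - EXCEL_MAX_ROWS)
    columns_lost = max(0, column_count - EXCEL_MAX_COLUMNS)
    return rows_lost, columns_lost


# A table with 1,048,577 rows and 10 columns would lose its last row.
print(lost_to_excel_limits(1_048_577, 10))
```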

RSMF mapping considerations

Generally, Relativity maps all metadata based on EML, but the following RSMF-specific mappings are considered non-standard.

Note: All EML Header strings are case insensitive, which isn't unique to RSMF files.

EML Header | Metadata Field
X-RSMF-BeginDate | Rsmf/BeginDate, EmailSentOn, CreatedOn, InternalCreatedOn
X-RSMF-EndDate | Rsmf/EndDate, LastModified
X-RSMF-EventCount | Rsmf/MessageCount
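The mapping above can be sketched as a case-insensitive lookup. This is an illustration only. It assumes that each metadata field listed after a header in the table is populated from that header, and the names `RSMF_HEADER_MAP` and `map_rsmf_headers` are hypothetical.

```python
# Assumed reading of the table: each X-RSMF header feeds the fields listed after it.
# Keys are stored in lowercase because EML header names are case insensitive.
RSMF_HEADER_MAP = {
    "x-rsmf-begindate": ["Rsmf/BeginDate", "EmailSentOn", "CreatedOn", "InternalCreatedOn"],
    "x-rsmf-enddate": ["Rsmf/EndDate", "LastModified"],
    "x-rsmf-eventcount": ["Rsmf/MessageCount"],
}


def map_rsmf_headers(headers: dict) -> dict:
    """Translate RSMF-specific EML headers into metadata field values."""
    fields = {}
    for name, value in headers.items():
        for field in RSMF_HEADER_MAP.get(name.lower(), []):
            fields[field] = value
    return fields


print(map_rsmf_headers({"X-RSMF-ENDDATE": "2020-01-02T00:00:00Z"}))
```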

  • If an RSMF file doesn't include a Sent Date, and the X-RSMF-BeginDate header exists, that header will be mapped to the Sent Date field.
  • For more technical details on how Relativity Processing handles RSMF files, see Processing RSMF files.
Multi-part forensic file considerations

    When processing a multi-part forensic image, make sure that the Source location points to the root folder that contains all of the files that make up the image. If you select only the first file of the image (E01, L01, EX01, LX01), inventory and discovery will fail with an unrecoverable error.

    This happens because inventory reads files where they reside in the processing source folder and does not copy them to the repository. If only the first file is selected, discovery copies that file alone to the repository; the workers then attempt to extract from it and fail because the rest of the archive is not available.
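Before you start a job, a quick check along the following lines can confirm that the Source location is a folder and show which image segments it holds. This is a sketch under stated assumptions: it assumes segments share a file name stem and end in a two-digit suffix such as E01, E02, and so on, and `group_image_segments` is a hypothetical helper, not a Relativity API.

```python
import re
from pathlib import Path

# Assumed naming: .E01/.E02, .Ex01/.Ex02, .L01/.L02, .Lx01/.Lx02, and so on.
# Only two-digit suffixes are covered, which is a simplification.
SEGMENT_RE = re.compile(r"\.(?:e|ex|l|lx)\d{2}$", re.IGNORECASE)


def group_image_segments(source: Path) -> dict:
    """Group forensic image segments in a folder by their shared stem."""
    if not source.is_dir():
        raise ValueError(f"Source location must be a folder, not a file: {source}")
    images = {}
    for path in sorted(source.iterdir()):
        if path.is_file() and SEGMENT_RE.search(path.name):
            images.setdefault(path.stem.lower(), []).append(path.name)
    return images
```

Passing the path of a single segment file raises an error in this sketch, which mirrors the guidance to point at the root folder and not at the first file.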

    Tracking inline/embedded images

    It may be helpful for you to understand when Relativity treats an image that is attached to an email as an inline, or embedded, image and not as an actual attachment. The following table breaks down when this occurs based on email format and image characteristics:

    Email format Attachments that are inline (embedded) images
    Plain text None
    Rich text IPicture-based OLE embedded images
    HTML
    • Images with content ID referenced in the HTML body
    • Local, non-internet image references in the HTML that Relativity can match to an attachment
    • PST/OST/MSG files containing metadata hints as to whether or not the image is marked hidden or is referenced in the HTML body

    You can arrange for the discovery of inline images when creating Processing profiles. If you discover inline images, Relativity marks them as hidden attachments in their metadata when it publishes them to your workspace. You can then search for them and take action on them separately.
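The first HTML rule in the table, images whose content ID is referenced in the HTML body, can be sketched with Python's standard `email` package. This illustrates the general technique only, not how Relativity detects inline images. It ignores the other rules in the table, and `split_image_attachments` is a hypothetical name.

```python
import re
from email.message import Message

# Matches cid: references such as <img src="cid:logo123"> in an HTML body.
CID_RE = re.compile(r"""cid:([^"'\s>]+)""", re.IGNORECASE)


def split_image_attachments(msg: Message) -> tuple:
    """Return (inline_names, attached_names) for the image parts of a message."""
    html = ""
    images = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            html += payload.decode(charset, errors="replace")
        elif part.get_content_maintype() == "image":
            images.append(part)

    referenced = set(CID_RE.findall(html))
    inline, attached = [], []
    for part in images:
        cid = (part.get("Content-ID") or "").strip().strip("<>")
        name = part.get_filename() or cid
        # An image counts as inline only if the HTML body references its content ID.
        if cid and cid in referenced:
            inline.append(name)
        else:
            attached.append(name)
    return inline, attached
```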

    Native text extraction and OCR

    Relativity Processing distinguishes between text and line art in the documents you process. For these documents, processing will only OCR the line art. This means that Relativity doesn’t skip OCR if a page has electronic text.

    Accordingly, Relativity performs both native text extraction and OCR on the following file formats:

    • All image formats (TIFF/JPEG/GIF/BMP/PNG etc.)
    • All vector formats (SVG, CAD files, Metafiles [WMF, EMF], PostScript, Encapsulated PostScript)
    • PDF, Visio, Publisher, MS Project, Hancom and JungUm files

    Support for password-protected RAR archives

    Relativity Processing doesn't decrypt a file that gets its encryption directly from the RAR file that contains it. This means that if you attempt to process a password-protected RAR archive on which the Encrypt file names property is checked, Relativity Processing is unable to extract the files inside that archive.

    Encrypt file names property checkbox

    In addition, note that Relativity Processing can extract a single password-protected file from a RAR archive, but not multiple password-protected files in the same archive.

    The following table breaks down Relativity Processing's support of password-protected RAR archives.

    • - Relativity Processing will decrypt the file.
    • Empty - Relativity Processing won't decrypt the file.
    Archive type Single password-protected file Multiple password-protected files Encrypt File Names property
    RAR    
    Multi-part RAR    

    MSG to MHT conversion considerations

    The following table provides details on the differences between how Relativity handles MSG and MHT files. This information may be especially useful if you plan on setting the Email Output field on the processing profile to MHT.

    Category | Field/Attribute | MSG | MHT
    Metadata fields | Show Time As | This field sometimes appears in the extracted text from MSG files when not explicitly stated in the MSG file itself. The default for a calendar invite is to show time as "busy"; the default for a cancellation is to show time as "free." | "Show Time As" will not appear in the extracted text if the default value is populated.
    Metadata fields | On behalf of | This field is sometimes present in text from MSG. In some cases, this field is populated with the same value as the From field. | "On behalf of" will not appear in the extracted text.
    Interline spacing | N/A | The expected number of blank lines will appear in the extracted text. Line wrapping for long paragraphs will also be present. | In some cases, the text in MHT format has fewer blank lines than the text from MSG. In addition, there is no built-in line wrapping for long paragraphs.
    Intraline spacing | N/A | White-space characters are converted to standard space characters. | White-space characters may remain as non-breaking spaces.
    Character differences | Smiley character as J | Character sequences like ":)" are maintained in extracted text. | Character sequences like ":)" are replaced by the character "J" in extracted text.
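If you need to compare extracted text from the two output formats yourself, you could normalize the spacing differences described in the table first. The following sketch handles only non-breaking spaces and runs of blank lines. It does not re-wrap paragraphs or touch the smiley substitution, and `normalize_extracted_text` is a hypothetical helper with no connection to Relativity's own behavior.

```python
import re


def normalize_extracted_text(text: str) -> str:
    """Reduce spacing differences so MSG and MHT text can be compared."""
    # Replace non-breaking spaces (U+00A0) with standard spaces.
    text = text.replace("\u00a0", " ")
    # Collapse any run of blank lines into a single blank line.
    text = re.sub(r"\n(?:[ \t]*\n)+", "\n\n", text)
    return text.strip()


print(repr(normalize_extracted_text("Hi\u00a0there\n\n\n\nBye")))
```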

    Microsoft Office child extraction support

    Notable unsupported file types

    Processing doesn't support files created with the following programs and versions:

    Product category Product name and version

    DOS Word Processors

    DEC WPS Plus (DX) Through 4.0

    DEC WPS Plus (WPL) Through 4.1

    DisplayWrite 2 and 3 (TXT) All versions

    DisplayWrite 4 and 5 Through Release 2.0

    Enable 3.0, 4.0, and 4.5

    First Choice Through 3.0

    Framework 3.0

    IBM Writing Assistant 1.01

    Lotus Manuscript Version 2.0

    MASS11 Versions through 8.0

    MultiMate Versions through 4.0

    Navy DIF All versions

    Nota Bene Version 3.0

    Office Writer Versions 4.0 through 6.0

    PC-File Letter Versions through 5.0

    PC-File+ Letter Versions through 3.0

    PFS:Write Versions A, B, and C

    Professional Write Versions through 2.1

    Q&A Version 2.0

    Samna Word Versions through Samna Word IV+

    SmartWare II Version 1.02

    Sprint Versions through 1.0

    Total Word Version 1.2

    Volkswriter 3 and 4 Versions through 1.0

    Wang PC (IWP) Versions through 2.6

    WordMARC Plus Versions through Composer

    WordStar Versions through 7.0

    WordStar 2000 Versions through 3.0

    XyWrite Versions through III Plus

    Windows Word Processors

    Adobe FrameMaker (MIF) Version 6.0

    Hangul Version 97, 2002

    JustSystems Ichitaro Versions 5.0, 6.0, 8.0, 13.0, 2004

    JustWrite Versions through 3.0

    Legacy Versions through 1.1

    Lotus AMI/AMI Professional Versions through 3.1

    Lotus Word Pro Millennium Versions 96 through Edition 9.6, text only

    Novell Perfect Works Version 2.0

    Professional Write Plus Version 1.0

    Q&A Write Version 3.0

    WordStar Version 1.0

    Mac Word Processors

    MacWrite II Version 1.1

    Disk Images

    Symantec Ghost

    Spreadsheets

    Enable Versions 3.0, 4.0, and 4.5

    First Choice Versions through 3.0

    Framework Version 3.0

    Lotus 1-2-3 (DOS and Windows) Versions through 5.0

    Lotus 1-2-3 (OS/2) Versions through 2.0

    Lotus 1-2-3 Charts (DOS and Windows) Versions through 5.0

    Lotus 1-2-3 for SmartSuite Versions 97 and Millennium 9.6

    Lotus Symphony Versions 1.0, 1.1, and 2.0

    Microsoft MultiPlan Version 4.0

    Mosaic Twin Version 2.5

    Novell Perfect Works Version 2.0

    PFS: Professional Plan Version 1.0

    Quattro Pro (DOS) Versions through 5.0

    Quattro Pro (Windows) Versions through 12.0, X3

    SmartWare II Version 1.02

    SuperCalc 5 Version 4.0

    VP Planner 3D Version 1.0

    In addition, processing doesn't support the following files:

    • Self-extracting RAR files
    • PEM certificate files
    • Apple iWork suite (Pages, Numbers, Keynote)
    • Apple Mail:
      • .emlxpart
      • .partial.emlx

      Note: The .emlxpart and .partial.emlx extensions are distinct from the .emlx file extension, which is supported by processing.

    • iCloud backup files
    • Microsoft Access
    • Microsoft Works
    • Raw partition files:
      • ISO
      • NTFS
      • HFS
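To flag the unsupported Apple Mail partial files before you submit a data set, a simple name check like the one below may help. It is a sketch, not a Relativity feature, and `is_partial_emlx` is a hypothetical helper that looks only at the file name.

```python
def is_partial_emlx(file_name: str) -> bool:
    """Return True for Apple Mail partial files, which processing doesn't support."""
    name = file_name.lower()
    return name.endswith(".emlxpart") or name.endswith(".partial.emlx")


for candidate in ("1234.emlx", "1234.partial.emlx", "1234.2.emlxpart"):
    print(candidate, is_partial_emlx(candidate))
```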

    Supported container file types

    The following file types can act as containers:

    File type Extensions
    Bloomberg XML
    Cabinet

    CAB

    • We do not support multi-part CAB files.
    • We do not support Password Protected CAB files.
    EnCase E01, L01, LX01
    AccessData Logical Image

    AD1

    • We support processing both single and multi-part non-encrypted AD1 files. For encrypted AD1 files, only single-part files are supported; you must decrypt encrypted multi-part AD1 files before processing them. See Lotus Notes considerations for more information.

    iCalendar

    ICS

    • For Outlook meeting invites, the email that is sent with the meeting invite (the MSG) will have a sent date that reflects when the sender sent out the meeting request. The resulting calendar file that is then added to the user's Outlook calendar (the ICS) will not include a sent date, as the date doesn’t apply to the calendar file itself.
    Lotus Notes Database

    NSF

    MBOX Email Store

    MBOX

    • MBOX is a standard format, so it doesn't matter whether you're using a Mac folder format or a Unix file format.
    Outlook Offline Storage

    OST

    Outlook Mail Folder

    PST

    • Relativity assigns duplicate hash values to calendar invites, as it does with email messages and other documents.
    Outlook Express Mail Folder DBX
    PDF Portfolio PDF
    RAR

    RAR

    • You don't need to combine multipart RAR or TAR files before processing them.
    Relativity Collection container RCC
    TAR (Tape Archive)

    TAR

    • You don't need to combine multipart RAR or TAR files before processing them.
    Zip

    7Z

    • Relativity doesn't handle multipart ZIP files, which are extensions of WinZIP files. When dealing with archives, there is no limit to the number of layers deep Relativity Processing will go to extract data; it extracts until there is no more data to extract. Inventory, however, only extracts data from first-level documents. For example, if you have a .ZIP within a .ZIP that contains an email with an attached Word document, inventory only extracts up to the email.
    Zip ALZIP
    Zip BZ2
    Zip GZ
    Zip ZIP
    Zip Z
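The idea of extracting through every archive layer until no archives remain can be sketched with Python's standard `zipfile` module. This is an illustration of the general technique, not Relativity's engine. It handles only ZIP, and as a simplification it recognizes nested archives by a `.zip` name. `walk_zip` is a hypothetical name.

```python
import io
import zipfile


def walk_zip(data: bytes, prefix: str = ""):
    """Yield the path of every non-archive member, descending into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            if info.filename.lower().endswith(".zip"):
                # Recurse with no depth limit: keep going until no archives remain.
                yield from walk_zip(archive.read(info), path + "/")
            else:
                yield path
```

For an outer archive that holds `inner.zip` (containing `mail.eml`) and `readme.txt`, the generator yields `inner.zip/mail.eml` and then `readme.txt`.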

    Lotus Notes considerations

    Note the following about how Relativity Processing handles NSF files:

    • Relativity Processing doesn't perform intermediate conversion on NSF files, meaning that we don't convert them to PST or DXL before discovering them. This ensures that we don't miss any document metadata during processing.
    • Relativity Processing preserves the original formatting and attachments of the NSF file. In addition, forms are not applied, since they are designed to hide information.
    • Relativity Processing extracts the contents of NSF files and puts them into individual MSG files using the Lotus Notes C/C++ API directly. This is because NSF doesn't have its own individual document entry file format. All of the original Lotus Notes metadata is embedded in the MSG, meaning if you look at the document metadata in an NSF within Lotus, all of the metadata listed is embedded in the MSG. In addition, the original RTF/HTML/Plaintext document body is written to the MSG. Relativity handles the conversion from NSF to MSG files itself, and any errors regarding metadata or the inability to translate content are logged to the processing Errors tab. Relativity can process the following NSF items as MSGs:
      • Contacts
      • Distribution lists
      • Calendar items
      • Emails and non-emails

    This is an example of an original NSF file before being submitted to the processing engine:

    Original NSF file

    This is an example of an NSF file that has been converted to an MSG:

    Converted MSG file

    Multi-part container considerations

    When processing a multi-part container, the first part of the container must be included. If the first part of the container is not included, the Processing engine will ignore the file.

    ICS/VCF file considerations

    ICS/VCF files are deduplicated not as emails but as loose files based on the SHA256 hash. Because the system now treats these as loose files, Relativity no longer captures the email-specific metadata that it previously obtained when ICS/VCF files went through the system's email handler.
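Hash-based deduplication of loose files can be sketched as grouping files by the SHA-256 digest of their content. The example below assumes the hash is computed over the full file bytes, which this page does not state, and `group_by_sha256` is a hypothetical name.

```python
import hashlib


def group_by_sha256(files: dict) -> dict:
    """Map each SHA-256 digest to the names of the files that share it.

    `files` maps a file name to its raw bytes (an assumption for this sketch).
    """
    groups = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups


calendar = b"BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n"
groups = group_by_sha256({"a.ics": calendar, "b.ics": calendar, "c.vcf": b"BEGIN:VCARD\r\nEND:VCARD\r\n"})
# Any digest with more than one name marks a set of duplicates.
print([names for names in groups.values() if len(names) > 1])
```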

    The following table breaks down which metadata values the system will populate for ICS files:

    The following table breaks down which metadata values the system will populate for VCF files:

    Container file types supported for the Password Bank

    The following container file types are supported by Relativity for Password Bank in Inventory.

    File type Extensions
    Lotus Notes Database NSF
    PDF Portfolio PDF
    PST PST
    RAR RAR
    Zip 7Z
    Zip ALZIP
    Zip ZIP
    Zip Z
    Zip BZ2
    Zip GZ

    Non-container file types supported for Password Bank in Inventory

    The Password Bank also supports the following non-container formats:

    • PDF
    • Excel*
    • Word*
    • PowerPoint*
    • S/MIME
    • P7M

    * Except DRM or custom encryption