Supported file types for processing

Relativity supports many different file types for processing. There are also a number of file types that are incompatible with the processing engine. Before you begin to process your data, it may be helpful to note which types are supported and unsupported, as well as any caveats involved with processing those types of files.

Note: Relativity pulls only limited metadata from unsupported file types. Data pulled from supported file types includes metadata, text, and embedded items.

Supported file types

The following file types and extensions are supported by Relativity for processing.

Note: Renaming a file extension has little effect on how Relativity identifies the file type. When processing a file type, Relativity looks at the actual file properties, such as digital signature, regardless of the named extension. Relativity only uses the named extension as a tie-breaker if the actual file properties indicate multiple extensions.
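
The identification order described in the note above can be sketched as follows. This is only an illustration, not Relativity's implementation: the `SIGNATURES` table is a small hypothetical subset of well-known file signatures, and `identify_extension` is a made-up helper name.

```python
from pathlib import Path
from typing import Optional

# Leading byte signatures mapped to the extensions they can indicate.
# Illustrative subset only; a real engine inspects far more than the first bytes.
SIGNATURES = {
    b"%PDF": ["pdf"],
    b"PK\x03\x04": ["zip", "docx", "xlsx", "pptx"],
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": ["doc", "xls", "ppt", "msg"],
}


def identify_extension(name: str, header: bytes) -> Optional[str]:
    """Pick an extension from the content, using the named extension only as a tie-breaker."""
    for magic, candidates in SIGNATURES.items():
        if header.startswith(magic):
            if len(candidates) == 1:
                return candidates[0]
            named = Path(name).suffix.lstrip(".").lower()
            # The named extension matters only when the content is ambiguous.
            return named if named in candidates else candidates[0]
    return None
```

Under these assumptions, a PDF renamed to report.txt still resolves to pdf, while a file named budget.xls that starts with the OLE compound-file signature resolves to xls because the name breaks the tie.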

File type Extensions
Adobe files


  • Processing support for XFA PDFs (PDF web forms) includes text extraction, metadata extraction, and imaging. Some workflows may require specific workarounds:

    • Native Redactions are not currently supported.

      • Workaround: Image the file first, then apply redactions.

    • The download PDF action, using the Natives file type, is not currently supported.

      • Workaround: After selecting the PDF mass action, use either the Original Images or Produced Images file type, or export the files as PDFs using the Relativity Desktop Client (RDC). For more information on using RDC, see Relativity Desktop Client.

  • Relativity performs OCR on PDF files during processing. Relativity handles a PDF portfolio, which is an integrated PDF unit containing multiple files, by extracting the metadata and associating it with the files contained in the portfolio.
AppleDouble AppleDouble-encoded attachments in e-mails
CAD files


  • The OCR output for processed CAD files can vary significantly.
Compressed files


ZIP containers do not store time zone information for CreatedOn, LastModified, and LastAccessed fields. When extracting files, time stamps are only meaningful if the time zone that the ZIP container was created in is known. Relativity extracts file metadata and updates the CreatedOn and LastModified fields if available. Otherwise, CreatedOn defaults to 1/1/1900 and LastModified reflects the worker local time zone. LastModified and LastAccessed fields will usually match.

Note: Relativity does not support multi-part ZIP, TAR, or 7Z files.
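
To illustrate why ZIP time stamps depend on knowing the originating time zone, here is a minimal sketch using Python's standard `zipfile` module. The `member_times` helper and its `assumed_offset_hours` parameter are hypothetical; the sketch reads only the standard ZIP header time and does not reflect how Relativity populates CreatedOn or LastModified.

```python
import io
import zipfile
from datetime import datetime, timedelta, timezone


def member_times(zip_bytes: bytes, assumed_offset_hours: float) -> dict:
    """Return {member name: UTC datetime}, assuming the ZIP was created at the given UTC offset."""
    assumed_tz = timezone(timedelta(hours=assumed_offset_hours))
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # ZipInfo.date_time is a naive (year, month, day, hour, minute, second)
            # tuple: the header stores local time with no time zone attached.
            naive = datetime(*info.date_time)
            result[info.filename] = naive.replace(tzinfo=assumed_tz).astimezone(timezone.utc)
    return result
```

The same stored value yields different UTC instants depending on the offset you assume, which is why the time stamps are only meaningful when the creation time zone is known.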

Database files


  • Relativity only supports DBF 3 and DBF 4 files.
  • Relativity does not support the following DBF formats:
    • VisualFoxPro
    • VisualFoxPro autoincrement enabled
  • Relativity uses Microsoft Excel to extract text from DBF files. For details on DBF handling, see Excel file considerations.


  • Original email EML data is parsed and stored inside a PST. If the email contains an embedded EML, the EML is also parsed and stored in the PST. The processing engine reads tables, properties, and rows to construct an MSG file from a PST. The MSG file format supports all rich metadata inside an email in a PST. The original EML data is not preserved.
  • S/MIME-encrypted and digitally-signed emails are supported.
  • Even though the EMLX file type is supported, the following partial EMLX file extensions are not supported:
    • .emlxpart
    • .partial.emlx

EnCase versions:

  • 5.5
  • 5.6
  • 6
  • 7
  • 8

E01, Ex01, L01, LX01

  • Processing supports E01 and Ex01 files for the following operating and file systems:
    • Windows: NTFS, FAT, ExFAT
    • Mac: HFS+
    • Linux (Ubuntu): EXT2, EXT3, EXT4
  • Deleted files that exist on an E01 and Ex01 (disk) image file are skipped during processing, with the exception of recycle bin items, which are processed with limited metadata.
  • Encrypted EnCase files are not supported. You must decrypt EnCase files prior to processing them.
  • For details on E01 file handling, see Multi-part forensic file considerations.


Note: If you save a PowerPoint or Excel document in a pre-2007 format, such as .PPT or .XLS, and the document is read-only, we use the default known password to decrypt the document, regardless of whether or not the password exists in the Password Bank.

Google Workspace For details, see Google Workspace considerations.


  • Relativity extracts metadata and attachments from MIME file formats such as MHT and EML during processing.
JungUm Global GUL


  • Relativity uses Microsoft connectors to extract information from OneNote files at the section (or tab) level. All pages within a section are extracted as one file. During ingestion, Relativity extracts embedded items from OneNote files, and for some object types, generates them as PDFs or TIFFs natively.
  • The Password Bank does not support OneNote files.
  • RelativityOne does not support OneNote 2003 files.
OpenOffice ODC, ODS, ODT, ODP, XPS


  • PowerPoint 97 through the current product version is supported, including the dual-format 95/97 version.
  • Modern Comments are not currently supported for PowerPoint 2021+.

Note: If you save a PowerPoint or Excel document in a pre-2007 format, such as .PPT or .XLS, and the document is read-only, we use the default known password to decrypt the document, regardless of whether or not the password exists in the Password Bank.

Publisher PUB
Project MPP, MPT, MPD, MPX

Note: The text extracted from Project files is from the Gantt chart view and will include Task Notes.

Relativity Collection Container RCC
Short message


Text files

TXT, CSV, and others

Note: Processing supports any file type whose underlying storage is ASCII or Unicode text and thus supports all text file types and extensions.



  • Visio is a separate installation per the Worker Manager server page.

  • You must have Office 2013 or Office 2016 installed in order to process VSDX and VSDM file extensions.



Word 2.0 through the current product version is supported, including templates.

WordPerfect WPD, WPS

Note: Relativity currently does not support the extraction of embedded images or objects from Visio, Project, or OpenOffice files. In addition, Relativity never extracts any embedded objects or images that were added to any files as links. For a detailed list of the Office file extensions from which Relativity does and does not extract embedded objects and images, see Microsoft Office child extraction support.

Note: If you use the Native text extraction method on the profile, Processing does not handle pre-2008 Microsoft Office files that have the Protected view enabled. You must use the Relativity text extraction method to process these files.

Excel file considerations

Due to Excel specifications and limits, when you process a database file with the Native text extraction method, the extracted text of the DBF file may be missing data. For example, if a DBF file contains more than 1,048,576 rows or 16,384 columns, the extracted text won’t contain text from row 1,048,577 onward or from column 16,385 onward. For more information, see Excel specifications and limits on the Microsoft website.
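
As a rough pre-check, the arithmetic behind these limits can be sketched as below. The `truncated_extent` helper is hypothetical and assumes you already know the row and column counts of the DBF file; it only mirrors the worksheet limits quoted above.

```python
# Worksheet limits quoted in the paragraph above.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384


def truncated_extent(row_count: int, column_count: int) -> dict:
    """Report how many rows and columns fall outside Excel's worksheet limits."""
    return {
        "rows_lost": max(0, row_count - EXCEL_MAX_ROWS),
        "columns_lost": max(0, column_count - EXCEL_MAX_COLUMNS),
    }
```

A non-zero value in either entry suggests that the Native text extraction method would leave that much data out of the extracted text.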

Google Workspace considerations

When you gather Google Workspace data, Google provides a supplemental metadata file. For Google Mail (Gmail), the metadata file is a CSV. For Google Drive, the metadata file is an XML.

Beginning in September 2021, Processing automatically identifies when the supplemental metadata file is present in your file share, links the metadata fields to the processed files, and makes them available as mappable fields.

There are two methods for collecting Google Workspace data:

  • Relativity Collect
  • Manual download via Google Vault

Relativity Collect

When using Relativity Collect to gather Gmail data, the metadata file is provided as a CSV. An XML file is also provided but not used. When using Collect to gather Google Drive data, the metadata file is provided as an XML.

Relativity Collect: Gmail Relativity Collect: Google Drive
Collect Gmail Files Collect Google Drive

Note: Relativity Collect places additional files in the root of the data source. They are processed as loose individual files if they are not removed prior to processing.

  • Google Chat data is converted to Relativity's short message format (RSMF) for processing when using Collect to gather data.
  • Do not edit the supplemental metadata files or Processing will not recognize them.

Google Vault

When using Google Vault to gather Gmail data, download the [EXPORT_NAME].zip, [EXPORT_NAME]-metadata.csv, and [EXPORT_NAME]-metadata.xml files. For Google Drive data, download the [EXPORT_NAME].zip and [EXPORT_NAME]-metadata.xml files. When collecting data manually from Google Vault, you must export Gmail data in MBOX format. Relativity does not sync additional metadata if exported as a PST.
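
Before processing a manual Google Vault download, it may help to confirm that the expected files are all present. The following is a minimal sketch under the naming pattern described above; `missing_vault_files` and its `service` values ("gmail", "drive") are hypothetical names chosen for illustration.

```python
def missing_vault_files(present, export_name, service):
    """List the expected Google Vault export files that are absent from `present`.

    `present` is a collection of file names found in the download folder.
    """
    expected = [f"{export_name}.zip", f"{export_name}-metadata.xml"]
    if service == "gmail":
        # Gmail exports also need the CSV metadata file.
        expected.insert(1, f"{export_name}-metadata.csv")
    elif service != "drive":
        raise ValueError("service must be 'gmail' or 'drive'")
    return [name for name in expected if name not in present]
```

An empty result means every expected file for that export type was found; the check says nothing about whether Gmail data was exported as MBOX rather than PST.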

Google Vault: Gmail Google Vault: Google Drive
Google Vault Gmail Google Vault Drive

Note: If you have files saved in your root folder other than the supplemental metadata files, they are processed as loose individual files.

  • Google Chat data is not converted to Relativity's short message format (RSMF) when using Google Vault.
  • Do not edit the supplemental metadata files or Processing will not recognize them.

Google Workspace metadata field lists

The Google supplemental metadata files contain the fields listed below, all of which are available for mapping to Document object fields.

Note: Google may add, remove, or edit fields at any time. The most current lists can be found on their Vault export contents webpage.

The following table lists the metadata fields found in the Google Drive XML.

Google Drive XML Field Relativity Source Field Name Description
DocID Google/DocID A unique identifier for the file. For site exports, the value is the page ID.
#Author Google/Author The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name.
Collaborators Google/Collaborators The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export.
Viewers Google/Viewers The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export.
#DateCreated Google/DateCreated The date a Google file was created in Drive. For non-Google files, the date the file was uploaded to Drive.
#DateModified Google/DateModified The date the file was last modified.
#Title Google/Title The file name as assigned by the user. Because some operating systems cannot expand zip files with extremely long file names, Vault truncates the file name at 128 characters during export. The value shown by the #Title tag isn't truncated.
DocumentType Google/DocumentType

The file type for Google files. Possible values are:

  • DOCUMENT—a document created in Google Docs.
  • SPREADSHEET—a spreadsheet created in Google Sheets.
  • PRESENTATION—a presentation created in Google Slides.
  • FORM—a form created in Google Forms.
  • DRAWING—a drawing created in Google Drawings.
  • SITES_PAGE—a page from a site created in new Google Sites.
Others Google/Others The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault could not determine permission levels at the time of export.
DocParentID Google/DocParentID For sites, the site ID.
SharedDriveID Google/SharedDriveID The identifier of the shared drive that contains the file, if applicable.
SourceHash Google/SourceHash A unique hash value for each version of a file. Can be used to deduplicate file exports and verify the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only.
FileName Google/FileName The file name. Use this value to correlate the metadata with the file in the export ZIP file.
FileSize Google/FileSize The size of the file in bytes.
Hash Google/Hash The MD5 hash of the file.
UserQuery Google/UserQuery The query submitted by the Vault user that retrieved the files included in this export.
TimeZone Google/TimeZone The time zone used for date-based searches.
Custodians Google/Custodians The email addresses of the users whose accounts were searched. If you searched for content rather than individual user accounts, there are no custodians listed here.

The following table lists the metadata fields found in the Gmail CSV:

Google Mail CSV Field Relativity Source Field Name Description Notes
Rfc822MessageId Google/Rfc822MessageId A message ID that is the same for the receiver's and sender's messages. Use this value to correlate metadata with the message in an MBOX export. For classic Hangouts, the value is for the first message in the thread.  
GmailMessageId Google/GmailMessageId A unique message ID. Use this value to manage specific messages with the Gmail API. For classic Hangouts, the value is for the first message in the thread.  
Account Google/Account The account that had the message in their inbox. For example, suppose user1 received a message sent to a group address because they are a member of the group. If a search returns that message because it was in user1's Inbox, then the value of To is the group address while the value of Account is user1's address.
From Google/From The sender account.  
To Google/To The recipient account. Multiple recipients are comma-separated and the list is in double quotes. Gmail only
CC Google/CC Accounts in the cc: field. Gmail only
BCC Google/BCC Accounts in the bcc: field. Gmail only
Subject Google/Subject The message subject. Gmail only
Labels Google/Labels Labels applied to the message by Gmail or the user. Gmail only
DateSent Google/DateSent The message send date in UTC, yyyy-MM-dd'T'HH:mm:ssZZZZ. Gmail only
DateRecieved Google/DateRecieved The message received date, yyyy-MM-dd'T'HH:mm:ssZZZZ. Gmail only
SubjectAtStart Google/SubjectAtStart The subject of the conversation when the first message was sent. classic Hangouts only
SubjectAtEnd Google/SubjectAtEnd The subject of the conversation when the last message was sent. classic Hangouts only
DateFirstMessageSent Google/DateFirstMessageSent The time stamp for when the first message in a conversation was sent. classic Hangouts only
DateLastMessageSent Google/DateLastMessageSent The time stamp for when the last message in a conversation was sent. classic Hangouts only
DateFirstMessageReceived Google/DateFirstMessageReceived The time stamp for when the first message in a conversation was received. classic Hangouts only
DateLastMessageReceived Google/DateLastMessageReceived The time stamp for when the last message in a conversation was received. classic Hangouts only
ThreadedMessageCount Google/ThreadedMessageCount The number of messages in the conversation. classic Hangouts only

RSMF mapping considerations

Generally, Relativity maps all metadata based on EML, but the following RSMF-specific mappings are considered non-standard.

Note: All EML Header strings are case insensitive, which isn't unique to RSMF files.

EML Header Metadata Field(s)
X-RSMF-BeginDate
  • Rsmf/BeginDate
  • EmailSentOn
  • CreatedOn
  • InternalCreatedOn
X-RSMF-EndDate
  • Rsmf/EndDate
  • LastModified
X-RSMF-EventCount Rsmf/MessageCount
  • If an RSMF file does not include a Sent Date, and the X-RSMF-BeginDate header exists, that header will be mapped to the Sent Date field.
  • For more technical details on how Processing handles RSMF files, see Processing an RSMF file.
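
The RSMF-specific mappings can be pictured with a small lookup, sketched below. This is not Relativity's code: `map_rsmf_headers` is a hypothetical helper, the pairing of X-RSMF-EndDate with Rsmf/EndDate and LastModified is inferred from the field names, and treating the standard Date header as the sent date is an assumption.

```python
# Header-to-field mappings, keyed in lowercase because EML header names are
# case insensitive. The X-RSMF-EndDate pairing is inferred from the field names.
RSMF_HEADER_MAP = {
    "x-rsmf-begindate": ["Rsmf/BeginDate", "CreatedOn", "InternalCreatedOn"],
    "x-rsmf-enddate": ["Rsmf/EndDate", "LastModified"],
    "x-rsmf-eventcount": ["Rsmf/MessageCount"],
}


def map_rsmf_headers(headers):
    """Return field values for a dict of EML headers from an RSMF file."""
    lowered = {name.lower(): value for name, value in headers.items()}
    fields = {}
    for header, targets in RSMF_HEADER_MAP.items():
        if header in lowered:
            for target in targets:
                fields[target] = lowered[header]
    # Assumption: the "Date" header carries the sent date. The begin date is
    # used for EmailSentOn only when no sent date is present.
    sent = lowered.get("date") or lowered.get("x-rsmf-begindate")
    if sent is not None:
        fields["EmailSentOn"] = sent
    return fields
```

Lowercasing the keys up front keeps the lookup case insensitive, matching the note above about EML header strings.
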
Multi-part forensic file considerations

    When processing a multi-part forensic image, make sure that the Source location points to the root folder containing all of the files that make up the image. If you select only the first file of the image (E01, L01, EX01, or LX01), inventory and discovery will fail with an unrecoverable error.

    This is because inventory looks at files where they reside in the processing source folder and does not copy them to the repository. If only the first file is selected, then during discovery only that file is copied to the repository. The workers attempt to extract from it and fail because the rest of the archive is not available.

    When processing E01 files, the following NTFS file system files are skipped:

    • Unallocated space files
    • Index $I30 files
    • $TXF_DATE files

    Native text extraction and OCR

    Processing distinguishes between text and line art in the documents you process. For these documents, processing will only OCR the line art. This means that Relativity does not skip OCR if a page has electronic text.

    Accordingly, Relativity performs both native text extraction and OCR on the following file formats:

    • All vector formats—SVG, CAD files, Metafiles [WMF, EMF], Postscript, Encapsulated postscript
    • PDF, Visio, Publisher, MS Project, Hancom and JungUm files

    All image formats, TIFF/JPEG/GIF/BMP/PNG and more, do not have native text, so only OCR is performed. If the file has electronic text and images, native text extraction and OCR will be performed.

    Support for password-protected RAR archives

    Processing does not decrypt a file that gets its encryption directly from the RAR file that contains it. This means that if you attempt to process a password-protected RAR archive on which the Encrypt file names property is checked, Processing is unable to extract the files inside that archive.

    Encrypt file names property checkbox

    In addition, note that Processing can extract a single password-protected file from a RAR archive, but not multiple password-protected files in the same archive.

    The following table breaks down Processing's support of password-protected RAR archives.

    • —Processing will decrypt the file.
    • Empty—Processing will not decrypt the file.
    Archive type Single password-protected file Multiple password-protected files Encrypt File Names property
    Multi-part RAR    

    MSG to MHT conversion considerations

    The following table provides details on the differences between how Relativity handles MSG and MHT files. This information may be especially useful if you plan on setting the Email Output field on the processing profile to MHT.

    Category Field/Attribute MSG MHT
    Metadata fields: Show Time As
      • MSG: This field sometimes appears in the extracted text from MSG files when not explicitly stated in the MSG file itself. The default for a calendar invite is to show time as busy; the default for a cancellation is to show time as free.
      • MHT: Show Time As does not appear in the extracted text if the default value is populated.
    Metadata fields: On behalf of
      • MSG: This field is sometimes present in text from MSG. In some cases, this field is populated with the same value as the From field.
      • MHT: On behalf of does not appear in the extracted text.
    Interline spacing: N/A
      • MSG: The expected number of blank lines will appear in the extracted text. Line wrapping for long paragraphs will also be present.
      • MHT: In some cases, the text in MHT format has fewer blank lines than the text from MSG. In addition, there is no built-in line wrapping for long paragraphs.
    Intraline spacing: N/A
      • MSG: White-space characters are converted to standard space characters.
      • MHT: White-space characters may remain as non-breaking spaces.
    Email addresses Email When an MSG is converted to MHT, the text is extracted from the MHT using OutsideIn. This can lead to a loss of data. If an address renders only as a display name, such as Joe Smith, in the MHT, the email address is not captured in the extracted text.

    Email image extraction support

    It may be helpful for you to understand when Relativity treats an image that is attached to an email as an inline, or embedded, image and not as an actual attachment. The following table breaks down when this occurs based on email format and image characteristics:

    Email format Attachments that are inline, embedded, images
    Plain text None
    Rich text IPicture-based OLE embedded images
    HTML
    • Images with content ID referenced in the HTML body
    • Local, non-internet image references in the HTML that Relativity can match to an attachment
    • PST/OST/MSG files containing metadata hints as to whether or not the image is marked hidden or is referenced in the HTML body

    You can arrange for the discovery of inline images when creating Processing profiles, specifically through the field called When extracting children, do not extract.

    Inline image identification

    Relativity Processing defines inline images within emails through the HiddenAttachment field. This field is not mapped by default. See Mapping processing fields for more information.

    Microsoft Office child extraction support

    Notable unsupported file types

    Processing does not support files created with the following programs and versions:

    Product category Product name and version
    DOS Word Processors
    • DEC WPS Plus (DX) Through 4.0
    • DEC WPS Plus (WPL) Through 4.1
    • DisplayWrite 2 and 3 (TXT) All versions
    • DisplayWrite 4 and 5 Through Release 2.0
    • Enable 3.0, 4.0, and 4.5
    • First Choice Through 3.0
    • Framework 3.0
    • IBM Writing Assistant 1.01
    • Lotus Manuscript Version 2.0
    • MASS11 Versions through 8.0
    • MultiMate Versions through 4.0
    • Navy DIF All versions
    • Nota Bene Version 3.0
    • Office Writer Versions 4.0 through 6.0
    • PC-File Letter Versions through 5.0
    • PC-File+ Letter Versions through 3.0
    • PFS:Write Versions A, B, and C
    • Professional Write Versions through 2.1
    • Q&A Version 2.0
    • Samna Word IV+ Versions through Samna Word
    • SmartWare II Version 1.02
    • Sprint Versions through 1.0
    • Total Word Version 1.2
    • Volkswriter 3 and 4 Versions through 1.0
    • Wang PC (IWP) Versions through 2.6
    • WordMARC Plus Versions through Composer
    • WordStar Versions through 7.0
    • WordStar 2000 Versions through 3.0
    • XyWrite Versions through III Plus
    Windows Word Processors
    • Adobe FrameMaker (MIF) Version 6.0
    • Hangul Version 97, 2002
    • JustSystems Ichitaro Versions 5.0, 6.0, 8.0, 13.0, 2004
    • JustWrite Versions through 3.0
    • Legacy Versions through 1.1
    • Lotus AMI/AMI Professional Versions through 3.1
    • Lotus Word Pro Millennium Versions 96 through Edition 9.6, text only
    • Novell Perfect Works Version 2.0
    • Professional Write Plus Version 1.0
    • Q&A Write Version 3.0
    • WordStar Version 1.0
    Mac Word Processors MacWrite II Version 1.1
    Disk Images Symantec Ghost
    Spreadsheets
    • Enable Versions 3.0, 4.0, and 4.5
    • First Choice Versions through 3.0
    • Framework Version 3.0
    • Lotus 1-2-3 (DOS and Windows) Versions through 5.0
    • Lotus 1-2-3 (OS/2) Versions through 2.0
    • Lotus 1-2-3 Charts (DOS and Windows) Versions through 5.0
    • Lotus 1-2-3 for SmartSuite Versions 97 and Millennium 9.6
    • Lotus Symphony Versions 1.0, 1.1, and 2.0
    • Microsoft MultiPlan Version 4.0
    • Mosaic Twin Version 2.5
    • Novell Perfect Works Version 2.0
    • PFS: Professional Plan Version 1.0
    • Quattro Pro (DOS) Versions through 5.0
    • Quattro Pro (Windows) Versions through 12.0, X3
    • SmartWare II Version 1.02
    • SuperCalc 5 Version 4.0
    • VP Planner 3D Version 1.0

    In addition, processing does not support the following files:

    • Self-extracting RAR files
    • PEM certificate files
    • Apple iWork suite (Pages, Numbers, Keynote)
    • Apple Mail:
      • .emlxpart
      • .partial.emlx

      Note: The .emlxpart and .partial.emlx extensions are distinct from the .emlx file extension, which is supported by processing.

    • Audio/Video files
      • .wav
    • iCloud backup files
    • Microsoft Access
    • Microsoft Works
    • Raw partition files:
      • ISO
      • NTFS
      • HFS

    Note: For information on the limitations and exceptions to our supported file types, see Supported file types.

    Supported container file types

    The following file types can act as containers:

    File type Extensions


    • We do not support Instant Bloomberg XML files.


    • We do not support multi-part CAB files.
    • We do not support Password Protected CAB files.
    Compressed files


    When working with archives, there is no limit to the number of layers deep Processing goes to extract data. It extracts until there is no more data to be extracted. Inventory, however, only extracts data from first-level documents. For example, if you have a .ZIP within a .ZIP that contains an email with an attached Word document, inventory only extracts up to the email.

    Note: Relativity does not support multi-part ZIP, TAR, or 7Z files.
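
The idea of extracting through every archive layer can be illustrated with a short recursive walk over nested ZIP data. This is a sketch using Python's standard `zipfile` module, not a description of the processing engine; `walk_nested_zip` is a hypothetical helper, and it handles only ZIP containers.

```python
import io
import zipfile


def walk_nested_zip(data: bytes, prefix: str = "", depth: int = 1):
    """Yield (path, depth) for every non-ZIP member, descending into nested ZIPs with no depth limit."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            path = prefix + info.filename
            # Caveat: ZIP-based documents such as DOCX also pass this check,
            # so a real pipeline would identify the file type first.
            if zipfile.is_zipfile(io.BytesIO(payload)):
                yield from walk_nested_zip(payload, path + "/", depth + 1)
            else:
                yield path, depth
```

The recursion stops only when a layer contains no further ZIP members, which mirrors extracting until there is no more data to extract. A first-level-only pass would simply skip the recursive call.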

    EnCase E01, L01, LX01, EX01
    AccessData Logical Image


    • We offer support for processing both single-part and multi-part non-encrypted AD1 files. For encrypted AD1 files, only single-part files are supported. For multi-part AD1 files, you must decrypt the files prior to processing. See Multi-part container considerations for more information.



    • For Outlook meeting invites, the email that is sent with the meeting invite (the MSG) will have a sent date that reflects when the sender sent out the meeting request. The resulting calendar file that is then added to the user's Outlook calendar (the ICS) will not include a sent date, as the date doesn’t apply to the calendar file itself.
    Lotus Notes Database


    MBOX Email Store


    • MBOX is a standard format, so it does not matter whether you're using a Mac folder format or a Unix file format.
    Outlook Offline Storage


    Outlook Mail Folder


    • Relativity assigns duplicate hash values to calendar invites, as it does with email messages and other documents.
    Outlook Express Mail Folder DBX
    PDF Portfolio PDF


    • You do not need to combine multipart RAR files before processing them.
    Relativity Collection container RCC
    TAR (Tape Archive)


    • Relativity does not handle multi-part TAR files.


    See Compressed files.

    Lotus Notes considerations

    Note the following about how Processing handles NSF files:

    • Processing does not perform intermediate conversion on NSF files, meaning that we do not convert them to PST or DXL before discovering them. This ensures that we do not miss any document metadata during processing.
    • Processing preserves the original formatting and attachments of the NSF file. In addition, forms are not applied, since they are designed to hide information.
    • Processing extracts the contents of NSF files and puts them into individual MSG files using the Lotus Notes C/C++ API directly. This is because NSF does not have its own individual document entry file format. All of the original Lotus Notes metadata is embedded in the MSG, meaning if you look at the document metadata in an NSF within Lotus, all of the metadata listed is embedded in the MSG. In addition, the original RTF/HTML/Plaintext document body is written to the MSG. Relativity handles the conversion from NSF to MSG files itself, and any errors regarding metadata or the inability to translate content are logged to the processing Errors tab. Relativity can process the following NSF items as MSGs:
      • Contacts
      • Distribution lists
      • Calendar items
      • Emails and non-emails

    This is an example of an original NSF file before being submitted to the processing engine:

    Original NSF file

    This is an example of an NSF file that has been converted to an MSG:

    Converted MSG file

    Multi-part container considerations

    When processing a multi-part container, the first part of the container must be included. If the first part of the container is not included, the Processing engine will ignore the file.

    ICS/VCF file considerations

    ICS/VCF files are deduplicated not as emails but as loose files based on the SHA256 hash. Since the system now considers these loose files, Relativity is no longer capturing the email-specific metadata that it used to get as a result of ICS/VCF files going through the system's email handler.
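
Hash-based deduplication of loose files can be sketched as follows, using Python's standard `hashlib`. The `deduplicate_by_sha256` helper is hypothetical and hashes raw file content only; which file counts as the primary, and any normalization applied before hashing, are assumptions not covered here.

```python
import hashlib


def deduplicate_by_sha256(files: dict) -> dict:
    """Map each SHA-256 digest to the first file name seen with that content.

    `files` maps file names to their raw bytes.
    """
    unique = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        # Later files with an identical digest are treated as duplicates.
        unique.setdefault(digest, name)
    return unique
```

Two ICS files with byte-identical content collapse to a single entry, regardless of the emails they were attached to.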

    The following table breaks down which metadata values the system will populate for ICS files:

    The following table breaks down which metadata values the system will populate for VCF files:

    Container file types supported for the password bank

    The following container file types are supported by Relativity for Password Bank in Inventory.

    File type Extensions
    Compressed files 7Z, ALZIP, ZIP, Z, BZ2, GZ
    Lotus Notes Database NSF
    PDF Portfolio PDF

    Non-container file types supported for Password Bank in Inventory

    The Password Bank also supports the following non-container formats:

    • PDF
    • Excel*
    • Word*
    • PowerPoint*
    • S/MIME
    • P7M

    * Except DRM or custom encryption