Google Workspace
This page provides additional information about processing Google Workspace data. To view the tables of file types that Relativity supports and does not support for processing, see Supported file types for processing.
This documentation contains references to third-party software and technologies. While efforts are made to keep third-party references updated, the images, documentation, or guidance in this topic may not accurately represent the current behavior or user interfaces of the third-party software. For more considerations regarding third-party software, such as copyright and ownership, see Terms of Use.
When you gather Google Workspace data directly using Google Vault (Vault) or indirectly using Relativity Collect (Collect), the collection export includes zipped Google Gmail (Gmail) or Google Drive (Drive) data, along with supplemental metadata files.
Processing automatically identifies when the supplemental metadata file is present in your file share, links the metadata fields to the processed files, and makes them available as mappable fields.
There are two methods for collecting Google Workspace data:
- Collect in Relativity. For more information, see the Google Workspace documentation.
- Manual download through Google Vault.
Collect
When you use Collect to gather Gmail data, the metadata file is provided as a .csv file; an .xml file is also provided, but Relativity does not use it. When you use Collect to gather Drive data, the metadata file is provided as an .xml file.
[Images: Relativity Collect: Gmail; Relativity Collect: Google Drive]
Collect places additional files in the root of the data source. They are processed as loose individual files if they are not removed before processing.
Google Vault
When using Vault to gather Gmail data, download the [EXPORT_NAME].zip and [EXPORT_NAME]-metadata.csv files. For Drive data, download the [EXPORT_NAME].zip and [EXPORT_NAME]-metadata.xml files. When collecting data manually from Vault, you must export Gmail data in MBOX format. If Gmail data is exported as a .pst file, Relativity does not link the supplemental metadata to it.
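Because Relativity links the supplemental metadata only to MBOX exports, it can help to confirm the export format before staging the data. The following Python sketch lists the members of an export .zip without extracting anything, so the file system metadata of the contents is not altered. It assumes that MBOX and PST members carry .mbox and .pst extensions; the helper names are hypothetical and are not part of Relativity or Google Vault.

```python
import io
import zipfile
from pathlib import PurePosixPath

MESSAGE_CONTAINERS = {".mbox", ".pst"}  # assumed member extensions

def export_formats(zip_source) -> set[str]:
    """Return the message-container extensions found inside a Vault export .zip.

    Only the archive's directory is read; no member is extracted.
    """
    with zipfile.ZipFile(zip_source) as archive:
        suffixes = {PurePosixPath(name).suffix.lower() for name in archive.namelist()}
    return suffixes & MESSAGE_CONTAINERS

def check_gmail_export(zip_source) -> None:
    """Raise ValueError unless the export holds MBOX data and no PST data."""
    formats = export_formats(zip_source)
    if ".pst" in formats:
        raise ValueError("PST found: re-export the Gmail data from Vault in MBOX format")
    if ".mbox" not in formats:
        raise ValueError("no .mbox files found in the Gmail export")

# Usage with an in-memory archive standing in for [EXPORT_NAME].zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("user1@company.com/mail.mbox", "From sender@company.com Sat Apr  1 12:30:00 2023\n")
buffer.seek(0)
check_gmail_export(buffer)  # passes without raising
```

In practice, pass the path of the downloaded [EXPORT_NAME].zip instead of the in-memory buffer.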
[Images: Google Vault: Gmail; Google Vault: Google Drive]
- If your root folder contains files other than the export .zip files and the supplemental metadata files, Relativity processes them as loose individual files.
- Do not unzip any zip containers included within the collection export. You may unintentionally alter the file system metadata of the contents, which may result in improper handling during processing.
- Do not edit the content or the names of the native files or supplemental metadata files. Doing so will result in improper handling during processing.
- When using Relativity Import/Export, make sure your folder structure meets Relativity's requirements. For information on structuring manual Google Vault exports for use with Import/Export, see Importing raw unprocessed data for Processing via Import/Export.
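To catch files that would be processed as loose individual files, you can compare the contents of the root folder against the files you expect Vault to produce. This Python sketch assumes the [EXPORT_NAME].zip, [EXPORT_NAME]-metadata.csv, and [EXPORT_NAME]-metadata.xml naming described above; find_loose_files is a hypothetical helper, and if your export includes additional archives (for example, a package of linked Drive files), add their names to the expected set.

```python
import tempfile
from pathlib import Path

def find_loose_files(root: Path, export_name: str) -> list[str]:
    """List root-level files that are neither the export archive nor its metadata file."""
    expected = {
        f"{export_name}.zip",
        f"{export_name}-metadata.csv",  # Gmail exports
        f"{export_name}-metadata.xml",  # Drive exports
    }
    return sorted(p.name for p in root.iterdir() if p.is_file() and p.name not in expected)

# Usage: simulate a staged Drive export that still contains two extra files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["MyExport.zip", "MyExport-metadata.xml",
                 "MyExport-custodian-docid.csv", "notes.txt"]:
        (root / name).write_bytes(b"")
    loose = find_loose_files(root, "MyExport")

print(loose)  # ['MyExport-custodian-docid.csv', 'notes.txt']
```

Both flagged files would be processed as loose individual files unless you remove them first.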
Other considerations
- When you use Collect to capture Google Workspace Vault exports, Relativity converts Google Chat data to Relativity Short Message Format (RSMF) for processing. If you use Vault directly, convert Google Chat data to RSMF before processing; otherwise, it is processed as standard MBOX email.
- When you use Collect to capture Google Gemini Vault exports, Relativity converts Google Gemini data to RSMF for processing. If you use Vault directly, convert Google Gemini data to RSMF before processing; otherwise, it is processed as standard XML data.
- Beyond those mentioned here, Google Vault collections can contain other data types. Some of these data types may exist within Google Drive or Gmail exports. For a complete list of these data types, see Google's documentation regarding Supported services & data types.
- Google Drive and Gmail metadata XML include information about data classification labels applied to the source data. Relativity does not capture this information.
- Google Drive and Gmail metadata XML include query parameters used to generate the Vault export. Relativity does not capture this information.
- Vault exports of Google Drive data include a file named [EXPORT_NAME]-custodian-docid.csv. Relativity does not capture any information in this file; it is processed as a loose individual file unless removed before processing.
- Vault exports of Gmail data may include an additional package of linked Drive files. Relativity processes both the Gmail and Drive data, but does not capture any relationship between them.
The Google supplemental metadata files contain the fields listed below, all of which are available for mapping to Document object fields.
Google may add, remove, or edit fields at any time. You can find the most current lists on their Vault export contents web page.
The following table lists the metadata fields found in the Google Drive .xml file. Use these fields for mapping Drive data.
Google Drive .xml field | Relativity source field name | Field type | Description |
---|---|---|---|
DocID | GoogleDrive/DocID | Long Text | A unique identifier for the file. For site exports, the value is the page ID. |
#Author | GoogleDrive/Author | Long Text | The email address of the person who owns the file in Drive. For a shared drive file, it shows the shared drive name. |
Collaborators | GoogleDrive/Collaborators | Multiple Choice | The accounts and groups that have direct permission to edit the file or add comments. Also includes users with indirect access to the file if you chose this option during export. |
Viewers | GoogleDrive/Viewers | Multiple Choice | The accounts and groups that have direct permission to view the file. Also includes users with indirect access to the file if you chose this option during export. |
#DateCreated | GoogleDrive/DateCreated | Date | The date a Google file was created in Drive. For non-Google files, the date the file was uploaded to Drive. |
#DateModified | GoogleDrive/DateModified | Date | The date the file was last modified. |
#Title | GoogleDrive/Title | Long Text | The file name as assigned by the user. Because some operating systems cannot expand zip files with extremely long file names, Vault truncates the file name at 128 characters during export. The value shown by the #Title tag isn't truncated. |
DocumentType | GoogleDrive/DocumentType | Long Text | The file type for Google files. Possible values are: DOCUMENT (a document created in Google Docs); SPREADSHEET (a spreadsheet created in Google Sheets); PRESENTATION (a presentation created in Google Slides); FORM (a form created in Google Forms); DRAWING (a drawing created in Google Drawings); SITES_PAGE (a page from a site created in new Google Sites). |
Others | GoogleDrive/Others | Multiple Choice | The accounts from your query that have indirect access to the file if you opted to exclude access level information during export. May also include users for whom Vault could not determine permission levels at the time of export. |
DocParentID | GoogleDrive/DocParentID | Long Text | For sites, the site ID. |
SharedDriveID | GoogleDrive/SharedDriveID | Long Text | The identifier of the shared drive that contains the file, if applicable. |
SourceHash | GoogleDrive/SourceHash | Long Text | A unique hash value for each version of a file. Can be used to deduplicate file exports and verify the exported file is an exact copy of the source file. Supported by Google Docs, Sheets, and Slides files only. |
FileName | GoogleDrive/FileName | Long Text | The file name. Use this value to correlate the metadata with the file in the export ZIP file. |
FileSize | GoogleDrive/FileSize | Whole Number | The size of the file in bytes. |
Hash | GoogleDrive/FileHash | Long Text | The MD5 hash of the file. |
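The table above can be read as a mapping from Google tag names to Relativity source field names. The following Python sketch applies that mapping to a metadata .xml file and uses the FileName and Hash values to match and verify a native file. It assumes an EDRM-style layout in which each Document element carries its DocID as an attribute, its metadata in Tag elements with TagName and TagValue attributes, and its file details in an ExternalFile element; it also assumes Multiple Choice values are comma-separated. Check both assumptions against your own export before relying on them. The sample XML and all function names are illustrative only.

```python
import hashlib
import xml.etree.ElementTree as ET

# Google tag name -> Relativity source field name, from the table above.
TAG_TO_FIELD = {
    "#Author": "GoogleDrive/Author",
    "Collaborators": "GoogleDrive/Collaborators",
    "Viewers": "GoogleDrive/Viewers",
    "#DateCreated": "GoogleDrive/DateCreated",
    "#DateModified": "GoogleDrive/DateModified",
    "#Title": "GoogleDrive/Title",
    "DocumentType": "GoogleDrive/DocumentType",
    "Others": "GoogleDrive/Others",
    "DocParentID": "GoogleDrive/DocParentID",
    "SharedDriveID": "GoogleDrive/SharedDriveID",
    "SourceHash": "GoogleDrive/SourceHash",
}
# Multiple Choice fields; assumed to hold comma-separated account lists.
MULTI_VALUE = {"GoogleDrive/Collaborators", "GoogleDrive/Viewers", "GoogleDrive/Others"}

def parse_drive_metadata(xml_text: str) -> dict[str, dict]:
    """Return {Vault file name: {Relativity source field: value}} for each Document element."""
    records = {}
    for doc in ET.fromstring(xml_text).iter("Document"):
        external = doc.find("Files/File/ExternalFile")
        if external is None:
            continue  # no native file to correlate with
        record = {
            "GoogleDrive/DocID": doc.get("DocID"),
            "GoogleDrive/FileName": external.get("FileName"),
            "GoogleDrive/FileSize": int(external.get("FileSize", "0")),
            "GoogleDrive/FileHash": external.get("Hash", ""),
        }
        for tag in doc.iter("Tag"):
            field = TAG_TO_FIELD.get(tag.get("TagName"))
            if field is None:
                continue  # tags that are not in the table above are ignored
            value = tag.get("TagValue", "")
            if field in MULTI_VALUE:
                value = [v.strip() for v in value.split(",") if v.strip()]
            record[field] = value
        records[record["GoogleDrive/FileName"]] = record
    return records

def md5_matches(data: bytes, record: dict) -> bool:
    """Compare the MD5 of the exported bytes with the Hash value from the metadata."""
    return hashlib.md5(data).hexdigest() == record["GoogleDrive/FileHash"].lower()

SAMPLE_XML = """<Root><Batch><Documents>
  <Document DocID="doc123">
    <Tags>
      <Tag TagName="#Author" TagValue="owner@company.com"/>
      <Tag TagName="#Title" TagValue="Budget"/>
      <Tag TagName="Collaborators" TagValue="a@company.com, b@company.com"/>
    </Tags>
    <Files><File FileType="Native">
      <ExternalFile FileName="Budget_doc123.txt" FileSize="5"
                    Hash="5d41402abc4b2a76b9719d911017c592"/>
    </File></Files>
  </Document>
</Documents></Batch></Root>"""

record = parse_drive_metadata(SAMPLE_XML)["Budget_doc123.txt"]
print(record["GoogleDrive/Collaborators"])  # ['a@company.com', 'b@company.com']
print(md5_matches(b"hello", record))        # True
```

Keying the records by the Vault file name mirrors the FileName description in the table: it is the value that ties a metadata entry to a file inside the export .zip.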
- Upon export, Google Vault appends a DocID to the original name of Drive files so that each file can be matched with the Google Workspace metadata in the .xml file. Google Vault may also truncate the original name portion to avoid file system errors when unzipping the Google Drive file package. The Google Vault file name is captured in the metadata .xml FileName tag, and the original file name in the #Title tag. Relativity records the actual file name using the #Title value, and also captures both values in the GoogleDrive/FileName and GoogleDrive/Title fields.
- Google Vault may export Google Drive files that do not contain the original internal created or modified dates. This is especially true of Google Docs, Google Sheets, and Google Slides file exports. Relativity captures the actual created date and time, as well as the last modified date and time, using the #DateCreated and #DateModified values provided in the Google Workspace metadata .xml file. These dates may further be reflected in publish-calculated fields such as SortDate.
- In some scenarios, Google Vault may generate unique binary files for the same Google Drive file when it is exported multiple times. In other scenarios, it may generate identical binary files for the same Google Drive file across exports, but with different Google Workspace metadata, such as Title, Collaborators, and so forth. For deduplication purposes, Relativity always generates a standard SHA256 hash from the binary of non-email native files, including Google Drive files, without considering the Google Workspace metadata. As a result, copies from the first scenario are not identified as duplicates, while in the second scenario a copy can be removed as a duplicate even though its metadata differs. Use caution when deciding whether to deduplicate Google Vault Drive data during processing.
- Google Vault Drive exports contain a Google-calculated hash, exclusive to Google Docs, Google Sheets, and Google Slides, which is referenced in the metadata.xml file as SourceHash. This field is captured during processing and can be used post-processing, in conjunction with the Processing duplication workflow solution, to identify duplicate Google Docs, Google Sheets, and Google Slides files within the Relativity workspace.
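The two deduplication scenarios above can be modeled in a few lines. This Python sketch is a toy model, not Relativity's processing engine: dedupe_by_binary keeps the first document for each SHA256 of its native binary and ignores metadata, while group_by_source_hash groups documents that share a GoogleDrive/SourceHash value, as you might in a post-processing review. The document structure (id, native, and the field keys) is hypothetical.

```python
import hashlib
from collections import defaultdict

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def dedupe_by_binary(docs):
    """Keep the first document per SHA256 of its native binary; metadata is never compared."""
    seen, kept, dropped = set(), [], []
    for doc in docs:
        digest = sha256_of(doc["native"])
        if digest in seen:
            dropped.append(doc["id"])
        else:
            seen.add(digest)
            kept.append(doc["id"])
    return kept, dropped

def group_by_source_hash(docs):
    """Group documents that share a GoogleDrive/SourceHash value (Docs, Sheets, Slides only)."""
    groups = defaultdict(list)
    for doc in docs:
        source_hash = doc.get("GoogleDrive/SourceHash")
        if source_hash:  # other file types have no SourceHash
            groups[source_hash].append(doc["id"])
    return {h: ids for h, ids in groups.items() if len(ids) > 1}

docs = [
    # A and B: identical binaries exported with different titles.
    {"id": "A", "native": b"same bytes", "GoogleDrive/Title": "Plan", "GoogleDrive/SourceHash": "s1"},
    {"id": "B", "native": b"same bytes", "GoogleDrive/Title": "Plan (renamed)", "GoogleDrive/SourceHash": "s1"},
    # C: the same version exported again as a different binary.
    {"id": "C", "native": b"re-rendered bytes", "GoogleDrive/Title": "Plan", "GoogleDrive/SourceHash": "s1"},
]
print(dedupe_by_binary(docs))      # (['A', 'C'], ['B'])
print(group_by_source_hash(docs))  # {'s1': ['A', 'B', 'C']}
```

Document B is removed even though its title differs from A, and document C survives binary deduplication even though its SourceHash shows it is the same version of the file.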
The following table lists the metadata fields found in the Google Gmail .csv file. Use these fields for mapping Google Gmail data.
Gmail .csv field | Relativity source field name | Field type | Description | Notes |
---|---|---|---|---|
Rfc822MessageId | Google/Rfc822MessageId | Long Text | A message ID that is the same for the receiver's and sender's messages. Use this value to correlate metadata with the message in an MBOX export. For classic Hangouts, the value is for the first message in the thread. | |
GmailMessageId | Google/GmailMessageId | Long Text | A unique message ID. Use this value to manage specific messages with the Gmail API. For classic Hangouts, the value is for the first message in the thread. | |
Account | Google/Account | Long Text | The account that had the message in their inbox. For example, user1@company.com received a message sent to groupA@company.com because they are a member of the group. If a search returns that message because it was in user1's inbox, then the value of To is groupA@company.com while the value of Account is user1@company.com. | |
From | Google/From | Long Text | The sender account. | |
To | Google/To | Long Text | The recipient account. Multiple recipients are comma-separated and the list is in double quotes. | Gmail only |
CC | Google/CC | Multiple Choice | Accounts in the cc: field. | Gmail only |
BCC | Google/BCC | Multiple Choice | Accounts in the bcc: field. | Gmail only |
Subject | Google/Subject | Long Text | The message subject. | Gmail only |
Labels | Google/Labels | Multiple Choice | Labels applied to the message by Gmail or the user. | Gmail only |
DateSent | Google/DateSent | Date | The message send date in UTC, yyyy-MM-dd'T'HH:mm:ssZZZZ. | Gmail only |
DateRecieved | Google/DateRecieved | Date | The message received date, yyyy-MM-dd'T'HH:mm:ssZZZZ. | Gmail only |
SubjectAtStart | Google/SubjectAtStart | Long Text | The subject of the conversation when the first message was sent. | Classic Hangouts only |
SubjectAtEnd | Google/SubjectAtEnd | Long Text | The subject of the conversation when the last message was sent. | Classic Hangouts only |
DateFirstMessageSent | Google/DateFirstMessageSent | Date | The time stamp for when the first message in a conversation was sent. | Classic Hangouts only |
DateLastMessageSent | Google/DateLastMessageSent | Date | The time stamp for when the last message in a conversation was sent. | Classic Hangouts only |
DateFirstMessageReceived | Google/DateFirstMessageReceived | Date | The time stamp for when the first message in a conversation was received. | Classic Hangouts only |
DateLastMessageReceived | Google/DateLastMessageReceived | Date | The time stamp for when the last message in a conversation was received. | Classic Hangouts only |
ThreadedMessageCount | Google/ThreadedMessageCount | Decimal | The number of messages in the conversation. | Classic Hangouts only |
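The following Python sketch reads a Gmail metadata .csv file with the standard csv module, which handles the double-quoted, comma-separated To values, and correlates each row with the MBOX message whose Message-ID header matches Rfc822MessageId. It assumes the date columns hold ISO 8601 style timestamps with a UTC offset, such as 2023-04-01T12:30:00Z or 2023-04-01T12:30:00+0000, and it strips angle brackets in case only one side includes them. The sample data and helper names are hypothetical.

```python
import csv
import io
import mailbox
import os
import tempfile
from datetime import datetime

def parse_vault_date(value):
    """Parse e.g. 2023-04-01T12:30:00Z; %z also accepts +0000 and +00:00 (Python 3.7+)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z") if value else None

def normalize_id(message_id):
    """Strip whitespace and angle brackets so CSV and MBOX IDs compare equal (assumption)."""
    return (message_id or "").strip().strip("<>")

def load_gmail_metadata(csv_text):
    """Map normalized Rfc822MessageId -> selected Relativity source fields."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows[normalize_id(row["Rfc822MessageId"])] = {
            "Google/Account": row.get("Account") or "",
            "Google/To": row.get("To") or "",  # csv removes the surrounding double quotes
            "Google/DateSent": parse_vault_date(row.get("DateSent")),
        }
    return rows

def correlate(mbox_path, metadata):
    """Return metadata rows whose Rfc822MessageId matches a Message-ID in the MBOX file."""
    matches = {}
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            key = normalize_id(message.get("Message-ID"))
            if key in metadata:
                matches[key] = metadata[key]
    finally:
        box.close()
    return matches

SAMPLE_CSV = (
    "Rfc822MessageId,Account,From,To,DateSent\n"
    "<abc123@mail.gmail.com>,user1@company.com,sender@company.com,"
    '"groupA@company.com, user2@company.com",2023-04-01T12:30:00Z\n'
)
SAMPLE_MBOX = (
    "From sender@company.com Sat Apr  1 12:30:00 2023\n"
    "Message-ID: <abc123@mail.gmail.com>\n"
    "From: sender@company.com\n"
    "To: groupA@company.com, user2@company.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.mbox")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(SAMPLE_MBOX)
    matched = correlate(path, load_gmail_metadata(SAMPLE_CSV))

print(matched["abc123@mail.gmail.com"]["Google/To"])  # groupA@company.com, user2@company.com
```

As in the Account example in the table, the To value names the group, while Account identifies the mailbox that held the message.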