Processing Statistics

This Relativity system-level script provides you with the option of displaying:

  • A report of processed data sizes per processing set and user in all workspaces in the environment.
  • A summary of all processed data in all workspaces in the environment.

Category

This is a Relativity script that runs at the environment level. You must run this script while in Home mode, as it won't work from the workspace.

Special considerations

Note the following considerations regarding this script:

  • This script only reports on workspaces in which the processing application was successfully installed.
  • A processing set with no Data Source will not be displayed in the report.

Input and preparation

Enter the following information in the input fields on the script:

Processing script input fields

  • Date Begin - select a beginning date to report on. This will filter the results to display processing sets where the System Created On field falls within the selected date range.
  • Date End - select an end date to report on. This will filter the results to display processing sets where the System Created On field falls within the selected date range.
  • Display Type - select one of the following options:
    • Detailed - the report will display the results grouped by workspace and processing set.
    • Summary - the report will display the results grouped by workspace

Script results

The results of running the Processing Statistics script are displayed in the typical Relativity Script format.

When you execute the script, Relativity displays a report of all processing sets whose last system created on date falls within the selected date range. The report displays data for processing sets where the published status is any value.

Returned output - detailed display type

When you select Detailed for the Display Type field, the results window includes the following columns:

Processing script output showing default columns

Column headingDescription
WorkspaceThe name of the workspace.
Workspace Artifact IDThe artifact ID of the workspace.
Processing Set CreationThe date the processing set was created.
Processing Set Artifact ID The artifact ID of the processing set.
Last Modified DatesThe last date the processing set was modified.
Processing Data Source Count The count of Data Sources for the Processing set where the status is either Completed or Completed with Errors.
Discover StatusThe current status of the discovery phase of the set.
Inventory StatusThe current status of the inventory phase of the set.
Publish StatusThe current status of the publish phase of the set.
Preprocessed file countThe count of all native files before extraction/decompression, as they exist in storage.
Preprocessed file sizeThe sum of all the native file sizes, in GB, before extraction/decompression, as they exist in storage.
Container countThe count of all native files classified as containers before extraction/decompression, as they exist in storage. This also includes nested containers that haven’t been extracted yet.
Container sizeThe sum of all native file sizes, in GB, classified as containers before extraction/decompression, as they exist in storage. This value may be larger than the preprocessed file size because it also includes nested containers.
Inventoried filesThe count of all files found during an inventory run.
Nisted file countThe count of all files denisted out during discovery.
Nisted file sizeThe sum of all the file sizes, in GB, denisted out during discovery.
Filtered file countThe count of all files excluded from discovery by way of an exclusion filter after inventory.
Filtered file sizeThe sum of all the file sizes, in GB, excluded from discovery by way of an exclusion filter after inventory.
Discovered document The count of all the native files discovered that aren’t classified as containers as they exist in storage.
Discovered document sizeThe sum of all native file sizes discovered, in GB, that aren’t classified as containers as they exist in storage.
Duplicate file countThe count of duplicate native files associated to the user, processing set and workspace.
Duplicate file sizeThe sum of duplicate native file sizes, in GB, associated to the user, processing set and workspace.
Published documentsThe count of published native files associated to the processing set or workspace.
Published document sizeThe sum of published native file sizes, in GB, associated to the processing set or workspace.
Total file count after discoveryThe count of all native files (including duplicates and containers) as they exist after decompression and extraction.
Total file size after discoveryThe sum of all native file sizes (including duplicates and containers), in GB, as they exist after decompression and extraction.
Storage file size after publishingThe sum of all file sizes, in GB, as they exist in storage after publishing. This is the total file size minus any files that were ingested twice. This value only counts the file once, ignoring duplicates.

File size changes in script report

The following table provides a breakdown of how the values contained in the Processing Statistics script report can be expected to change as processing progresses. In the following example, a 5 GB PST file becomes 10 GB worth of emails and attachments.

Processing phaseColumn nameExpected value
None (not processed)Preprocessed file size5 GB
None (not processed)Container size5 GB (PST file + any containers)
DiscoveryDiscovered document size10 GB
PublishingPublished document size10 GB
After discovery & publishingTotal file size after discovery15 GB (10 GB in emails and attachments, 5 for the PST, without de-duping or filtering)
After discovery & publishingStorage file size after discovery/publishing10 GB

Returned output - summary display type

When you select Summary for the Display Type field, the results window includes the following columns:

Summary display types in processing stats

Column headingDescription
WorkspaceThe name of the workspace.
Workspace Artifact IDThe artifact ID of the workspace.
Processing Data Source Count The count of Data Sources for the Processing set where the status is either Completed or Completed with Errors.
Preprocessed file countThe count of all native files before extraction/decompression, as they exist in storage.
Preprocessed file sizeThe sum of all the native file sizes, in GB, before extraction/decompression, as they exist in storage.
Container countThe count of all native files classified as containers before extraction/decompression, as they exist in storage. This also includes nested containers that haven’t been extracted yet.
Container sizeThe sum of all native file sizes, in GB, classified as containers before extraction/decompression, as they exist in storage. This value may be larger than the preprocessed file size because it also includes nested containers.
Inventoried filesThe count of all files found during an inventory run.
Nisted file countThe count of all files denisted out during discovery.
Nisted file sizeThe sum of all the file sizes, in GB, denisted out during discovery.
Filtered file countThe count of all files excluded from discovery by way of an exclusion filter after inventory.
Filtered file sizeThe sum of all the file sizes, in GB, excluded from discovery by way of an exclusion filter after inventory.
Discovered document countThe count of all the native files discovered that aren’t classified as containers as they exist in storage.
Discovered document sizeThe sum of all native file sizes discovered, in GB, that aren’t classified as containers as they exist in storage.
Duplicate file countThe count of duplicate native files in the workspace.
Duplicate file sizeThe sum of duplicate native file sizes, in GB, in the workspace.
Published documentsThe count of published native files in the workspace.
Published document sizeThe sum of published native file sizes, in GB, in the workspace.
Total file count after discoveryThe count of all native files (including duplicates and containers) as they exist after decompression and extraction.
Total file size after discoveryThe sum of all native file sizes (including duplicates and containers), in GB, as they exist after decompression and extraction.
Storage file size after publishingThe sum of all file sizes, in GB, as they exist in storage after publishing. This is the total file size minus any files that were ingested twice. This value only counts the file once, ignoring duplicates.