

This topic explains several factors that may affect processing throughput, that is, the performance of processing discover and publish jobs. If you consistently experience lower-than-usual throughput, check whether any of the following factors are affecting job performance.
Processing discover throughput is the amount of concurrent processing work performed on a customer’s RelativityOne instance in a given time period. It is expressed in gigabytes per hour (GB/hr) and is calculated by adding the size of the data discovered by all of the processing jobs running on the instance, then dividing by the amount of time during which at least one processing job was running.
For instance, suppose that between 1 PM and 4 PM, three processing jobs run and 240 GB (80 + 120 + 40) of data is discovered in 2.5 hours (3 hours minus the 0.5 hours when no processing job was running). The processing throughput is 240 / 2.5 = 96 GB/hr.
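A minimal sketch of this calculation, using the example values above (the variable names are illustrative only and are not part of any Relativity API):

# Illustrative throughput calculation using the example values above.
discovered_gb = [80, 120, 40]   # GB discovered by each of the three jobs
window_hours = 3.0              # 1 PM to 4 PM
idle_hours = 0.5                # time within the window when no job was running

active_hours = window_hours - idle_hours          # 2.5 hours
throughput = sum(discovered_gb) / active_hours    # 240 / 2.5

print(f"Processing throughput: {throughput:.0f} GB/hr")  # 96 GB/hr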
The following content discusses common factors that may affect discover job throughput.
The discovered size of the dataset can be larger than the raw data size for compressed files.
Compressed files such as .pst, .zip, .mbox, Google Workspace, or Lotus Notes files have a discovered dataset size that is larger than the raw dataset size after processing discovery completes. In general, e-discovery practitioners estimate an expansion rate of 1.5 to 2 times the raw size. So, a 50 GB .pst file that processed at a speed of 50 GB/hr could expand to 100 GB after discovery, and throughput speeds can vary accordingly.
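A rough sketch of that estimate, using the 1.5x to 2x rule of thumb above (the raw size is the example value from the paragraph; nothing here is a measured figure):

# Rough discovered-size estimate for compressed containers, using the
# 1.5x-2x expansion rule of thumb described above (illustrative only).
raw_gb = 50  # e.g., a 50 GB .pst file

low_estimate = raw_gb * 1.5   # 75 GB
high_estimate = raw_gb * 2.0  # 100 GB

print(f"Expected discovered size: {low_estimate:.0f}-{high_estimate:.0f} GB")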
The discovered dataset size depends on the compression type used during data collection and preparation. Because you can change the compression method for many of these file types, discovered sizes can differ greatly. For a table of supported container types, see Supported container file types.
The content of the raw dataset also affects the discovered data size. Multimedia files (audio and video), large text files, and large images compress well and then expand during discovery, which can lead to a discovered size of more than 2 times the raw data size.
Customers can compress an already compressed container into a .zip file, creating a compressed file that is multiple levels deep but easier to upload (a single large file instead of many small files).
During discovery, the multiple levels of a deeply compressed file can reduce processing speed because processing resources are spent unpacking the containers rather than discovering the files. Balance the convenience of zipping files against processing speed: use containers that are only one level deep when possible, or decompress files as much as possible before discovery.
Limit the number of passwords stored in the password bank to 100. Relativity attempts each password for every encrypted document, which reduces discovery speed. Decryption takes approximately 0.4 seconds per password per file, so with 4,000 passwords your processing time can increase by more than 26 minutes per encrypted file. It takes 10 times as long for Relativity to try 1,000 passwords as it does to try 100.
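A back-of-the-envelope sketch of this overhead, assuming the approximately 0.4 seconds per password per file figure above (the function name is illustrative only):

# Estimate the worst-case per-file decryption overhead from password
# bank size, using the ~0.4 seconds per password per file figure above.
SECONDS_PER_PASSWORD = 0.4

def decryption_overhead_minutes(password_count: int) -> float:
    """Worst-case time spent trying passwords against one encrypted file."""
    return password_count * SECONDS_PER_PASSWORD / 60

print(decryption_overhead_minutes(100))   # ~0.7 minutes per file
print(decryption_overhead_minutes(1000))  # ~6.7 minutes per file
print(decryption_overhead_minutes(4000))  # ~26.7 minutes per file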
Additionally, the type of document in the dataset compounds the effect of too many stored passwords. For example, a large encrypted Excel file takes longer to process than an encrypted Word file.
Relativity distributes processing compute resources equally across all of the jobs in a workspace and across all of the workspaces in an instance. Individual job throughput can be affected when multiple jobs are running.
You can manually change the priority of a processing job from the Imaging and Processing Queue tab. Processing resources shift to the priority job.
When running multiple processing jobs concurrently, changing a job's priority to a lower number (higher priority) results in all available resources focusing on that job until it completes, potentially delaying the progress of other jobs. Job priority also applies at the instance level: a higher-priority job in one workspace can delay the completion of a lower-priority job in another workspace.
For example, if you have the following:
Processing Set A: Priority 10
Processing Set B: Priority 10
Processing Set C: Priority 20
Processing capacity is spread equally across jobs A and B. Job C waits for these sets to complete before starting, as sketched below.
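A minimal sketch of this scheduling behavior using the example sets above (the equal split is a simplification of how Relativity actually distributes compute, not a documented algorithm):

# Illustrative sketch of instance-level priority scheduling: all capacity
# goes to the jobs sharing the lowest (highest-priority) number.
jobs = {"Processing Set A": 10, "Processing Set B": 10, "Processing Set C": 20}

top_priority = min(jobs.values())                         # 10
running = [name for name, p in jobs.items() if p == top_priority]
share = 1.0 / len(running)

for name in running:
    print(f"{name}: {share:.0%} of capacity")             # A and B get 50% each
# Processing Set C waits until A and B complete.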
Large files over 10 GB process faster than files smaller than 10 GB because more resource startup time is involved in processing many small files. If a dataset has a large number of files, the cumulative startup time increases, slowing the dataset's overall throughput. You will see higher GB/hr throughput processing 10 files that are 10 GB each than 10 files that are 10 KB each.
Generally speaking, a dataset with a low file count per GB of data processes faster than a dataset with a high file count per GB. For example, you will see higher GB/hr throughput processing 10 files that are 10 GB each than 100 files that are 1 GB each. How you package data within the dataset therefore affects the throughput of a processing job, as the simple model below illustrates.
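One way to see why file count matters is a toy model in which each file carries a fixed startup cost on top of a per-GB processing rate. Both constants below are hypothetical, chosen only to illustrate the shape of the effect, not measured Relativity values:

# Toy throughput model: total time = per-GB processing time + per-file
# startup overhead. The constants are hypothetical, not measured values.
RATE_GB_PER_HR = 100.0    # hypothetical steady-state processing rate
STARTUP_SEC_PER_FILE = 5  # hypothetical fixed overhead per file

def effective_throughput(total_gb: float, file_count: int) -> float:
    """Effective GB/hr once per-file startup overhead is included."""
    hours = total_gb / RATE_GB_PER_HR + file_count * STARTUP_SEC_PER_FILE / 3600
    return total_gb / hours

print(effective_throughput(100, 10))    # 10 files x 10 GB  -> ~99 GB/hr
print(effective_throughput(100, 100))   # 100 files x 1 GB  -> ~88 GB/hr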
Certain file types, such as large encrypted Excel files, deeply nested emails, or large containers greater than 350 MB (7-zip, .tar, .e01, and .l01), take longer to ingest than simpler file types such as email documents, which affects processing throughput. Similarly, text extraction is significantly slower for datasets containing large .pdf files that need OCR (when the system is performing OCR on them). Even 1-2 MB .pdf, .png, .tiff, .jpeg, .cad, and other graphical image files can take several minutes to OCR if the embedded images have high resolution.
The following content discusses common factors that may affect publish job throughput.
Be aware of the number of mapped fields in your workspace. Mapping more than 100 fields may affect processing publish job performance.
Be sure that processing metadata fields are mapped to the right field type in Relativity. Mapping a metadata field such as Email Participants to a long text field, or to a choice field that can accumulate a large number of values, will negatively impact publish performance.