Workflow solutions for very large workspaces

Very large Relativity workspaces (VLRWs) require a great deal of time and effort to maintain. As a result, it’s important to develop a plan to accommodate the workflow before your workspace reaches VLRW status. Making workflow changes after this point is inefficient and can be very time consuming.

Use the following best practices to plan your VLRW workflow, set up and work in VLRWs, and improve overall performance. These guidelines can also be a helpful starting point for educating Relativity users.

For more information on the management of very large and complex workspaces and engaging Relativity for assistance, visit Recommendations for managing very large workspaces in RelativityOne.

Workspace Administration and Optimization

Workspace size and activity factors

For information on migrating workspaces to RelativityOne, see Data Migration.

Fields

Too many or overly bloated fields can dramatically impact database performance. Use the following guidelines to improve field efficiency:

  • Keep fixed-length text fields to the minimum size needed.
  • Set fixed-length text fields to no more than 400 characters.
  • Store choice data in single-choice or multiple-choice fields if possible.
  • Use separate objects to store repetitive content when possible.
  • Set up fields as Unicode in advance because the system must re-index a field if the change is made following data load.
  • Only include fields in the text index when it’s necessary to do so. Alternatively, create a separate dtSearch index when you need to search fields.
  • Only create fields in large workspaces during off hours, as doing so during business hours can cause the Document table to lock, which causes performance issues.
Note: Creating additional Document object fields in a large workspace can disrupt review and coding for the duration of the field creation. When this occurs, you may see a warning message showing the number of active reviewers who may be impacted during field creation, and you have the option of canceling the field creation and retrying the operation during off hours. For more details, see Fields.
(Image: Field creation warning for large workspaces)

Folders

You can use folders to organize documents to reflect their original storage. However, in a large workspace, the use of folders can cause performance issues. Folders perform searches in the background to display documents. If searches begin to take longer as more documents are loaded, consider the following options:

  • Combine folders and then use custom views to achieve similar organization.
  • Create a multiple-choice field and populate it with the folder path. This adds an entry to the field tree so you can organize documents by their original folder paths, and users can still folder documents in multiple locations.
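
If you prepare the folder-path values outside Relativity, a minimal sketch of the data preparation is shown below. It assumes you have an export or load file with Control Number and Folder Path columns; the file names and the “Folder Path Choice” field are hypothetical.

    # Minimal sketch: build an overlay file that maps each document to a
    # choice value derived from its original folder path. File names and
    # the "Folder Path Choice" field are hypothetical.
    import csv

    with open("documents.csv", newline="", encoding="utf-8") as src, \
            open("folder_path_overlay.csv", "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)      # expects Control Number and Folder Path columns
        writer = csv.writer(dst)
        writer.writerow(["Control Number", "Folder Path Choice"])
        for row in reader:
            # Normalize separators and trim leading/trailing slashes so the
            # choice value reflects only the original folder path.
            path = row["Folder Path"].replace("\\", "/").strip("/")
            writer.writerow([row["Control Number"], path])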

Index management

Optimized indexing requires some knowledge of your data. Scrubbing your data before indexing saves time when creating an index and returning search results.

Consider the following when creating an index:

  • Set the dtSearch index to recognize and/or ignore words, characters, and digits as necessary. These settings don't necessarily affect performance, but applying them before you build a large index prevents you from having to rebuild the entire index later. For example, if a company name appears many times throughout a document set and you don’t intend to search for it, add the name to the noise words list.
  • Remove file types that have no searchable content, such as system or program files.
  • Use a separate index for searching database files and large Excel files. Even if your database has only a small number of these files, creating an index without them improves searching speed.
  • Set up multiple dtSearch indexes, including one with a smaller document set based on one or more of the following criteria (a sketch of the text-size split appears after this list). Note that editing these settings may affect search results:
    • Date ranges
    • Custodians
    • Text size (extracted or OCR text)
      • Small (< 2 MB)
      • Medium (> 2 MB and < 10 MB)
      • Large (> 10 MB and < 25 MB)
      • Very large (> 25 MB)
  • Communicate with your team to create a search strategy for the case. Some cases have distinct words or terms that might warrant changing the default settings of an index.
  • Remove numbers from the dtSearch index alphabet file if you’re only searching for words—this reduces the size of the index and disables numeric range searching.
  • Communicate any changes to alphabet files to your team. Searching against multiple dtSearch indexes that use different alphabet files can result in different results, even when running the same search on identical content.
  • Enable dtSearch indexes to automatically recognize dates, email addresses, and credit card numbers only when necessary. Enabling this setting increases build time.
  • Consider using a pair of dtSearch indexes when adding new data. You can update one index in the background and then swap out the outdated index with the current one.
  • When using multiple dtSearch indexes, ensure the index names reflect the conditions of the index so users can choose the correct index to search against. Keep your indexes organized and disable and/or remove any indexes that are not needed.
  • Do not include Extracted Text Size as a view field when creating a saved search, as it may be indexed for potential false hits on numeric value searches.
  • Do not select Extracted Text Size as a sort field when creating a saved search, as doing so can cause poor index balancing, resulting in increased build times or slower search.
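
If you script the text-size split outside Relativity, the sketch below is a minimal illustration that assigns each document to one of the size buckets listed above. The document IDs and sizes are made up, and the thresholds simply mirror the ranges in this list.

    # Minimal sketch: bucket documents by extracted text size so each bucket
    # can feed a separate dtSearch index. Thresholds follow the list above.
    MB = 1024 * 1024

    def text_size_bucket(size_in_bytes: int) -> str:
        if size_in_bytes < 2 * MB:
            return "Small"
        if size_in_bytes < 10 * MB:
            return "Medium"
        if size_in_bytes < 25 * MB:
            return "Large"
        return "Very large"

    # Example: text sizes exported from the workspace (hypothetical values, in bytes).
    docs = {"DOC000001": 150_000, "DOC000002": 4 * MB, "DOC000003": 30 * MB}
    for control_number, size in docs.items():
        print(control_number, text_size_bucket(size))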

Automated Workflows

When building Automated Workflows in large workspaces, consider the configuration and timing of Triggers and Actions. We recommend configuring resource-intensive actions with the Delayed Run Time setting so that workflows run during off-hours.

Search terms reports

Search terms reports (STRs) simplify the process of identifying documents that contain a specific group of keywords. Instead of running complicated queries, you can use the search terms report to enter a list of terms or phrases and then generate a report listing their frequencies within a set of documents. As a workspace grows, a search terms report takes longer to run, especially if individual search strings are complicated.

In large workspaces, avoid using nested proximity searches or wildcards in search terms reports. Nested proximity searches run slowly against large dtSearch indexes because the complex string takes longer to evaluate. Wildcards before a term, after a term, or on both sides of a term also cause the search terms report to take much longer to complete. Instead of using wildcards, use the dtSearch Dictionary to identify variations of a term.

Combining wildcards and nested proximity searching may create overly complicated searches. This adds a significant amount of time to running a query and sometimes prevents it from completing. For example, (((Term1* or Term2*) w/20 Term3*) and Term4*) and (Term5* w/20 Term6*) is a complicated query.
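
One way to avoid wildcards is to expand a stem into explicit terms before adding them to the report. The sketch below is a minimal, hypothetical illustration; it assumes you have copied candidate words (for example, from the dtSearch Dictionary) into a plain text file, and the file name and stem are placeholders.

    # Minimal sketch: expand a stem into explicit terms using a word list
    # (for example, one copied out of the dtSearch Dictionary), so the search
    # terms report can use explicit variants instead of "contract*".
    def expand_stem(stem: str, words: list[str]) -> list[str]:
        return sorted({w for w in words if w.lower().startswith(stem.lower())})

    # "dictionary_words.txt" is a hypothetical export of candidate words.
    with open("dictionary_words.txt", encoding="utf-8") as f:
        dictionary_words = [line.strip() for line in f if line.strip()]

    print(expand_stem("contract", dictionary_words))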

Persistent highlight sets

Persistent highlight sets provide a valuable way to identify terms within the document viewer. Although the size of a workspace doesn't affect how persistent highlighting works, use these guidelines to improve usability in large workspaces:

  • Enter multiple terms on separate lines.
  • Enter terms exactly as they appear in the document. Don't use quotation marks or connectors.
  • If you enter variations of a term or phrase and the variations include multiple words, list the multi-word variations first. The regular expressions that persistent highlighting uses find a term and then move to the next term in the set, so longer phrases should be listed before their shorter substrings (a sketch of this ordering appears after this list). For example, list the terms United, United States, and United States of America in the following order:
    • United States of America
    • United States
    • United
  • Don't use special characters, quotation marks, or other punctuation. Avoid using wildcards (*) and instead use stemming (~) or fuzzy searching (%) for spelling variations.
  • Don't use dtSearch syntax, including operators such as “AND” and “OR.”
  • Identify and remove terms with large hit counts.
  • List variations of a term first and the root term last.
  • Use Highlight Fields and Search Terms Reports to generate persistent highlight sets.
  • Avoid including STR fields on layouts, as this can negatively impact performance.
  • Turn off any Persistent Highlight Sets in the viewer that the user does not need for their review to improve document loading.
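
The ordering rule above (multi-word variations first, root term last) amounts to sorting terms by word count and length, longest first. A minimal sketch using the United States example from this list:

    # Minimal sketch: order persistent highlight terms so multi-word
    # variations come before the shorter root term.
    terms = ["United", "United States of America", "United States"]

    # Sort by word count, then by length, longest first.
    ordered = sorted(terms, key=lambda t: (len(t.split()), len(t)), reverse=True)
    print("\n".join(ordered))
    # United States of America
    # United States
    # United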

Layouts

You can use layouts, along with views, to improve workflow efficiency. Identify the type of information each reviewer group needs to code documents (for example, a group may be working on privilege logs or prepping for depositions). You can then use group security permissions to adjust layout visibility as necessary.

When planning layouts, think about the overall life-cycle of a document. For example, a review workflow may include the following:

  • First pass review
  • Quality assurance
  • Production
  • Privilege review
  • Deposition prep
  • Trial exhibits

Some users have many layouts that they need during the course of a document review. You can use separators (-----------) to help organize layouts and build the workflow.

Issue coding layouts can become long and cumbersome over time, requiring users to scroll to see all available choices. To improve a layout’s usability, limit the issue tags to broad categories, at least for first pass review, to reduce the number of choices, and change the layout field display from check box list to pop-up picker. The pop-up picker de-clutters the layout by hiding the choices and presenting them only when necessary, and users can apply filtering to pop-up picker views to find choices.

Whenever possible, avoid creating multiple duplicative fields; instead, reuse existing fields on multiple layouts. If you don’t want a user to change values from another layout, place the existing field on the additional layouts and make it read only. Reviewers who switch between layouts can then see the information in previously populated fields without being able to change the coding.

Mass Operations

Some mass operations temporarily lock the document table while executing. Depending on the number of records and users in the system, the table may lock for an extended period of time and frustrate users trying to perform standard edits. If necessary, break searches for mass operations into smaller chunks and carry out mass operations at night or at an off-peak time.

Some mass operations, such as Mass Delete, have to do a considerable amount of clean up when deleting objects. Consider the complexity of the workspace (linked lists, fields, event handlers, etc.) when using the Mass Delete operation.

Security Permissions

Large workspaces usually require multiple security groups. You should organize documents and define security groups to assist with review workflow. Start with a baseline security group for each main role.

For example, you may need to create a baseline group for each of the following roles:

  • System admin
  • Operations admin
  • Operations technician
  • Project manager
  • Project specialist
  • Case admin
  • Case review
  • Case technician
  • Contract reviewer
  • Experts

Set security permissions from the baseline, giving each user group incremental security rights as necessary. For example, three different user groups may need permissions for the following tasks:

  • Contract review
  • Contract review – QA – Access to QA layouts, fields, choices
  • Contract review – Privilege – Access to Privilege Log layouts, fields, choices
  • Contract review – Dep Prep – Access to Deposition Prep layouts, fields, choices, mass actions

Snapshot auditing

Enabling Snapshot Auditing On Delete will record the value of every field on every document upon its deletion. This drastically increases the database size and significantly reduces the performance of Mass Delete. We recommend disabling this setting for Documents unless it is needed.

Views

When it comes to the reviewer interface, focus on workflow. Create views to filter document collections to necessary lists. Using views, you can manage the types of documents that are presented to a group. Use group security permissions to toggle view visibility as necessary.

Analyze each group participating in the review, and map out its exact needs. For example, the First Pass group only needs to review batches. The Second Pass group needs to both review and quality check documents, and the Experts group only needs to see Production documents. Implementing a plan that coordinates views for review groups improves workspace management efficiency.

In addition, consider the following best practices for views:

  • Add only the fields necessary for the review task to a view.
  • Avoid adding Long Text fields to views.
  • Using nested saved searches as view conditions slows down loading.

Relativity system admins

System admins have all necessary permissions to perform the following script-related actions:

  • View
  • Run
  • Preview (locked and unlocked scripts)
  • Create/Write
  • Edit
  • Link
  • Import Applications (see the Applications Guide)

Note: You can grant non-system admins admin permissions that relate to scripts. See Admin security for more information about granting admin permissions to non-system admins.

Users should not frequently run custom scripts, including through SSMS access, because such scripts can negatively impact the system. Avoid using SSMS and the Admin Script functionality as much as possible until actions are audited and certain Relativity controls, such as timeout values, are in place.

To prevent scripts from negatively impacting your environment:

  • Limit admin script access for a given workspace to one or two people.
  • Assign an individual to review the impact of custom script executions on the system.

Once you've identified the scripts safe for execution, you can make them available to users through the workspace tabs.

Ingestion, Data Management, Data Migration

Processing

Note the following details related to Inventory:

  • Consider creating smaller L01 files since very large L01 files can fail during Inventory.
  • Skip Inventory if you're not planning to use filters.

Note the following details related to Discover:

  • AD1 performs best out of the forensic containers.
  • Whenever possible, break up larger data sets (50 – 100 GB) into separate data sources, and create one processing set per data source. This allows for easier error identification and remediation without holding back or reprocessing the entire data set.
  • Enable Text Roll Up to make extracted images searchable and reduce document count in your workspace.
  • Avoid single large container files, if possible.
  • Use Relativity text extraction for best performance.
  • Do not exceed 100 Passwords in the Password Bank.
  • Resolve all discover errors before publishing.
  • Resolve errors on container files before moving to publish, if possible.

Note the following details related to Publish:

  • Run post-publish delete jobs off-hours.
  • Use simple saved searches to run the Processing duplication workflow scripts.

For information on the factors that may affect processing throughput, or performance for processing discover and publish jobs, see Processing throughput factors.

Importing load-ready data

To save time, consider the following recommendations when importing large amounts of load-ready data.

  • Break the load file into smaller document counts.
  • Load the control numbers and folder paths into the workspace to create the records in SQL.
  • Load text as external text files.
  • Break each subgroup load file into reasonable sizes, such as 250,000 records per load file (see the sketch after this list).
  • Be aware of the kinds of data that you import. Importing the extracted text of large binary files, such as WAV, MPEG, and Access database file types, causes excessive load. Don't attempt to load these file types.
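
The record-count split mentioned in this list can be scripted ahead of time. The sketch below is a minimal illustration that assumes a simple delimited load file with a single header row and one record per physical line; load files with embedded line breaks need a real load file parser, and the file name is a placeholder.

    # Minimal sketch: split a delimited load file into ~250,000-record pieces,
    # repeating the header in each piece. Assumes one record per line.
    CHUNK = 250_000

    with open("loadfile.dat", encoding="utf-8") as src:
        header = src.readline()
        part, count, out = 1, 0, None
        for line in src:
            if out is None or count == CHUNK:
                if out:
                    out.close()
                out = open(f"loadfile_part{part:03}.dat", "w", encoding="utf-8")
                out.write(header)
                part, count = part + 1, 0
            out.write(line)
            count += 1
        if out:
            out.close()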

Note: To more easily verify data and recognize errors, load in smaller batches of data and verify each batch as you go.

Relativity Integration Points

For large data promotion workflows, we recommend configuring integration point jobs into batches. The following suggestions can be used when creating job batches:

  • When using the Tag Documents with Job Name option, keep the job document count to no more than 500,000.
  • When you do not select Tag Documents with Job Name, no limit on document count applies.
  • Map no more than 100 fields. Additionally, map as few long text fields as possible.

RelativityOne Data Migration

For details on migration options, see RelativityOne data migration.

Analysis and Review

Searching

Executing searches can be very resource intensive. Follow these guidelines to reduce the resources used for searching.

  • Don't use the “is like” search operator on Fixed Length and Long Text fields. Using “is like” runs a resource-intensive bit-by-bit search rather than using the index.
  • Avoid using multiple layers of nesting applied in a search.
  • Don’t use wildcards at the front or in the middle of terms. Instead, use the dictionary to find multiple forms of words and paste all of them into the search box.
  • Avoid searching on unnecessary search terms. Instead, use the dtSearch Dictionary’s fuzzy and stemming searches to identify the best words to search.
  • Adding search conditions for date ranges/folders/custodians makes a query more complex and slows down the return of search results. We recommend searching for date ranges and/or custodians, tagging that subset of documents, and then using that choice value in subsequent searches.
  • Use filters as an alternative approach to searching.

Reviewer statistics script

Reviewer Statistics is a popular script in the Relativity script library that reports on the number of documents reviewed over a certain period of time. It can take a while to complete in workspaces with large audit record tables. Instead of trying to run this report regularly during review, we suggest that you schedule it to run each night after maintenance has completed. You can then have the results emailed to one or more recipients.

Review center

Review Center Prioritized Review queues do not currently scale well above 5 million documents. For now, you can use Active Learning (possibly in conjunction with Review Center saved search queues) when your population is above 5 million documents.

For details on scaling Active Learning, see Scaling Active Learning.

For details on converting an existing large case workspace to a repository workspace, see Review to Repository Conversion workflow.

Relativity Short Message Format

Relativity Short Message Format (RSMF) files greater than 2 GB are not supported and may fail to process. To ensure high performance in Relativity, an RSMF file should contain no more than 10,000 events. An RSMF file with a significant number of participants can also cause performance issues in Relativity.

Collection considerations include limiting the volume and number of custodians per job. Limiting both will reduce the collection time and reduce the chance of errors.

Conceptual Analytics

By default, indexes are limited by the following parameters:

  • Indexes can include up to 12 million documents each.
  • Conceptual indexes can reach up to 60 GB in size.

Conceptual indexing is a resource-intensive process, and indexes with many millions of training documents can take ten hours or more to build.

  • Clustering: There are no hard limits on cluster sizes beyond what the indexes enforce. However, setting the cluster depth high (>= 4) usually prevents large cluster jobs from completing. Keeping the cluster depth at 3 or lower, at least for the initial cluster build, makes successful completion more likely. In addition, exclude the Cluster Score field on large indexes to reduce resource contention.
  • Categorization: The number of example documents should be no more than 50,000 for categorization, and no more than 15,000 per category. Best practices dictate much lower numbers, as categorization quality is greatly improved when example documents are well curated.
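
A minimal sketch of checking an example set against these limits, assuming you can export a simple mapping of example document IDs to category names (the mapping is hypothetical, and the limits simply restate the numbers above):

    # Minimal sketch: check that a categorization example set stays within
    # the limits above (50,000 examples total, 15,000 per category).
    from collections import Counter

    MAX_TOTAL = 50_000
    MAX_PER_CATEGORY = 15_000

    def check_example_set(examples: dict[str, str]) -> list[str]:
        problems = []
        if len(examples) > MAX_TOTAL:
            problems.append(f"{len(examples)} examples exceeds the {MAX_TOTAL} total limit")
        for category, count in Counter(examples.values()).items():
            if count > MAX_PER_CATEGORY:
                problems.append(f"category '{category}' has {count} examples (limit {MAX_PER_CATEGORY})")
        return problems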

Translate

The maximum number of documents that you can translate in a single job by default is 1,000.

Redact

Consider the following details related to Redact:

  • Create separate saved searches by project type (images, native spreadsheets, native PDFs).
  • To reduce run times for Automated projects, keep projects to ~10,000-15,000 documents.
  • To minimize run times, only include what you intend to redact in your saved search.
  • Use a Redactions field for the Ready to Redact and Redaction Complete choices.

Productions

Exporting data using Import/Export

Exporting large productions takes a great deal of time. Create saved searches to divide the production into roughly equal amounts, such as approximately 250,000 documents each. Create export jobs for each saved search, which will run using the built-in queue in Import/Export. Use the production images as the default.
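
A minimal sketch of one way to compute roughly equal export batches by accumulating page counts; the document list is a placeholder, and the 500,000-page target comes from the example workflow that follows.

    # Minimal sketch: divide a production into export volumes of roughly
    # 500,000 pages each by accumulating page counts per document.
    TARGET_PAGES = 500_000

    def split_into_volumes(docs: list[tuple[str, int]], target: int = TARGET_PAGES):
        volumes, current, pages = [], [], 0
        for control_number, page_count in docs:
            if current and pages + page_count > target:
                volumes.append(current)
                current, pages = [], 0
            current.append(control_number)
            pages += page_count
        if current:
            volumes.append(current)
        return volumes   # each inner list becomes one saved search / export job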

For example, the following process exports 250,000 documents with approximately 2.5 million pages to a network share folder:

  1. Create saved searches of the production so that each has approximately 500,000 pages.
  2. Label each volume in sequential numeric order.
  3. Modify each image load file to show the top folder as the production volume.
  4. Combine each load file into one complete load file.
  5. Export each saved search of images using a single machine.
  6. Export any native files on a single machine, selecting the Beg Bates field only.
  7. After the exports, create a fixed-length text field called “Prod Native Path”.
  8. Use Import/Export to overlay the exported load file from step 6 onto the Prod Native Path field.
  9. Export the text files for each record using a single machine, selecting the Beg Bates field only.
  10. After exporting the text files, create a fixed-length text field called “Prod Text Path”.
  11. Use Import/Export to overlay the exported load file from step 9 onto the Prod Text Path field.
  12. Export the metadata for all the records, after loading the information for Prod Native Path and Prod Text Path.

Production scripts

Depending on the speed of your environment, this process may help speed up exports.

If you're using any of the following Relativity scripts, we recommend breaking up productions into smaller sets of less than 300,000 documents in one script batch:

  • Assign Legacy Document Bates Fields
  • Populate Parent ID and Child ID
  • Propagate Sent Date to Family Documents