OCR

Optical Character Recognition (OCR) translates images of text, such as scanned and redacted documents, into actual text characters. There are three main steps involved in OCRing documents:

  1. Defining a production or saved search that contains the documents you want to OCR. See Creating or editing a saved search or Production sets.

    Notes: All documents, including native files, must be imaged before using OCR.


  2. Creating an OCR profile. See Creating and editing an OCR profile.
  3. Creating an OCR set that references your OCR profile. See Creating and editing an OCR set.

With OCR you can view and search on text that's normally locked inside images. It uses pattern recognition to identify individual text characters on a page, such as letters, numbers, punctuation marks, spaces, and ends of lines.

Note: RelativityOne scales automatically for OCR.

See this related recipe:

  • OCR redacted production documents and export text

Note: See OCR on redacted production documents for information on running OCR on redacted production documents.

Creating and editing an OCR profile

An OCR profile is a saved, reusable set of parameters that you use when creating an OCR set. To run an OCR job, you must first create an OCR profile.

You don't have to create a profile for every OCR set you create. You can use only one profile for all sets. However, you may want to have multiple profiles saved with different accuracy or language settings to use for different document sets you plan to OCR.

To create an OCR profile, follow these steps:

  1. Click the OCR Profiles tab under the OCR tab.
  2. Click New OCR Profile.
  3. Complete the fields on the form. See OCR profile fields.
  4. Click Save.
    Notes: Special considerations when using Arabic as a recognition language:

    • When Arabic is selected as a recognition language in the OCR profile, the OCR engine also recognizes English by default. You do not need to select English as an additional recognition language when Arabic is selected.
    • Running OCR with an OCR profile that combines Arabic with other languages is not supported. This configuration may lead to Image OCR Errors, in which case no text is recognized from the image even if the image contains recognizable text.
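
The Arabic rules above can be expressed as a simple pre-save check. The following is a minimal sketch in Python, not part of Relativity: the function name check_recognition_languages and the idea of passing the profile's languages as a list of strings are assumptions made for illustration.

```python
from typing import List


def check_recognition_languages(languages: List[str]) -> List[str]:
    """Return warnings for a hypothetical list of profile recognition languages.

    Illustrative only: this is not a Relativity API. It encodes the two
    Arabic considerations described in the notes above.
    """
    warnings: List[str] = []
    selected = {lang.strip().lower() for lang in languages}

    if "arabic" in selected:
        others = selected - {"arabic"}
        if "english" in others:
            # English is recognized by default whenever Arabic is selected.
            warnings.append(
                "English is recognized by default with Arabic; do not select it."
            )
        if others:
            # Arabic combined with any other language is an unsupported profile.
            warnings.append(
                "Combining Arabic with other languages is not supported: "
                + ", ".join(sorted(others))
            )
    return warnings


if __name__ == "__main__":
    for warning in check_recognition_languages(["Arabic", "English"]):
        print(warning)
```

An empty list means the selection does not conflict with the Arabic considerations; it says nothing about other profile settings.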

OCR profile fields

Creating and editing an OCR set

Using the OCR Sets tab you can submit groups of documents defined by a saved search or production to be OCRed based on the settings defined by the OCR profile. Relativity writes the results to the destination field that you specify.

To create an OCR set, you can copy an existing OCR set. If you copy an OCR set, every current setting in that set copies over.

Before you create an OCR set, you first need to create an OCR profile. See Creating and editing an OCR profile. To create an OCR set, follow these steps:

  1. Click the OCR Sets tab under the OCR tab.
  2. Click New OCR Set. If you want to edit an existing OCR set, click the Edit link next to the OCR set name.
  3. Complete the fields on the form. See OCR set fields.
  4. Click Save.

    The OCR Set Console appears. See Running an OCR set.

OCR set fields

Running an OCR set

The following permissions are required to run an OCR set.

Object          Permission
OCR Set         View
OCR Profile     View
Production      View
Saved Search    View
OCR Set Tab     View

Note: These permissions apply only to running an OCR set that is already set up. To create a new OCR set or OCR profile, a user needs the Add permission on OCR Set and OCR Profile.
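
As a rough illustration of the requirements above, the sketch below compares a user's granted permissions with the ones listed for running and for creating an OCR set. It is a hypothetical Python model, not a Relativity API: the (name, permission) pairs, the constant names, and the missing_permissions helper are assumptions based on how the list above reads.

```python
# Hypothetical model: each permission is a (name, permission) pair taken
# from the list above. These names are not Relativity API identifiers.
REQUIRED_TO_RUN = {
    ("OCR Set", "View"),
    ("OCR Profile", "View"),
    ("Production", "View"),
    ("Saved Search", "View"),
    ("OCR Set Tab", "View"),
}

# Per the note above, creating a new set or profile needs Add on both objects.
REQUIRED_TO_CREATE = {
    ("OCR Set", "Add"),
    ("OCR Profile", "Add"),
}


def missing_permissions(granted, required=REQUIRED_TO_RUN):
    """Return the required pairs that are absent from the granted pairs."""
    return sorted(set(required) - set(granted))


if __name__ == "__main__":
    granted = {("OCR Set", "View"), ("OCR Profile", "View")}
    print(missing_permissions(granted))
```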

Note: As of February 2025, Feature Permissions offers another way to manage Relativity security, shifting the focus from Object Types and Tab Visibility to feature-based permissions. It is an additional option; any feature-specific permissions information already in this topic still applies. With it, administrators can manage permissions at the feature level and view the granular permissions associated with each feature. For details, see Instance-level permissions and Workspace-level permissions.

When you save an OCR set, the OCR Set console appears. Use this console to run the OCR job.

An image of the OCR Set Console

The OCR Set console provides the following action buttons:

  • OCR Documents - starts the OCR job. This processes all images in the selected data source or production.

    If a user stops the job, it completes with errors, or it fails, click OCR Documents to start the job again. If there are documents in the Document (OCR Results) section of the OCR Set Layout form, they aren't cleared immediately when you click OCR Documents on the console. They are cleared only when the job goes into processing, which is reflected in the Status when you click the Refresh Page link.
    Note: Only existing images are OCRed when you click OCR Documents. Images that are currently being loaded will NOT be OCRed if those images are added after you click OCR Documents. Changes made to an OCR profile that's referenced by an OCR set aren't reflected until you click OCR Documents on that set.
  • Stop OCR - terminates the running OCR job. This button enables after you click OCR Documents. When you stop a job, the text that was already OCRed is not saved, and you can't resume the job from the point it stopped. You have to click OCR Documents to begin the job over again.
  • Retry Errors - attempts to re-run a job with errors.
    Retry is only possible when an OCR set is in a status of Completed with errors. Retrying an OCR set attempts to run those images or documents in the OCR set that previously resulted in errors. Only errored images or documents are processed when the system tries to resolve errors.
    Note: If an OCR set is in a status of Completed with errors and the data source is modified (for example, the OCR set is updated from a Saved Search to a Production Set), the OCR set resets to a status of Ready to run and cannot be retried.
  • Refresh Page - updates the Status, Image Completion, and Document Completion fields while the set is running. Clicking this button reloads the page and may reflect different values in those fields depending on what happened during the OCR job.
Note: During the In progress state, while the OCR Job Worker Agents are OCRing text, the OCR Job Manager Agent finds documents that have all of their images complete and exports the text to the selected Destination Text field for each such document. This means that the OCRed text in a document's Destination Field is updated as the OCR set progresses, not at the end of the job.
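
The console behavior described above can be summarized as a small status model. This is a hypothetical Python sketch, not Relativity code: the status names Ready to run, In progress, and Completed with errors come from this topic, while the Stopped and Failed labels, the function names, and the assumption that OCR Documents is unavailable while a job is in progress are illustrative only.

```python
from enum import Enum


class Status(Enum):
    READY_TO_RUN = "Ready to run"
    IN_PROGRESS = "In progress"
    COMPLETED_WITH_ERRORS = "Completed with errors"
    STOPPED = "Stopped"  # assumed label for a user-stopped job
    FAILED = "Failed"    # assumed label for a failed job


def available_actions(status: Status) -> set:
    """Return the console actions this sketch treats as usable in a status."""
    actions = set()
    if status is Status.IN_PROGRESS:
        # Stop OCR becomes enabled after OCR Documents is clicked.
        actions.add("Stop OCR")
    else:
        # Assumption: a new run can be started whenever no job is running.
        actions.add("OCR Documents")
    if status is Status.COMPLETED_WITH_ERRORS:
        # Retry is only possible from Completed with errors.
        actions.add("Retry Errors")
    return actions


def on_data_source_modified(status: Status) -> Status:
    """Changing the data source of an errored set resets it to Ready to run."""
    if status is Status.COMPLETED_WITH_ERRORS:
        return Status.READY_TO_RUN
    return status


if __name__ == "__main__":
    status = on_data_source_modified(Status.COMPLETED_WITH_ERRORS)
    print(status.value, sorted(available_actions(status)))
```

After the reset, Retry Errors is no longer among the available actions, which matches the note that a modified set cannot be retried.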

Once the OCR job completes, the Document (OCR Results) section of the OCR Set Layout form displays all documents successfully OCRed. The fields in this view are Control Number and File Icon.

In addition, it's possible to see Image OCR Errors or Document OCR Text Import Errors in the same tabbed display as Document (OCR Results).

These errors can be exported.

Note: Only the first 1000 image errors and first 1000 document errors are shown. These errors cannot be filtered.

An image of the Image OCR Errors tab

Image OCR Errors shows:

  • Document Artifact ID
  • Document Identifier
  • Page Number
  • Message

Document OCR Text Import Errors shows:

  • Document Artifact ID
  • Document Identifier
  • Message
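
If you work with these errors outside Relativity after exporting them, the fields listed above map naturally onto a small record type. The sketch below is a hypothetical Python example: the export format is not described in this topic, so the ImageOcrError class, its attribute names, and the helper functions are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Only the first 1000 errors of each type are shown in the tab.
DISPLAY_LIMIT = 1000


@dataclass(frozen=True)
class ImageOcrError:
    """One row of the Image OCR Errors tab (attribute names are assumed)."""
    document_artifact_id: int
    document_identifier: str
    page_number: int
    message: str


def errored_pages_by_document(errors: Iterable[ImageOcrError]) -> Dict[str, List[int]]:
    """Group errored page numbers under each document identifier."""
    pages: Dict[str, List[int]] = defaultdict(list)
    for error in errors:
        pages[error.document_identifier].append(error.page_number)
    return {doc: sorted(nums) for doc, nums in pages.items()}


def displayed(errors: Iterable[ImageOcrError]) -> List[ImageOcrError]:
    """Mimic the display cap by keeping only the first DISPLAY_LIMIT rows."""
    return list(errors)[:DISPLAY_LIMIT]


if __name__ == "__main__":
    rows = [
        ImageOcrError(1001, "DOC-0001", 3, "example message"),
        ImageOcrError(1001, "DOC-0001", 1, "example message"),
        ImageOcrError(1002, "DOC-0002", 7, "example message"),
    ]
    print(errored_pages_by_document(rows))
```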

Viewing OCR text

Once you run the OCR set, you can review your OCRed text. The most effective way of viewing your OCR text is by following these steps:

  1. To launch the Review Interface, click the control number of a document.
  2. Select the field where you stored the OCR text. If a document has multiple fields with OCRed or extracted text, a drop-down menu appears where you can select the field.
    An image of OCRed text in the viewer