

Optical Character Recognition (OCR) translates images of text, such as scanned and redacted documents, into actual text characters. With OCR you can view and search text that's normally locked inside images. It uses pattern recognition to identify individual text characters on a page, such as letters, numbers, punctuation marks, spaces, and ends of lines.
There are three main steps involved in OCRing documents: creating an OCR profile, creating an OCR set, and running the OCR set.
Note: Go here for information on running OCR on redacted production documents.
Your client is charged with a patent violation, and you need to present a load file with all emails and exchanges between members of their organization related to the technology in question to the opposing counsel. Some of the documents contain redacted privileged information and trade secrets.
You run OCR on your document set before producing the documents. This ensures that the extracted text in the load file excludes the redacted content, including privileged information and trade secrets.
An OCR profile is a saved, reusable set of parameters that you use when creating an OCR set. To run an OCR job, you must first create an OCR profile.
You don't have to create a profile for every OCR set you create. You can use a single profile for all sets. However, you may want to save multiple profiles with different accuracy or language settings to use for the different document sets you plan to OCR.
To create an OCR profile, follow these steps:
When Arabic is selected as a recognition language in the OCR profile, the OCR engine also recognizes English and non-English Latin-alphabet languages by default, so it is not necessary to include them as additional languages in the OCR profile. In these cases, the recognition of text on images that contain Arabic and non-English languages that also use the Latin alphabet may be less accurate. Accents and other characters not used in English may be misidentified.
Running OCR with an OCR profile that combines Arabic with other languages is not supported. This configuration may lead to OCR Image Errors. As a result, you may not be able to recognize any text from the image, regardless of whether the image contains recognizable text.
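The language rule above can be expressed as a simple pre-flight check. The following is a minimal sketch, not part of the product: the function name `validate_profile_languages` and the plain-string language names are assumptions made for illustration.

```python
ARABIC = "Arabic"


def validate_profile_languages(languages):
    """Return warnings for a hypothetical OCR profile language selection.

    Assumption: languages are plain display names such as "Arabic" or "English".
    """
    warnings = []
    # De-duplicate while keeping the order in which languages were selected.
    selected = list(dict.fromkeys(languages))
    if ARABIC in selected and len(selected) > 1:
        others = [lang for lang in selected if lang != ARABIC]
        warnings.append(
            "Arabic cannot be combined with other languages: " + ", ".join(others)
        )
    return warnings
```

An empty list means the selection does not violate the Arabic rule; any other result lists the languages that should be removed from the profile before running OCR.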
Complete the following OCR profile fields:
Note: The auto-rotate images function requires the preprocess images option to be selected for it to take effect, even if the auto-rotate setting is set to "true." The OCR engine will only rotate the image if both preprocess and auto-rotate options are enabled.
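The dependency between the two options can be sketched as a single boolean condition. This is an illustrative model only: the `ProfileSettings` class and its field names are hypothetical and do not reflect how the profile is actually stored.

```python
from dataclasses import dataclass


@dataclass
class ProfileSettings:
    # Hypothetical field names standing in for the two profile options.
    preprocess_images: bool
    auto_rotate_images: bool


def will_auto_rotate(settings):
    """Rotation takes effect only when both options are enabled."""
    return settings.preprocess_images and settings.auto_rotate_images
```

For example, `will_auto_rotate(ProfileSettings(preprocess_images=False, auto_rotate_images=True))` evaluates to `False`, which matches the note above: auto-rotate alone has no effect.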
Note: If the saved search or production you use contains multiple languages and you select only one language from the list, the OCR engine uses the characters of the selected language to OCR all of the text.
Note: If an OCR job is not extracting text from redactions as expected and is instead displaying a blank space where text should be, change the Accuracy field from High to Low. In addition, if there are no images in the set that need rotating, turn off the Auto-Rotate Images option. Making these adjustments should improve data extraction.
If you'd like to further distinguish the profile, click the Other tab and enter information in the Keywords and/or Notes fields.
Using the OCR Sets tab you can submit groups of documents defined by a saved search or production to be OCRed based on the settings defined by the OCR profile. Relativity writes the results to the destination field that you specify.
To create an OCR set, you can copy an existing OCR set. If you copy an OCR set, every current setting in that set copies over.
Before you create an OCR set, you first need to create an OCR profile. See Creating and editing an OCR profile. To create an OCR set, follow these steps:
View OCR set fields
OCR Set Information
OCR status
The following fields are read-only:
Document Completion - view the count of documents completed in the OCR set, the number of documents with errors, and the number of documents left to have text assembled. Once documents are completed, it is possible to view their OCR'd text. Any errors appear in red.
OCR settings
Click to bring up the OCR Profile Picker on OCR Set view, which lists Profiles that have already been created in the OCR Profiles tab.
Click to bring up the Field Picker on OCR Set view, which lists all document long text fields you have access to. If you selected non-Western European languages in your OCR Profile, the destination field should be Unicode-enabled. This field is overwritten each time a document is OCRed with that destination field selected.
OCR Document Set
Choosing a saved search OCRs only the original image. If a document in the selected saved search does not have an associated image, that document won't be OCRed. Likewise, the OCR engine does not account for redactions unless the redactions are on the image itself.
Click to open the Production Picker on OCR Set view, which displays all production sets with a status of Produced that you have access to. The engine OCRs all burned-in redactions, branding, headers and footers, and text. All documents with images in the production are OCRed, not only those with redactions. This includes placeholders.
Note: The OCR engine does not support OCRing content that represents handwritten text.
The following permissions are required to run an OCR set.
Object | Permission |
---|---|
OCR Set | View |
OCR Profile | View |
Production | View |
Saved Search | View |
OCR Set Tab | View |
Note: These permissions are strictly for running the Set, i.e. if the OCR Set is already set up. A user needs the Add permissions on OCR Set and OCR Profile to create a new one.
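As a rough illustration, the permission table above can be modeled as a lookup that reports what a user is missing. This is a sketch under assumptions: the dictionaries mirror the table and note above, but the `missing_permissions` helper and the shape of the `granted` argument are hypothetical, not a product API.

```python
# Permissions needed to run an existing OCR set (from the table above).
RUN_PERMISSIONS = {
    "OCR Set": "View",
    "OCR Profile": "View",
    "Production": "View",
    "Saved Search": "View",
    "OCR Set Tab": "View",
}

# Additional permissions needed to create a new OCR set or OCR profile.
CREATE_PERMISSIONS = {"OCR Set": "Add", "OCR Profile": "Add"}


def missing_permissions(granted, required):
    """Return the required permissions the user lacks.

    granted: dict mapping an object name to the set of permissions held,
    for example {"OCR Set": {"View", "Add"}} (hypothetical shape).
    """
    return sorted(
        f"{obj}: {perm}"
        for obj, perm in required.items()
        if perm not in granted.get(obj, set())
    )
```

A user holding View on all five objects gets an empty list for `RUN_PERMISSIONS` but still sees `["OCR Profile: Add", "OCR Set: Add"]` for `CREATE_PERMISSIONS`.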
When you save an OCR set, the OCR Set console appears. Use this console to run the OCR job.
The OCR Set console provides the following action buttons:
Note: Only existing images are OCRed when you click OCR Documents. Images that are currently being loaded will NOT be OCRed if those images are added after you click OCR Documents. Changes made to an OCR profile that's referenced by an OCR set aren't reflected until you click OCR Documents on that set.
Retry Errors - attempts to re-run a job with errors.
Note: If an OCR set has a status of Completed with errors and the data source is modified (for example, the OCR set is updated from a Saved Search to a Production Set), the OCR set resets to a status of Ready to run and cannot be retried.
Retry is only possible when an OCR set is in Completed with errors. Retrying an OCR set attempts to run those images or documents in the OCR set that previously resulted in errors. Only errored images or documents are processed when the system tries to resolve errors.
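The retry rules can be summarized as a small state check. The sketch below is illustrative only: the status strings come from the text above, while the function names and the `results` mapping are assumptions.

```python
COMPLETED_WITH_ERRORS = "Completed with errors"
READY_TO_RUN = "Ready to run"


def can_retry(status):
    """Retry Errors is available only for sets that completed with errors."""
    return status == COMPLETED_WITH_ERRORS


def status_after_data_source_change(status):
    """Changing the data source resets an errored set, so it can no longer be retried."""
    if status == COMPLETED_WITH_ERRORS:
        return READY_TO_RUN
    return status


def items_to_retry(results):
    """Select only the images or documents that previously errored.

    results: dict mapping an item identifier to True if it errored (hypothetical shape).
    """
    return [item for item, errored in results.items() if errored]
```

Note that after `status_after_data_source_change` returns Ready to run, `can_retry` is false, so the set has to be run again in full rather than retried.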
Refresh Page - updates the Status, Image Completion, and Document Completion fields while the set is running. Clicking this button reloads the page and may reflect different values in those fields depending on what happened during the OCR job.
Note: During the In progress state, while the OCR Job Worker Agents are OCRing text, the OCR Job Manager Agent finds documents whose images are all complete and exports their text to the selected Destination Text field. This means that the OCRed text in a document's destination field is updated as the OCR set progresses, not at the end of the job.
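The incremental export described in this note can be pictured as a periodic check for documents whose images have all finished. This is a simplified sketch, not the agents' actual logic: the function name and both data structures are hypothetical.

```python
def documents_ready_for_export(image_status, already_exported):
    """Return documents whose images are all complete and not yet exported.

    image_status: dict mapping a document identifier to a list of booleans,
    one per image, where True means OCR of that image is complete.
    already_exported: set of document identifiers whose text was written earlier.
    """
    return [
        doc_id
        for doc_id, images in image_status.items()
        if images and all(images) and doc_id not in already_exported
    ]
```

Calling this repeatedly while a job runs yields each document as soon as its last image completes, which is why text appears in the destination field before the whole set finishes.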
Once the OCR job completes, the Document (OCR Results) section of the OCR Set Layout form displays all documents successfully OCRed. The fields in this view are Control Number and File Icon.
In addition, it's possible to see Image OCR Errors or Document OCR Text Import Errors in the same tabbed display as Document (OCR Results).
These errors can be exported.
Note: Only the first 1000 image errors and first 1000 document errors are shown. These errors cannot be filtered.
Image OCR Errors shows:
Document OCR Text Import Errors shows:
Select the field where you stored the OCR text. If a document has multiple fields with OCRed or extracted text, a drop-down menu appears where you can select the field.