Automated image markup project
Creating an image markup project lets you apply markups to a group of imaged documents automatically, saving you time. You can create a project by entering the words, terms, phrases, dtSearch syntax, or regular expressions that you want Redact to apply markups to. Alternatively, you can enter the rules in a .csv or .xlsm file and upload it to Relativity to create the rules for the image markup project. Once the project is created and run, markups are applied automatically based on the rules you create.
Before you begin
Before you start creating an image markup project, consider creating a saved search that contains the documents you want to mark up and a markup set that contains the markups you want to apply to your imaged documents.
Creating an image markup project manually
To create an image markup project using rules that you manually set, do the following:
- Navigate to the Redact Projects tab.
- Click the Create new project button.
- Select the Image project option.
- Complete the Create image markup project section fields. To learn more, visit Fields below.
- Complete the Rules section fields. This section is optional; if you complete these fields, they determine how the image project applies markups. To learn more, visit Fields below. Alternatively, you can leave these fields blank and upload a .csv or .xlsm file. To learn more, visit Creating an image markup project using .csv or .xlsm rules below.
- Click Save.
Creating an image markup project using .csv or .xlsm rules
While you can create a project manually, if you plan to run multiple kinds of projects with similar rules and terms, you can save time by reusing the same .csv or .xlsm rules file for each project. Note that you may still need to adjust the scope in the rules for the project you are running.
You can upload up to 100,000 rules in a .csv file, though the more rules a .csv or .xlsm file includes, the longer the project takes to run. Rules uploaded with a .csv file cannot be viewed in the Redact interface; instead, download a copy of the .csv file and view it outside of Relativity.
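If you generate rule files with a script, a quick local check against the 100,000-rule limit can catch an oversized file before you upload it. The following Python sketch is a hypothetical helper, not part of Redact or Relativity. It assumes one rule per row and a single header row, it does not model the actual column layout of a Redact rules file, and it handles only .csv files (reading .xlsm would need a third-party library).

```python
import csv
from typing import Iterable

MAX_RULES = 100_000  # upload limit stated in this article for .csv rule files


def count_rules(lines: Iterable[str], has_header: bool = True) -> int:
    """Count non-blank rows, assuming one rule per row (hypothetical layout)."""
    rows = [row for row in csv.reader(lines) if any(cell.strip() for cell in row)]
    if has_header and rows:
        return len(rows) - 1  # drop the assumed header row
    return len(rows)


def check_rule_file(path: str, has_header: bool = True) -> int:
    """Return the rule count, or raise ValueError if it exceeds MAX_RULES."""
    # utf-8-sig tolerates the byte-order mark some spreadsheet tools write.
    with open(path, newline="", encoding="utf-8-sig") as f:
        n = count_rules(f, has_header)
    if n > MAX_RULES:
        raise ValueError(f"{path}: {n:,} rules exceeds the {MAX_RULES:,}-rule limit")
    return n
```

For example, check_rule_file("rules.csv") (an illustrative file name) returns the number of rules or raises an error if the file is over the limit. Blank rows are skipped so that trailing empty lines do not inflate the count.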
To create an image markup project using rules from a .csv or .xlsm file, do the following:
- Navigate to the Redact Projects tab.
- Click the Create new project button.
- Select the Image project option.
- Enter a Project Name and select a Saved Search and Markup Set. Leave all other fields blank. For more information on these fields, visit Fields below.
- Click Save.
- Click Upload rules csv.
- Select the desired .csv or .xlsm file from your workstation and click Open.
The file is uploaded, and the rules are created and added to the project.
Fields
The following sections and fields display while creating an image markup project:
Create image markup project section
The following fields display in this section:
- Project Name—enter the name for this new project.
- Language Code—enter the desired language code or codes if you want OCR to use a non-English language. English is selected by default, so you do not need to change this option if your documents are in English. To learn more about how to enable non-English languages and enter codes, visit Redact Language Support.
- Saved Search—click the drop-down menu and select the saved search that contains the documents you want to mark up. Optionally, enter one or more terms in the search box at the top of the menu to narrow the results and find the desired saved search.
- Markup Set—click on the drop-down menu and select the markup set that you wish to use to apply markups.
Rules section
The Rules section is optional. These fields determine how the image project applies markups when it runs. After you complete the fields in this section, a new group of fields appears below it so that you can create multiple rules in an image project if desired.
The following fields display in this section:
- Redaction/Highlight toggle—determines which type of markup you will be applying for this rule.
- Markup Reason—enter a description of why this rule's markups are applied. This makes the markups easier to track when you review them using the Redact Navigation card.
- Markup Scope—determines the markup behavior when the project matches content in a document with a rule. Select one of the following options:
- Character—places the markup on the exact match, even if the match is part of a word. When this option is selected, new OCR text is generated before markups are applied. You can also use this option to redact information that is not a word, such as an email handle or a social security number.
- Word—places the markup on the entire word that matches this rule.
- Line—places the markup on the entire line of text where a word matches this rule.
- Paragraph—places the markup on the text between the carriage return that precedes a word matching this rule and the next carriage return after it.
- Page—places a markup on the entire page where there is a word that matches this rule.
- Document—places a full-page markup on the entire document if there is a word that matches this rule.
- Markup SubType—select the style of markup you would like to place for this rule. The options available in this drop-down menu are determined by the Redaction/Highlight toggle.
- Expand markup to full width of page—select to apply a markup that spans the entire width of the page when a match is found. This option is intended for use with the Line, Paragraph, Page, or Document option in the Markup Scope drop-down menu.
- Word/Phrase—enter the words, phrases, and text that you want to apply a markup to for this rule. You can add multiple words or phrases to a single rule group. The words, phrases, and text you enter are case sensitive.
Note: This field does not support dtSearch or wildcard syntax.
- Name—enter a name for this rule. Optionally, click the drop-down menu to view a list of commonly used regular expressions and custom regular expressions that users have created, then select one to populate both the Name and Regex fields. These commonly used regular expressions are a starting point and are not intended to be all-inclusive of every variation of these patterns. Consider variations in document type, text quality, and pattern variability when using regular expressions.
- Regex—enter a regular expression to identify important patterns such as email addresses, social security numbers, credit card numbers, and any other content that appears in a regular pattern throughout the documents in the selected saved search. A regular expression requires both a name and an expression to be valid.
Regular expressions can produce false matches in image markup projects because markup placement depends on the quality of the HOCR produced from the source image. As a result, we recommend writing expressions that are defensive against common OCR mistakes. For example, you can replace each \d pattern with [\dOIlZEASB], a character class that matches a digit or any of the letters O, I, l, Z, E, A, S, or B, which OCR commonly confuses with digits.
After you save the image markup project, the regular expressions you entered can be selected by name on other Redact projects within the same workspace. To see examples of commonly used regular expressions, see Regular expression examples.
Note: If you use regex101.com to help form regular expressions to enter in Redact, set the flags to /gmi (global, multi-line, case insensitive) to better achieve the desired results in Relativity.
When you use regular expressions, markups are applied at word scope. When a regular expression matches text, Redact uses spaces to determine where to start and stop applying the markup. As a result, if a set of words has no spaces between them, the markup covers the entire set and not just the portion that matches your entered terms.
The following table lists example regular expression terms and the word or phrase that a markup completely covers as a result of this behavior:
Regular expression terms | Marked up content
jane, smith, relativity | jane.smith@relativity.com
police, woman | policewoman
mother, in, law | mother-in-law
- DtSearch—enter dtSearch syntax to apply markups to any matches based on the selected Markup Scope and Markup SubType options. By default, Redact applies markups to every term in a search syntax. dtSearch uses cross-cell matching, so if a rule's text would span multiple cells, every cell that contains a match is redacted.
Optionally, you can apply markups to only part of a dtSearch syntax. For example, apple w/2 pear causes both apple and pear to receive markups. To apply a markup to only one of the terms, in this case apple, use the following syntax: (?<redact>apple) w/2 pear.
Noise words and the alphabet list are not compatible with this field. We recommend using the W/N operator for proximity matches instead.
Note: Combining special characters or operators may lead to inaccurate results. We recommend using Regex in these situations instead.
The following syntax options are available with this field:
Special characters or operators | dtSearch functionality
AND, OR, NOT | Boolean operators
?, * | Wildcards
W/N (or WI) | W/N operators
PRE | Proximity with term order
xfirstword, xlastword | Built-in search words
() | Operator precedence
"" | Search words that are operators
% | Fuzzy searching
~ | Stemming
(?<redact>{term}) | Partial redaction
# | Phonic searching
= | Numerical patterns
The following special characters are recognized as spaces that cause word breaks: !"#$&'()*+,./:;<=>?@{|}^{|}~˜
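To make the Markup Scope options more concrete, the following Python sketch models which words a single rule would cover at each word-level scope. It is a simplified, hypothetical model, not Redact's implementation: it assumes OCR output that is already grouped into pages, paragraphs, lines, and words (similar in spirit to the HOCR hierarchy), treats a word as a hit if it contains a Word/Phrase term (case sensitive, as this article describes), and does not model the Character scope or the Expand markup to full width of page option.

```python
from typing import Iterator, List, Set, Tuple

# Hypothetical OCR layout: document -> pages -> paragraphs -> lines -> words.
Document = List[List[List[List[str]]]]
Pos = Tuple[int, int, int, int]  # (page, paragraph, line, word)

# Number of leading index levels two words must share to be in the same unit.
DEPTH = {"Word": 4, "Line": 3, "Paragraph": 2, "Page": 1, "Document": 0}


def positions(doc: Document) -> Iterator[Pos]:
    """Yield the position of every word in the document."""
    for p, page in enumerate(doc):
        for g, paragraph in enumerate(page):
            for ln, line in enumerate(paragraph):
                for w in range(len(line)):
                    yield (p, g, ln, w)


def word_at(doc: Document, pos: Pos) -> str:
    p, g, ln, w = pos
    return doc[p][g][ln][w]


def covered(doc: Document, terms: Set[str], scope: str) -> Set[Pos]:
    """Return the word positions one rule would mark up for a Markup Scope."""
    everything = list(positions(doc))
    # A word is a hit if it contains any term; matching is case sensitive.
    hits = [pos for pos in everything if any(t in word_at(doc, pos) for t in terms)]
    depth = DEPTH[scope]
    units = {hit[:depth] for hit in hits}  # the lines/paragraphs/pages holding a hit
    return {pos for pos in everything if pos[:depth] in units}
```

With a two-page document where "Jane" appears once, Word covers one word, Line covers that word's line, Paragraph adds the rest of its paragraph, Page adds the rest of page one, and Document covers every word. The single prefix comparison is a design choice that keeps each scope a matter of how much of the position is shared.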
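The word-based behavior of the Regex field, together with the /gmi flags recommended for regex101.com, can be sketched in Python. This is an illustrative assumption about the behavior described under the Regex field (each match is widened to the nearest spaces), not Redact's code, and Python's re module is not necessarily the regex engine Redact uses.

```python
import re
from typing import List, Tuple

# Python counterparts of the regex101 /gmi flags: re.finditer returns every
# match (g), re.MULTILINE is m, and re.IGNORECASE is i.
GMI = re.MULTILINE | re.IGNORECASE


def word_scope_spans(pattern: str, text: str) -> List[Tuple[int, int]]:
    """Widen each regex match to the surrounding whitespace-delimited token."""
    spans = set()
    for m in re.finditer(pattern, text, GMI):
        start, end = m.start(), m.end()
        # Walk outward until whitespace or the edge of the text is reached.
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        while end < len(text) and not text[end].isspace():
            end += 1
        spans.add((start, end))  # several matches in one token collapse to one span
    return sorted(spans)
```

For example, word_scope_spans(r"jane|smith|relativity", "Email jane.smith@relativity.com today") returns a single span that covers the whole address, mirroring the first row of the table for the Regex field.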
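The OCR-defensive substitution recommended under the Regex field, replacing \d with [\dOIlZEASB], can be applied to an existing expression with a small script. This minimal Python sketch is a hypothetical helper, not a Redact feature. It performs a plain textual substitution rather than parsing the regex, so it does not correctly handle a \d that already sits inside a character class such as [\d-], and it assumes Python's regex syntax behaves like Redact's for these simple patterns.

```python
import re

# Character class from this article: a digit, or a letter OCR commonly
# confuses with one (O, I, l, Z, E, A, S, B).
OCR_DIGIT = r"[\dOIlZEASB]"


def ocr_tolerant(pattern: str) -> str:
    r"""Replace each \d token in a regex with OCR_DIGIT.

    Naive sketch: any \d immediately preceded by a backslash is left alone,
    and \d inside an existing character class is not handled.
    """
    # A function replacement stops re.sub from interpreting the backslash in OCR_DIGIT.
    return re.sub(r"(?<!\\)\\d", lambda m: OCR_DIGIT, pattern)


ssn = ocr_tolerant(r"\b\d{3}-\d{2}-\d{4}\b")
print(ssn)  # \b[\dOIlZEASB]{3}-[\dOIlZEASB]{2}-[\dOIlZEASB]{4}\b
print(bool(re.search(ssn, "SSN: 12E-4S-67B9")))  # True, despite the OCR misreads
```

Because the broader class also accepts letters, the rewritten expression can match more text than the original; review the resulting markups before production.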
Running the project
Once you have created a project, you can run it to apply markups. To learn more, visit Running and reverting a project.
Reviewing markups
After markups have been placed, it is a best practice to perform quality control on documents before they are produced. To learn more about how to do this using Redact, visit Reviewing markups to ensure accuracy. If markups were placed on OCR text, we recommend using the View hocr button to validate project rule hit results. To learn more, visit Using the Redact navigation card in the Image Viewer.