Using regular expressions with dtSearch
This page contains the following sections:
- Regular expression search strings
- Regular expression caveats in dtSearch
- Common regular expression examples
For more information about constructing regular expressions, see the following pages:
Your case team needs to find documents containing a variety of serial numbers that all match the same pattern. The pattern consists of five letters, a hyphen, then four numbers: ABCDE-1234. Instead of searching for every possible serial number, you decide to use a regular expression to find all instances of the serial number in your document set. Using regular expressions will save your case team a lot of time. You use the following regular expression search string: "##[a-z]{5}-[0-9]{4}"
Regular expression search strings
To activate regular expressions in dtSearch, use double pounds signs (##) at the beginning of your search string. You can start your search from the search bar on the List page, or by adding a condition from the search panel. For details on how to run a dtSearch, see Running a dtSearch.
Relativity breaks down the regular expression syntax as follows:
Regular Expression—"##RegularExpression". "##" signals to Relativity that the string following ##, and encapsulated by double quotes, should be interpreted as regular expression. Be sure to use straight double quotes ("") and not curly quotes (“”). Curly quotes may cause the regular expression to fail . You also want to avoid using capital letters in your regular expression because all characters in a dtSearch index are normalized to lowercase. You can use the Dictionary to help troubleshoot an individual regular expression. If your expression does not match in the Dictionary, it will not match in the index.
Note: Starting in Relativity 10.0.119.1, regular expression searches run from the Document List will highlight search hits in the Native Viewer for any returned documents. This does not apply to the Extracted Text mode of the Viewer.
Note: All regular expressions with dtSearch must begin with the ## call sign. If any table entries below do not include the call sign, be sure to add them to your search string before executing.
Regular expression metacharacters
Metacharacters are the building blocks of regular expressions. Characters in regular expression are understood to be either:
- a metacharacter with a special meaning, or
- a regular character with its literal meaning
View regular expression metacharacters examples
Metacharacter | Description | Example |
---|---|---|
\d | Whole number 0 - 9 |
\d\d\d = 327 \d\d = 81 \d = 4 \d\d\d ≠ 24631 \d\d\d does not return 24631 because 24631 contains 5 digits. \d\d\d only matches for a 3-digit string. |
\w | Alphanumeric character |
\w\w\w = dog \w\w\w\w = mule \w\w = to \w\w\w = 467 \w\w\w\w = 4673 \w\w\w ≠ boat \w\w\w does not return boat because boat contains 4 characters. \w ≠ ! \w does not return the exclamation point ! because it is a non-alphanumeric character. |
\W | Symbols |
\W = % \W = # \W\W\W = @#% \W\W\W\W ≠ dog8 \W\W\W\W does not return dog8 because d, o, g, and 8 are alphanumeric characters. |
[a-z] [0-9] | Character set, at least one of which must be a match, but no more than one unless otherwise specified. The order of the characters does not matter. |
pand[ora] = panda pand[ora] = pando pand[ora] ≠ pandora pand[ora] does not bring back pandora because it is implied in pand[ora] that only 1 character in [ora] can return. |
Note: dtSearch does not accept white space characters, even with regular expressions.
Regular expression groups
With regular expression groups you can match for groups of characters within a string. The following table provides examples of how to use groups in your regular expression. Groups are most useful when you use them in conjunction with alternation and quantifiers.
Metacharacter | Description | Example |
---|---|---|
(abc) (123) |
Character group, matches the characters abc or 123 in that exact order. |
pand(ora) = pandora pand(123) = pand123 pand(oar) ≠ pandora pand(oar) does not match for pandora because it is looking for the exact phrase pandoar. |
Escaping regular expression metacharacters
When using regular expression to search for a character that is a reserved metacharacter, use the backslash \ to escape the character so it can be recognized in its literal sense. The following table gives an example on how to escape a reserved metacharacter when searching.
Search for | Regular expression | Match results |
---|---|---|
International phone number (UK) | \+[0-9]{12} |
+447700900954 +447700900312 If the + sign is not escaped with a backslash, regular expression treats + as a quantifier instead of the literal plus sign character. |
Regular expression caveats in dtSearch
There are a few caveats to consider when using regular expressions in dtSearch. Consider the following caveats before constructing your regular expression.
- The metacharacter \s never matches a whitespace character in Relativity, because whitespace characters do not exist in a dtSearch index. Instead, spaces are word breaks in dtSearch.
Unless you modify your dtSearch index to be case-sensitive, you cannot use capital letters when constructing a regular expression in dtSearch.
For example, if you search for strings that begin with NLRT:
- NLRT-0381
- NLRT-6334
- NLRT-9167
Use the following regular expression in Relativity: "##nlrt-\d{4}"
For more information about case-sensitive indexes, see Build a Case Sensitive dtSearch Index.htm.
- You cannot search characters which are ignored during indexing, such as punctuation. To index a punctuation character, confirm that it is listed as a letter in your dtSearch alphabet file, and that it is not listed as an ignored, hyphen, or space character.
Common dtSearch regular expression examples
The following table includes examples of dtSearch regular expressions you can use to search for patterns in dtSearch.
Type |
Regular Expression |
Match Results |
---|---|---|
Bates numbers |
"##rel[0-9]{7}" "##rel\d{7}" |
REL0000331 REL3728948 |
Zip codes |
"##[a-z]{2}" "##[0-9]{5}" "##[a-z]{2}" "##\d{5}" |
IL 60606 MD 21218 ca 94115 |
United States Phone numbers |
"##[0-9]{3}-[0-9]{4}" "##\d{3}-\d{4}" Note: You must make the hyphen (-) searchable in your index. |
373-8837 463-9391 819-3814 |
United States Phone numbers with or without area codes |
"##([0-9]{3}-)?[0-9]{3}-[0-9]{4}" Note: You must make the hyphen (-) searchable in your index. |
312-483-8372 463-9391 |
Serial numbers |
"##[a-z]{4}-[0-9]{4}-[a-z]{4}-[0-9]{4}" "##[a-z]{4}-\d{4}-[a-z]{4}-\d{4}" Note: You must make the hyphen (-) searchable in your index. |
XRFD-8324-ERWF-3231 GHSR-3413-KWEJ-8173 MPFS-1357-QEGT-9376 |
Dates |
"##[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}" |
10/17/2015 3/6/98 4/25/2006 12/04/87 95/94/93 |
Email addresses
|
"##([\w_\.]+)@([\w_\.]+)\.([\w_\.]{2,6})" Note: You must make the at symbol (@) and period (.) searchable in your index. |
Joe.Smith426@example.com 743.MaryJane@example.com Brian.23.Voltaire@example.net.uk |