Using regular expressions with dtSearch

You can use RegEx with your dtSearch index to search for things like Bates numbers, zip codes, and phone numbers. You can use RegEx in conjunction with proximity, stemming, and fuzzy searching in dtSearch.

Note: All regular expressions with dtSearch use the ## call sign, wrapped in double quotes, to encapsulate the search text. If an example in any table on this page does not include the call sign, add it to your search string before executing.

RegEx search strings

You activate RegEx in dtSearch by starting your search string with ##.

The syntax for running a RegEx search in Relativity is as follows:

"##RegularExpression"

"##" signals to Relativity that the string following ##, and encapsulated by double quotes, should be interpreted as RegEx. When adding double quotes to your RegEx, use straight quotes (""). Curly quotes (“”) cause the RegEx to fail. Also avoid capital letters in the literal text of your RegEx, because all characters in a dtSearch index are normalized to lowercase.

You can use the Dictionary to help troubleshoot an individual regular expression. If your expression doesn't match in the Dictionary, it won't match in the index.
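The quoting and lowercase rules above can be checked before a search string is pasted into Relativity. The following is a minimal Python sketch, not part of Relativity or dtSearch: the helper name build_regex_term is hypothetical, and the checks it performs are only an interpretation of the rules described in this section.

```python
import re

CURLY_QUOTES = "\u201c\u201d"  # the curly double quote characters


def build_regex_term(pattern: str) -> str:
    """Wrap a pattern in straight double quotes with the ## call sign.

    Hypothetical helper: it only builds the search string, it does not
    talk to Relativity or dtSearch.
    """
    if any(q in pattern for q in CURLY_QUOTES):
        raise ValueError("curly quotes are not allowed; use straight quotes")
    # Drop backslash escapes such as \W, then look for capital letters in
    # what remains, because index text is normalized to lowercase.
    literal_text = re.sub(r"\\.", "", pattern)
    if any(ch.isupper() for ch in literal_text):
        raise ValueError("use lowercase letters in the literal text")
    return f'"##{pattern}"'


print(build_regex_term(r"rel\d{7}"))  # "##rel\d{7}"
```

Blindly lowercasing the whole pattern would be unsafe, because it would turn a metacharacter such as \W into \w. That is why this sketch rejects capital letters in literal text instead of converting them.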

Note: Starting in Relativity 10.0.119.1, RegEx searches run from the Document List will highlight search hits in the Native Viewer for any returned documents. This does not apply to the Extracted Text mode of the Viewer.

RegEx metacharacters

Metacharacters are the building blocks of regular expressions. Characters in RegEx are understood to be either:

  • a metacharacter with a special meaning, or
  • a regular character with its literal meaning


Metacharacter: \d
Description: Any single digit, 0 through 9.
Examples:
  • \d\d\d = 327
  • \d\d = 81
  • \d = 4
  • \d\d\d ≠ 24631. \d\d\d doesn't return 24631 because 24631 contains 5 digits; \d\d\d matches only a 3-digit string.

Metacharacter: \w
Description: Any single alphanumeric character.
Examples:
  • \w\w\w = dog
  • \w\w\w\w = mule
  • \w\w = to
  • \w\w\w = 467
  • \w\w\w\w = 4673
  • \w\w\w ≠ boat. \w\w\w doesn't return boat because boat contains 4 characters.
  • \w ≠ !. \w doesn't return the exclamation point ! because it is a non-alphanumeric character.

Metacharacter: \W
Description: Any single symbol (non-alphanumeric character).
Examples:
  • \W = %
  • \W = #
  • \W\W\W = @#%
  • \W\W\W\W ≠ dog8. \W\W\W\W doesn't return dog8 because d, o, g, and 8 are alphanumeric characters.

Metacharacter: [a-z] [0-9]
Description: Character set. Exactly one character from the set must match, unless a quantifier specifies otherwise. The order of the characters in the set doesn't matter.
Examples:
  • pand[ora] = panda
  • pand[ora] = pando
  • pand[ora] ≠ pandora. pand[ora] doesn't return pandora because only 1 character from [ora] can match.

Note: dtSearch does not accept white space characters, even with RegEx.
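The metacharacter examples in this section can be tried out locally. The sketch below uses Python's re module, which is not the engine dtSearch uses, so treat it only as an approximation. It assumes that re.fullmatch is a fair stand-in for dtSearch matching a pattern against one whole indexed word. The helper name matches_word is hypothetical.

```python
import re


def matches_word(pattern: str, word: str) -> bool:
    """Return True if the pattern matches the entire word."""
    return re.fullmatch(pattern, word) is not None


# (pattern, word, expected result) taken from the examples in this section
examples = [
    (r"\d\d\d", "327", True),
    (r"\d\d\d", "24631", False),      # 5 digits, pattern needs exactly 3
    (r"\w\w\w", "dog", True),
    (r"\w\w\w", "boat", False),       # 4 characters, pattern needs exactly 3
    (r"\w", "!", False),              # ! is not alphanumeric
    (r"\W\W\W", "@#%", True),
    (r"\W\W\W\W", "dog8", False),     # all four characters are alphanumeric
    (r"pand[ora]", "panda", True),
    (r"pand[ora]", "pandora", False), # the set matches one character only
]

for pattern, word, expected in examples:
    assert matches_word(pattern, word) == expected
```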

RegEx groups

With RegEx groups you can match for groups of characters within a string. The following table provides examples of how to use groups in your RegEx. Groups are most useful when you use them in conjunction with alternation and quantifiers.

Metacharacter: (abc) (123)
Description: Character group. Matches the characters abc or 123 in that exact order.
Examples:
  • pand(ora) = pandora
  • pand(123) = pand123
  • pand(oar) ≠ pandora. pand(oar) does not match pandora because it looks for the exact string pandoar.
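The group examples above can be sketched with Python's re module, which is used here only as an approximation of dtSearch behavior. The last two patterns, which combine a group with a quantifier and with alternation, are illustrative assumptions and do not come from the examples above.

```python
import re


def group_matches(pattern: str, word: str) -> bool:
    """Return True if the pattern matches the entire word."""
    return re.fullmatch(pattern, word) is not None


results = {
    # Examples from this section
    ("pand(ora)", "pandora"): group_matches(r"pand(ora)", "pandora"),
    ("pand(123)", "pand123"): group_matches(r"pand(123)", "pand123"),
    ("pand(oar)", "pandora"): group_matches(r"pand(oar)", "pandora"),
    # Illustrative only: a group with a quantifier and with alternation
    ("pand(ora)?", "pand"): group_matches(r"pand(ora)?", "pand"),
    ("pand(ora|a)", "panda"): group_matches(r"pand(ora|a)", "panda"),
}

for (pattern, word), matched in results.items():
    print(pattern, word, matched)
```

Only pand(oar) against pandora fails here, because the group requires the characters o, a, r in that order.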

Escaping RegEx metacharacters

When using RegEx to search for a character that is a reserved metacharacter, use the backslash \ to escape the character so that it is treated as a literal. The following table gives an example of how to escape a reserved metacharacter when searching.

Search for: International phone number (UK)
RegEx: \+[0-9]{12}
Match results:
  • +447700900954
  • +447700900312

If the + sign is not escaped with a backslash, RegEx treats + as a quantifier instead of the literal plus sign character.
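The effect of escaping can be sketched in Python. The re module is only a stand-in for the dtSearch engine, so the exact failure mode for an unescaped + may differ in Relativity. In Python, a leading unescaped + is rejected as a quantifier with nothing to repeat.

```python
import re

# Escaped plus sign: matched as a literal character
uk_number = r"\+[0-9]{12}"
assert re.fullmatch(uk_number, "+447700900954")

# Unescaped plus sign: Python treats + as a quantifier and refuses to
# compile the pattern because there is nothing before it to repeat.
try:
    re.compile(r"+[0-9]{12}")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# re.escape adds the backslash for you when the literal text is known
escaped_plus = re.escape("+")

print(unescaped_ok)   # False
print(escaped_plus)   # \+
```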

RegEx caveats in dtSearch

Consider the following caveats before constructing your RegEx in dtSearch.

  • The metacharacter \s never matches a whitespace character in Relativity, because whitespace characters don't exist in a dtSearch index. Instead, spaces are word breaks in dtSearch.
  • Unless you modify your dtSearch index to be case-sensitive, you cannot use capital letters when constructing a regular expression in dtSearch. For example, to search for varying strings that all begin with NLRT, such as:
    • NLRT-0381
    • NLRT-6334
    • NLRT-9167

    the proper Relativity RegEx is "##nlrt-\d{4}".

    For more information about case-sensitive indexes, see Build a Case Sensitive dtSearch Index.

  • You can't search for characters that are ignored during indexing, such as punctuation. To index a punctuation character, confirm that it is listed as a letter in your dtSearch alphabet file, and that it is not listed as an ignored, hyphen, or space character.
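The caveats above can be illustrated with a rough Python simulation. This sketch assumes a default (case-insensitive) index, assumes the hyphen has been made searchable, and models indexing as lowercasing the text and splitting it on whitespace. The function name index_words is hypothetical, and real dtSearch indexing involves more than this.

```python
import re


def index_words(text: str) -> list[str]:
    """Mimic a case-insensitive index: lowercase, then split on whitespace."""
    return text.lower().split()


words = index_words("See NLRT-0381 and NLRT-6334 for details")

# Lowercase pattern: finds both identifiers
hits = [w for w in words if re.fullmatch(r"nlrt-\d{4}", w)]

# Uppercase pattern: finds nothing, because the indexed words are lowercase
upper_hits = [w for w in words if re.fullmatch(r"NLRT-\d{4}", w)]

# \s: finds nothing, because whitespace only separates words and is
# never part of an indexed word
space_hits = [w for w in words if re.search(r"\s", w)]

print(hits)        # ['nlrt-0381', 'nlrt-6334']
print(upper_hits)  # []
print(space_hits)  # []
```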

Common dtSearch RegEx examples

The following table lists RegEx examples you can use to search for common patterns in dtSearch.

Note: You must make any hyphens or symbols represented in these examples searchable in your dtSearch index.

Type: Bates numbers
Regular expression:
  • "##rel[0-9]{7}"
  • "##rel\d{7}"
Match results:
  • REL0000331
  • REL3728948

Type: Zip codes
Regular expression:
  • "##[a-z]{2}" "##[0-9]{5}"
  • "##[a-z]{2}" "##\d{5}"
Match results:
  • IL 60606
  • MD 21218
  • ca 94115

Type: United States Phone numbers
Regular expression:
  • "##[0-9]{3}-[0-9]{4}"
  • "##\d{3}-\d{4}"
Note: You must make the hyphen (-) searchable in your index.
Match results:
  • 373-8837
  • 463-9391
  • 819-3814

Type: United States Phone numbers with or without area codes
Regular expression:
  • "##([0-9]{3}-)?[0-9]{3}-[0-9]{4}"
Note: You must make the hyphen (-) searchable in your index.
Match results:
  • 312-483-8372
  • 463-9391

Type: Serial numbers
Regular expression:
  • "##[a-z]{4}-[0-9]{4}-[a-z]{4}-[0-9]{4}"
  • "##[a-z]{4}-\d{4}-[a-z]{4}-\d{4}"
Note: You must make the hyphen (-) searchable in your index.
Match results:
  • XRFD-8324-ERWF-3231
  • GHSR-3413-KWEJ-8173
  • MPFS-1357-QEGT-9376

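Type: Dates
Regular expression:
  • "##[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}"
Note: You must make the slash (/) searchable in your index. This expression checks only the number of digits in each part, so it also matches strings that are not valid dates, such as 95/94/93.
Match results:
  • 10/17/2015
  • 3/6/98
  • 4/25/2006
  • 12/04/87
  • 95/94/93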

Type: Email addresses
Regular expression:
  • "##([\w_\.]+)@([\w_\.]+)\.([\w_\.]{2,6})"
Note: You must make the at (@) and period (.) searchable in your index.
Match results:
  • Joe.Smith426@example.com
  • 743.MaryJane@example.com
  • Brian.23.Voltaire@example.net.uk
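Before running one of these expressions against an index, it can help to test it against a few sample strings. The following Python sketch does that for several of the expressions above. It is only an approximation: Python's re module is not the dtSearch engine, the samples are lowercased to mimic a case-insensitive index, and the names PATTERNS, SAMPLES, and unmatched_samples are hypothetical.

```python
import re

# Expressions from the table above, without the "##" call sign and quotes
PATTERNS = {
    "bates": r"rel[0-9]{7}",
    "phone": r"([0-9]{3}-)?[0-9]{3}-[0-9]{4}",
    "serial": r"[a-z]{4}-[0-9]{4}-[a-z]{4}-[0-9]{4}",
    "email": r"([\w_\.]+)@([\w_\.]+)\.([\w_\.]{2,6})",
}

# Sample match results from the table above
SAMPLES = {
    "bates": ["REL0000331", "REL3728948"],
    "phone": ["312-483-8372", "463-9391"],
    "serial": ["XRFD-8324-ERWF-3231", "GHSR-3413-KWEJ-8173"],
    "email": ["Joe.Smith426@example.com", "Brian.23.Voltaire@example.net.uk"],
}


def unmatched_samples() -> list[tuple[str, str]]:
    """Return (type, sample) pairs that the expression fails to match."""
    failures = []
    for name, pattern in PATTERNS.items():
        for sample in SAMPLES[name]:
            # Lowercase the sample to mimic index normalization
            if re.fullmatch(pattern, sample.lower()) is None:
                failures.append((name, sample))
    return failures


print(unmatched_samples())  # []
```

An empty list means every sample matched its expression. Adding known non-matches to a separate list and checking that they fail is a useful extension when tuning an expression.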