dtSearch includes special characters and other operators that you can use to define search criteria. The following table summarizes the syntax options available for queries run against a dtSearch index.
|Special characters or operators||Search functionality|
|AND, OR, NOT||Boolean operators|
|W/N (or WI)||W/N operator|
|PRE||Proximity with terms order|
|xfirstword, xlastword||Built-in search words|
|" "||Search words that are operators|
|Noise Words, Alphabet||Noise words and the alphabet file|
|date(), mail(), creditcard()||Auto-recognition of dates, emails, credit cards|
For the list of the special characters recognized as spaces that cause word breaks, see Noise words and the alphabet file.
- The underscore (_) is not recognized as a space by default. Verify that a given character is defined as causing a word break before using it as a space in a dtSearch.
- Relativity does not use the colon (:) or ampersand (&), even though they are
considered a syntax term by the dtSearch index. If you want to use these symbols, the following applies:
- To include either symbol in the dtSearch index, you must add it to the alphabet
For more information, see Searching for symbols.
- To search for either symbol within file content, you must use a regular expression to define the character. Searching for special characters on their own will produce incorrect results.
- To include either symbol in the dtSearch index, you must add it to the alphabet file.
- dtSearch indexes are case insensitive by default. All characters in a dtSearch index are normalized to lowercase. For example, if your exact phrase search is an acronym like ACT, you must build a case-sensitive dtSearch index.
- For more information on making stop words and alphabet lists searchable, see Making the dtSearch stop word and alphabet list searchable.
Exact phrase - no double quotes
Searching for words right next to each other with no operator between them constitutes an exact phrase in dtSearch. For example, if you search for apple pear, dtSearch returns documents that contain the exact phrase apple pear. There is no rule that requires double quotes around a phrase of any number of words. You only need to use double quotes when searching for a word that is a dtSearch operator (see the next section).
Search string: pear orange
- Returns the exact phrase: pear orange
- Does not return standalone word: pear
- Does not return standalone word: orange
Search string: apple grape banana
Returns the exact phrase: apple grape banana
Does not return partial phrase: apple grape
Does not return standalone word: grape banana
You must use double quotes when searching for exact phrases that contain words that are reserved as dtSearch operators, such as the Boolean connectors AND, OR. Please take a look at the following examples:
Note: Connector words such as and and not are also included by default in the noise word list. All these words are noise words and you must remove these words from the list to make dtSearch index these files
Search string: clear and present danger
- Returns documents that contain both the word clear and the phrase present danger.
- If you need to return documents that contain the exact phrase clear and present danger, you must:
- Remove the word and from the dtSearch noise words list.
- Surround the search string with "double quotes" so that the word AND is not treated as a Boolean connector.
Search string: "clear and present danger"
- Returns the exact phrase clear and present danger.
Note: Do not confuse the parentheses function for order of preference with the double quotes function.
Auto-recognition provides you with the ability to search for various date formats, e-mail addresses, and credit card numbers. However, it can dramatically affect indexing and searching performance. You must activate auto-recognition before you can use it in your workspace. Contact your system admin for more information.
Date recognition searches for strings that appear to be dates. It uses English-language months, including common abbreviations, and numerical formats. For example, these date formats are recognized:
- January 15, 2006
- The fifteenth of January, two thousand six
Note: the short month format (ex. "Jan", "Feb", etc) can be problematic, and is occasionally rejected by Relativity. The recommendation is to stick with the full name of the month to avoid any errors (ex. "January", "February", and so forth).
Note the following date and date range search strings:
- To search for a date, enter a date expression between the parentheses in the string date(); for example, date(january 10 2006)
- To search for range of dates, enter a date range between the parentheses in the string date(); for example, date(january 10 2006 to january 20 2006)
- To search for a range of dates near the word apple, enter date(january 10 2006 to january 20 2006) w/10 apple
- Unterminated date ranges aren't supported. To search for any date after or before a particular date, enter a bounded range with a maximal or minimal value for the bounds. The maximum value for a year is 2900, and the minimum value is 1000. For example, date(january 10 2006 to january 1 2900)
dtSearch recognizes numeric strings as dates, as long as it can be interpreted as a valid date. This includes formats common in the US and UK, including:
- MM/DD/YY or MM-DD-YY
- MM/DD/YYYY or MM-DD-YYYY
- DD/MM/YY or DD-MM-YY
- DD/MM/YYYY or DD-MM-YYYY
In the case of ambiguous dates, such as 01/05/10, dtSearch defaults to MM/DD/YY. If the date contains words dtSearch converts the words to a numeric value to help interpret the date. For example, 30 must be a day and not a month, and 2015 must be a year (not a day or month).
Email address recognition
Email address recognition searches for text with the syntax of a valid email address, such as email@example.com. With this feature, you can search for a specific email address regardless of the alphabet settings for "@", ".", or other punctuation in the email address.
You can also use the word listing functions in dtSearch to enumerate all email addresses in a document collection. You must include either the * or ? wildcard expression to enumerate all email addresses in a document collection.
- mail(firstname.lastname@example.org) - returns the exact email address: email@example.com
- mail(firstname.lastname@example.org) - returns variations of the email address: email@example.com; firstname.lastname@example.org
Credit card number recognition
Credit card number recognition searches for any sequence of numbers that matches the syntax for a valid credit card number issued by a major company, such as Visa, MasterCard, and so on. A credit card number is recognized regardless of the pattern of spaces or punctuation embedded in the number:
- 1234 5678 1234 5678
Credit card issuers use numerical tests to exclude sequences of numbers that aren't valid credit card numbers. Since these tests don't detect all invalid numbers, the feature for credit card number recognition may find additional invalid numbers.
To search for a credit card number, enter a credit card number between the parentheses in creditcard() as exemplified in creditcard(1234*).
The dtSearch engine supports Boolean operators, including AND, OR, and NOT. You can use these operators to connect multiple phrases or terms in a single search expression.
Note: For details on how proximity and boolean strings are parsed in search conditions, see this knowledge base article on the Relativity Community.
When you use the AND operator to connect expressions, only documents that contain all the expressions in the search string return in the result set. The following search strings illustrate how to use this operator:
- apple pie AND poached pear retrieves any documents that contain both phrases.
- (apple or banana) AND (pear w/5 grape) retrieves any documents that contain apple or banana AND contain pear within five words of grape.
You can combine a search for required search terms with other terms that are optional. The words before the AndAny connector are required search terms, and the words after the AndAny connector are optional. A document only returns if it contains at least the required search terms. For example, (apple and pear) AndAny (grape or banana) would find any document that contains apple and pear, with grape and banana also being counted as hits only if apple and pear are also present in the document.
The following example further explains the AndAny operator:
You have three documents, each containing the term(s) specified below:
- Document 1: Apple
- Document 2: Apple, Grape, Pear
- Document 3: Grape, Pear
Note the following behavior:
- When you search for the term apple, documents 1 and 2 return.
- When you search for the string apple AND pear, only document 2 returns.
- When you search for the string apple AndAny pear, documents 1 and 2 return.
When you use the OR operator to connect expressions in a search string, documents that contain one or more of these expressions return in the result set. For example, the search string apple pie or poached pear returns documents that contain apple pie, poached pear, or both phrases.
In a dtSearch, you can use the NOT operator at the beginning of a search expression to negate its meaning and exclude documents from a result set. For example, the search expression applesauce and NOT pear returns documents that contain the word applesauce, but not those documents that contain both the words applesauce and pear.
- NOT operator as a standalone - You can use the NOT operator by itself at the beginning of a search expression. For example, the search expression NOT pear returns all the documents that do not contain the word pear. The search expression NOT (apple w/5 pear) returns all the documents that do not contain the word apple within five words of pear. Other examples:
- NOT (apple or pear) - this search expression returns every document that does not have apple or pear in it.
- NOT (apple and pear) - this search expression returns documents where apple and pear don't appear together in the same document. It will return all other documents including documents with the word apple and documents with the word pear, but not documents that include both terms.
- NOT operator as a connector - When the NOT operator appears in the middle of a search expression, it must be used in conjunction with either AND or OR. For example, the search expression apple OR NOT pear returns all the documents that contain the word apple and those that do not contain the word pear.
Note: You can also use NOT in a proximity search as illustrated by the NOT W/N (NOT Within N words) operator.
See W/N operator.
- AND NOT operator - You can use the AND NOT operator to develop queries for documents that include the first expression but not the second expression. For example, you may want to query for email messages that have Ryan as the author, but do not have Will as the recipient. The following record illustrates these conditions:
Document OCR Recipient Author AS00001 From: Ryan To: Will Will Ryan
You can perform a dtSearch using the search string Ryan AND NOT Will and return results that don't include document AS00001.
The dtSearch engine combines into a single pool the text for all fields identified for inclusion in an index. A search string using the AND NOT operator queries the index that includes the combine text from all indexed fields, rather than querying the content of individual fields. This behavior ensures consistent result sets when querying with the AND NOT operator.
Note: A keyword search is an SQL full text search, which queries individual fields. Keyword search won't return the same results as dtSearch when the NOT operator is used to query across multiple fields. See NOT operator.
The precedence, or order of evaluation, determines how a group of expressions is evaluated in a query.
Note: By default, dtSearch evaluates OR expressions before AND expressions: A AND (B OR C). Unlike dtSearch, the order of precedence for a keyword search evaluates AND expressions before OR expressions: (A AND B) OR C. See Keyword search.
Evaluation order for the search string: apple AND pear OR grape
- pear OR grape is evaluated first
- AND apple is evaluated second
Documents containing the following terms are returned:
- pear, grape, apple
- pear, apple
- grape, apple
Operator precedence - with parentheses
Parentheses allow you to group expressions and control the order of query string execution where the query string contains both AND and OR operators. dtSearch requires both AND and OR operators for the parentheses to affect query results and ignores parentheses when the query string does not contain both operators.
For query strings containing both AND and OR operators, dtSearch evaluates OR first before AND. However, expressions contained within parentheses take precedence. If you want AND evaluated before OR, place the AND expression within parentheses.
Evaluation order for the search string: grape OR (apple AND pear)
- apple AND pear - evaluated first as they reside within the parentheses
- OR grape - evaluated second
dtSearch returns documents containing the following terms:
- apple, pear, grape
- apple, pear
Workaround for expressions containing only AND or OR operators
Use a proximity operator to separate query expressions. For example, insert a PRE proximity operator between each expression of the search string.
Evaluation of the search phrase: (grape OR apple) PRE/1 (banana OR pear)
dtSearch returns documents containing the following terms:
- grape banana
- grape pear
- apple banana
- apple pear
Evaluation of the search phrase: (grape OR apple) (banana OR pear)
dtSearch ignores the parentheses and analyzes the query as grape OR apple banana OR pear and returns documents with the following terms:
- apple banana
- grape, apple banana, pear
dtSearch includes the following built-in search words:
You can use these terms to limit a search to the beginning or end of a file. For example, apple W/10 xlastword searches for apple within 11 words of the end of a document.
- xfirstword - Marks the beginning of a file.
- xlastword - Marks the end of a file.
Using the dtSearch engine, you can perform fuzzy searches, which return documents containing spelling variations of a specified term. You may want to use fuzzy searching when querying documents that contain misspelled terms, typographical errors, or have been scanned with Optical Character Recognition (OCR).
The percent sign (%) is the character used for fuzzy searches. The number of (%) used indicates how many characters in the search term dtSearch engine ignores when it runs the query. The position of the % indicates the number of characters from the beginning of the term that must match exactly with words in the result set. The following search strings illustrate how this character is used:
- app%ly indicates that a matching word must begin with app and differ from apply by only one character.
- a%%pply indicates that a matching word must begin with a and differ from apply by only two characters.
Using the fuzziness operator and fuzziness level option
In Relativity, you can use the fuzziness character (%) or the Fuzziness Level menu to perform fuzzy searches. The availability of these search options depends on the location where you are running a dtSearch:
- Documents tab - when you select a dtSearch in the Search With option, you can use the fuzziness character (%). See Running a dtSearch.
- Dictionary Search - when you click the Dictionary link, you can use the fuzziness character (%) and the Fuzziness Level menu on the Dictionary Search dialog. See Running a Dictionary search.
In the Fuzziness Level menu, you can select a value from 1 to 10, which applies to all terms in the textbox. Larger numbers return terms with more variation. We recommend using values between 1 and 3 for moderate error tolerance. The following table describes the expected results for sample settings.
Fuzziness level Description of search results Blank Only returns the entered term. 1 Returns slight variations of the entered term. 4 Returns multiple variations of the entered term.
- Saved Search - when you create a saved search, you can use the fuzziness operator (%) and the Fuzziness Level menu when you add a dtSearch index condition or by clicking the Dictionary link. The Fuzziness Level menu in a saved search uses the same settings as described above. See Saved search.
Note: The Fuzziness Level menu is independent of the fuzziness (%) character that you can enter in the textbox. A search for appl% without a Fuzziness Level setting may return documents containing apple or apply, since these terms have the stem appl and differ by one character.
Fuzzy searching uses term length and fuzziness level to decide how many % characters to add. This is not a straight level to character match. This means a level 7 fuzziness search doesn't necessarily mean up to 7 additional characters return.
The dtSearch engine references a default list of noise words and an alphabet file when it creates a new index. The noise words are excluded in a dtSearch index to improve query performance and prevent unnecessary index growth. Commonly used words such as AND, THE, WILL are ignored when you run a query. The alphabet file determines how characters and spaces are handled in a query.
Note: If your dtSearches aren't returning the expected results, you may want to ask your system admin about adjusting the noise word list or alphabet file.
The dtSearch engine uses an alphabet file to define which characters are treated as text, cause word breaks, and are ignored. System admins can modify the default alphabet file when they create or edit a dtSearch index. See Making a special character searchable.
The alphabet file determines which characters are treated as text, which cause spaces, which cause word breaks, and which are ignored. The categories of items in the alphabet file include:
- Letters - all searchable characters, which should include all alphabet characters (a-z and A-Z) and all digits (0-9).
- Hyphens - characters removed during index creation. For example "First-Level" becomes two separate words in a dtSearch index.
- Spaces - characters that cause a word break. For example, the period is indexed as a space character by default. Thus, dtSearch processes U.S.A. as three separate words: U, S, and A. Values listed as \## are Unicode characters. Their definitions are:
\09 - horizontal tab
\0a - line feed
\0c - form feed
\0d - carriage return
\5c - backslash (\)
Note: Do not remove these Unicode characters from your alphabet file.
- Ignore - characters that are disregarded in processing text. For example, if you classify the period as ignore instead of space, then dtSearch would process U.S.A. as one word, USA.
Note: The underscore (_) isn't recognized as a space by default. Verify that a given character is defined in the [Spaces] section before treating it as a word break in a dtSearch.
Default noise word list
The dtSearch engine is configured with the default noise words listed in the following table. System admins can modify this list when they create or edit a dtSearch index. Thus, if you are searching for a phrase that contains a term in the noise words list, you will need to remove the term from the list and rebuild your index.
|Begins with...||Noise words|
|A||a, about, after, all, also, an, another, any, are, as, and, at|
|B||be, because, been, before, being, between, but, both, by|
|C||came, can, come, could|
|F||for, from, further, furthermore|
|H||has, had, he, have, her, here, him, himself, his, how, hi, however|
|I||i, if, in, into, is, it, its, indeed|
|M||made, many, me, might, more, moreover, most, much, must, my|
|N||never, not, now|
|O||of, on, only, other, our, out, or, over|
|S||said, same, see, should, since, she, some, still, such|
|T||take, than, that, the, their, them, then, there, these, therefore, they, this, those, through, to, too, thus|
|W||was, way, we, well, were, what, when, where, which, while, who, will, with, would|
Note: You can make special characters searchable in a dtSearch index. However, some characters need to be escaped using regular expressions. For more information, see Searching for symbols.
- Navigate to the dtSearch index.
- Click Edit, and then scroll down to the Alphabet section.
- Delete the character from the current category ("hyphen", "spaces", etc). Don't delete the category heading.
- Enter the character you want to make searchable four times, separated by spaces under the section [Letters] // Original letter, lower case, upper case, unaccented.
Note: You must also begin with a space.
- Perform a full build on the dtSearch index. The characters you added are included in your searches.
Note: If you make any symbol a searchable character in your dtSearch index and then build an index on a long, uninterrupted search string, such as a file path, dtSearch truncates the string after the 32nd character. For more information, see Searching for words longer than 32 characters.
Using the dtSearch engine, you can perform phonic searching, which returns documents containing words that sound like the word you're searching for and begins with the same letter. The pound sign (#) is the character used for phonic searches when added to the front of a word. For example, a phonic search for pear also finds pair and pare.
You can also use phonic searching in Dictionary searches.
Using the dtSearch engine, you can perform stemming searches, which return documents containing grammatical variations of a root word. Stemming is limited to English only. The tilde (~) is the character used for stemming searches when added at the end of the root word. For example, a search on apply~ returns documents containing the words apply, applying, applies, and applied. After you perform a stemming search,you can enter applied in the Find Next box, and then click the Find Next icon to locate hits or grammatical variations.
Because stemming only works with the root word, it generally doesn't return irregular variations of a verb. For example, a search on run~ would not return ran. The dtSearch engine only supports stemming for the English language.
Using the stemming operator and enable stemming checkbox
In Relativity, you can use the stemming character (~) or the Enable Stemming checkbox to perform stemming searches. The availability of these search options depends where you're running a dtSearch:
- Documents tab - When you select a dtSearch in the Search With option, you can use the stemming character (~). See Running a dtSearch.
- Dictionary Search - When you click the Dictionary link, you can use the stemming character (~) and the Enable Stemming checkbox on the Dictionary Search dialog. See Running a Dictionary search.
- Saved Search - When you create a saved search, you can use the stemming character (~) and the Enable Stemming checkbox in the Search Conditions section of the form. See Saved search.
The Enable Stemming checkbox is independent of the stemming (~) character that you can enter in the Search Terms box or Dictionary Search textbox. A search for apply~ with Enable Stemming checkbox unselected returns apply, applied, applies, or applying. A search for apply with Enable Stemming checkbox selected returns the same results.
With fuzzy searching and stemming enabled, it checks for a fuzzy match twice, once against the original term, and once comparing the stemmed word with the stemmed word in the index. A match on either counts as a hit.
The dtSearch engine supports special characters that you can use as wildcards. It also supports the use of leading wildcards, or those added to the beginning of a word. The following characters represent wildcards in dtSearches:
|?||Matches any single character.|
|*||Matches any number of characters. Note: This character slows searches when used near the beginning or middle of a word.|
|~||Matches words containing grammatical variations of a root word. The tilde (~) is the stemming character available in dtSearches. See Stemming.|
|=||Matches any numerical character (ex. === == ==== for Social Security Numbers). See Numerical patterns|
As illustrated in the following table, you can add wildcards to the root of any word to return matching terms from a dtSearch.
|Sample search string||Description of search results|
|appl*||Matches apple, application, and so on.|
|*cipl*||Matches principle, participle, and so on.|
|appl?||Matches apply and apple, but not apples.|
|ap*ed||Matches applied, approved, and so on.|
|apply~||Matches apply, applied, applies, and so on.|
|=th||Matches 4th, 5th, 6th, 7th, 8th, and so on.|
You can use the W/N (within N words) operator to return documents with two words or phrases occur within a certain proximity of each other. When using Boolean operators in a proximity search with the W/N operator, noise words are included. The N value represents the number of intervening words. For example, the search expression apple W/5 pear returns documents that contain apple only when it occurs within five words of pear. The documents returned by the search must contain the terms within the required proximity, such as five words.
The W/N operator is symmetrical. The search expression apple W/5 pear returns the exact same document as pear W/5 apple.
Note: Single characters are treated as full words when this operator is used. For instance, if you search for Harry W/2 Truman, your search retrieves documents that include Harry S Truman or Harry S. Truman.
Note: The W/N operator can be interchanged with WI (or wi). For example, the search expression apple W/5 pear returns the same results as apple WI5 pear.
You can use the NOT W/N (not within N words) operator to exclude documents from a result set when two words or phrases are within a certain proximity of each other.
For example, the search expression apple NOT W/20 pear returns documents that contain apple when it's separated from pear by at least 20 words; it also returns documents that don't contain pear. Documents that contain apple separated from pear multiple times with varying proximity can be returned as long as there is at least one concurrence where apple is separated from pear by at least 20 words.
The NOT W/N isn't symmetrical. The search expression apple NOT W/20 pear doesn't return the same documents as pear NOT W/20 apple.
You can create complex expressions with the W/N operator by connecting words or phrases. At least one of these expressions must be a single word, phrase, or group of words and phrases connected by an OR operator as illustrated by the following:
- (apple AND banana) W/10 (pear OR grape)
- (apple AND banana) W/10 (orange tree)
Note: Complex expressions with “OR” connectors can be broken up into separate searches. Search apple w/10 "orange tree" OR banana w/10 "orange tree" to return the same results as (apple OR banana) W/10 "orange tree".
Avoid creating complex expressions that produce ambiguous results as illustrated in the following examples:
- (apple AND banana) W/10 (pear AND grape)
- (apple w/10 banana) w/10 (pear and grape)
Note: dtSearch displays a warning message when you enter an ambiguous search request.
You can also use the boolean operators AND and OR to connect proximity expressions as illustrated in the following examples:
- (apple w/10 banana) AND (pear w/5 grape)
- (apple or banana) OR (pear w/5 grape)
Note: When connecting proximity expressions using boolean operators, you must use parentheses.
You can use the PRE operator to search for a word that appears within a certain number of words before another word.
For example, the search string apple PRE/5 pear returns documents where apple appears within 5 words before pear.
Note: Relativity does not use the POST operator. However, you can mimic this functionality by reversing the order of the terms, and using the PRE operator.
To search for other numerical patterns such as social security numbers, you can use the = wildcard, which matches any single digit. For example, if hyphens are indexed as spaces, then the following search request would find U.S. social security numbers:
=== == ====
This searching pattern can return false hits; for example, no valid social security number begins with 9. However, this is the only way to get social security numbers with spaces instead of dashes.
Note: dtSearch support notes the === == ==== notation is actually more performant than a regular expression for the same pattern assuming you're ok with getting some false hits.
The dtSearch connector words are:
To search for a phrase that contains one of the dtSearch connector words, quote a connector word or the phrase it is in, or put a tilde after the connector. The following search strings work in returning phrases that contain connector words:
- "clear and convincing evidence"
- not~ relevant
- "whether or not John wants to"
Note the following:
- Adding a ~ after a connector word prevents dtSearch from recognizing the word as a connector but does not otherwise affect the search. The ~ character after a word tells dtSearch to apply the stemming rules to it. Because the stemming rules included with dtSearch do not modify short words, the ~ does not change the outcome of a search for and, or, not, or to.
- Connector words such as "and" and "not" are also included by default in the noise word list. All these words are Noise words and you must remove these words from the list to make dtSearch index these files.
See Creating a dtSearch index for details.
Words and phrases
With a dtSearch, you can use double quotes to search for a phrase. For example, the phrase fruit salad is included in the search string apple w/5 "fruit salad". The following list outlines how dtSearch queries on words or phrases with noise words or punctuation:
- Phrases with Noise Words - dtSearch skips any noise words in a phrase. For example, it skips of in the search string Statue of Liberty and retrieves any documents that contains statue an intervening word, and liberty.
- Words with Punctuation - Punctuation inside a word is treated as a space. For example, dtSearch treats the search term can't as two words, can and t.
- Numbers and Characters in Parenthesis - Unexpected results may be returned when numbers or characters in parenthesis are used in a dtSearch. For example, the search term 1843 (c)(8)(ii) is treated as four words.