dtSearch: Updating the Stop Word List and Alphabet File to Make Terms Searchable

Relativity ignores words that don't act as meaningful criteria when you create dtSearch and keyword queries. These ignored words are known as stop words or noise words. Search indexes automatically include the default list of stop words. However, you can edit this list for a dtSearch index to suit your needs. This recipe includes an overview of stop words, steps to create custom lists, and a description of the dtSearch alphabet file.

Requirements

  • Relativity 7.2 or above
  • Workspace access
  • Search index – Edit/Add and corresponding tab

Directions

Relativity references the default list of stop words each time you create a new index. System admins can't edit stop words in keyword searches. The default stop word list consists of punctuation marks, single letters and numbers, and the following words:

Stop words list

Note: Relativity ignores stop words, but it still counts their positions in the text. So, if you execute the query apple w/6 pear, the search returns the phrase "apple tree is far from the pear" even though it contains the stop words is, from, and the.
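
The note above can be illustrated with a small Python sketch. It is not how Relativity or dtSearch is implemented; it is a toy model that assumes w/N means "the two word positions differ by at most N" and uses a hypothetical three-word stop list taken from the example. Stop words are left out of the index but still use up a position number.

```python
import re

# Illustrative subset only (assumption); the real default list is longer.
STOP_WORDS = {"is", "from", "the"}

def positions(text):
    """Map each indexed word to its positions.

    Stop words are not indexed, but they still consume a position number.
    """
    index = {}
    for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        if word not in STOP_WORDS:
            index.setdefault(word, []).append(pos)
    return index

def within(text, first, second, n):
    """Return True if `first` and `second` occur within n positions of each other."""
    index = positions(text)
    return any(
        abs(a - b) <= n
        for a in index.get(first, [])
        for b in index.get(second, [])
    )

phrase = "apple tree is far from the pear"
print(within(phrase, "apple", "pear", 6))  # True: apple is at 0, pear is at 6
print(within(phrase, "apple", "pear", 5))  # False: the stop words still count
```

In this model the second call fails only because is, from, and the keep their positions; if they were dropped before numbering, apple and pear would be three positions apart.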

dtSearches and stop words

The default list of stop words is the same in a dtSearch as in a keyword search. The primary difference is that you can customize the list for a dtSearch index. For example, if the word never is important to your litigation, remove it from the stop words list so that the index includes that word and your searches can return it.

To create a custom stop word list, perform the following:

  1. Create a new dtSearch index, and then name it "dtSearch - updated stop words."
  2. Select your extracted text search for the Searchable set.
  3. Delete the word "never" from the Noise Words list.
  4. Save the list, and then perform a full build on your new index.
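
The effect of the steps above can be sketched in Python. This is a toy inverted index, not Relativity's index format; the stop word set is a hypothetical four-word subset, and build_index and the sample documents are invented for illustration. It shows why the index must be rebuilt: a word on the stop list at build time is never stored, so it cannot be found later.

```python
import re

# Illustrative subset of stop words (assumption; not the full default list).
DEFAULT_STOP_WORDS = {"is", "from", "the", "never"}

def build_index(documents, stop_words):
    """Build a toy inverted index: word -> set of document ids."""
    index = {}
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in stop_words:
                index.setdefault(word, set()).add(doc_id)
    return index

# Hypothetical sample documents.
documents = {1: "The shipment never arrived", 2: "The shipment is late"}

# Index built with the default list: "never" is not stored.
default_index = build_index(documents, DEFAULT_STOP_WORDS)

# Index rebuilt after deleting "never" from the list.
custom_index = build_index(documents, DEFAULT_STOP_WORDS - {"never"})

print(default_index.get("never"))  # None
print(custom_index.get("never"))   # {1}
```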

Stop words in languages other than English

You can set up stop words to search documents in other languages. If the workspace primarily contains documents in a language other than English, see this page for an overview of suggested stop words in nineteen additional languages.

dtSearch alphabet file

The following descriptions are for characters in the ASCII 33-127 range.

Letters

dtSearch defines letters as characters to index. This includes all alphabetic characters (a-z and A-Z) and all digits (0-9).

Note: dtSearch is case insensitive.

Hyphens

dtSearch defines hyphens as characters that receive special processing in dtSearch. By default, dtSearch only classifies the - character as a hyphen.

Spaces

dtSearch defines a space character as a character that causes a word break. By default, dtSearch treats the following characters as spaces:

\09\0a\0c\0d !@"#$&'()*+,./:;<=>?[\5c]^`{|}~

Values listed as \## are Unicode characters written as two-digit hexadecimal codes. Their definitions are:

  • \09 - horizontal tab
  • \0a - line feed
  • \0c - form feed
  • \0d - carriage return
  • \5c - backslash (\)
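
As a rough illustration, the default space entry can be expanded into actual characters with a few lines of Python. This sketch assumes each \## escape is a two-digit hexadecimal character code, which matches the five definitions listed above; decode is a hypothetical helper name, not part of dtSearch.

```python
import re

# The default space-character entry, copied from the list above.
RAW = r'''\09\0a\0c\0d !@"#$&'()*+,./:;<=>?[\5c]^`{|}~'''

def decode(entry):
    # Replace each backslash escape with the character for that hex value.
    return re.sub(
        r"\\([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        entry,
    )

space_chars = set(decode(RAW))

print("\t" in space_chars)  # True: \09 is the horizontal tab
print("\\" in space_chars)  # True: \5c is the backslash
print("-" in space_chars)   # False: the hyphen is not in this entry
print("%" in space_chars)   # False: the percent sign is not in this entry
```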

For more information, see dtSearch Unicode values for Special Characters. This article is found in the Relativity Community and you must log in to access it.

Note: You must have valid Relativity Community credentials in order to download any Community file linked to the documentation site. You'll need to enter those credentials on the Community login screen if you're not already logged in. If you're already logged in to the Community at the time you click a link, the file is automatically downloaded in the bottom left corner of your screen. If you get an error message stating "URL No Longer Exists" after clicking a Community link, it may be due to a single sign-on error related to the SAML Assertion Validator, and you should contact your IT department.

Ignore

dtSearch defines an ignored character as a character that's ignored when processing text. By default, dtSearch ignores the following characters:

  • \08 %

\08 is the backspace character in Unicode.
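
The four character classes described above can be combined into a toy tokenizer. This Python sketch is only an approximation of the default settings: the source says hyphens receive special processing without giving details, so the sketch simply assumes a hyphen breaks a word, and it assumes any character outside the space, hyphen, and ignore classes is indexed. The tokenize function is a hypothetical name.

```python
# Default classes as listed above (ASCII range only).
SPACE_CHARS = set("\t\n\x0c\r !@\"#$&'()*+,./:;<=>?[\\]^`{|}~")
IGNORE_CHARS = {"\x08", "%"}
HYPHEN_CHARS = {"-"}  # assumption: treated as a word break in this sketch

def tokenize(text):
    """Split text into lowercase words using the character classes."""
    words, current = [], []
    for ch in text.lower():  # dtSearch is case insensitive
        if ch in IGNORE_CHARS:
            continue  # dropped without breaking the current word
        if ch in SPACE_CHARS or ch in HYPHEN_CHARS:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)  # letters and digits are indexed
    if current:
        words.append("".join(current))
    return words

print(tokenize("Growth: 50% (year-end)"))  # ['growth', '50', 'year', 'end']
print(tokenize("R&D"))                     # ['r', 'd']
```

Under these defaults a term such as R&D is indexed as two separate single characters, because & is a space character. Making such a symbol searchable means moving it out of the space class and rebuilding the index.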

End

dtSearch has defined ranges for CJK characters, and these ranges make each Thai, Chinese, and Japanese character a separate word. See Setting up CJK document workspaces in Relativity for more detail.

Searching for a symbol or character

To search for a symbol or character in Relativity, see the section Searching for a symbol, or watch the Recipe - How to Adjust the dtSearch Alphabet File for Symbols video.

References