Alphabet list

Some of the characters in the alphabet file are not printable, so this page uses screenshots instead of the actual text. Because the Spaces and Ignore characters are not printable, you cannot copy or paste them. Instead, use the dtSearchDefaultAlphabetFile instance setting to update the dtSearch default alphabet file.

Note: Each sequence must start with a leading (empty) space. Omitting the leading space may produce errors.

Alphabet List Leading Empty Space

dtSearch Alphabet File

The following is the default dtSearch Alphabet file you'll find in Relativity.

Alphabet file validation

When you save a dtSearch index, Relativity runs a validation check on the alphabet list. You will see a warning message if Relativity detects invalid spacing or syntax. You cannot save the index if there are errors with the alphabet list.

The validation check includes:

  • Header sections
    • Header section appears first in Alphabet
    • Exact header section without any added whitespace
    • Required newline before section
  • Letters
    • Exact title, allowing any whitespace and comments that begin with a double slash (//)
    • Each letter on own line with preceding space
    • Each letter variant separated by a single space
    • Allow any extra whitespace after letter
  • Hyphens, Spaces, and Ignore
    • Exact title, allowing any whitespace
    • Single line of characters with preceding space
    • Optional newlines before next section
  • Footer sections
    • Exact title
    • Skip validating any text following title
  • General
    • Purple, Pink, Red, Green sections are each optional and can be in any order
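
As a rough illustration of the Letters rules listed above, the following Python sketch checks a single line. It is not Relativity's validator. The function name and the regular expression are hypothetical, and the sketch assumes that "preceding space" means exactly one leading space and that a letter variant is any run of non-whitespace characters.

```python
import re

# Hypothetical pattern, not Relativity's actual check:
# one leading space, variants separated by exactly one space,
# then any amount of trailing whitespace.
LETTER_LINE = re.compile(r"^ \S+( \S+)*\s*$")


def is_valid_letter_line(line: str) -> bool:
    """Return True if the line follows the Letters rules as modeled here."""
    return bool(LETTER_LINE.match(line))


print(is_valid_letter_line(" a A"))    # leading space, single-space separator
print(is_valid_letter_line("a A"))     # missing leading space
print(is_valid_letter_line(" a  A"))   # two spaces between variants
```

Under these assumptions, only the first line passes. The other two fail for the reasons given in the comments.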

Alphabet file sections

The following descriptions are for characters in the ASCII 33-127 range.

Letters

dtSearch defines letters as characters to index. This includes all alphabetical characters (a-z and A-Z) and all digits (0-9).

Note: dtSearch is case insensitive. You cannot make dtSearch case-sensitive in Relativity by modifying the Letters section of the Alphabet file.

Hyphens

dtSearch defines hyphens as characters that receive special processing in dtSearch. By default, dtSearch only classifies the - character as a hyphen.

Spaces

dtSearch defines a space character as a character that causes a word break. These characters are not indexed and are not searchable. By default, dtSearch treats the following characters as spaces:

\09\0a\0c\0d !@"#$&'()*+,./:;<=>?[\5c]^`{|}~

Values listed as \## are hexadecimal Unicode code points. Their definitions are:

  • \09 - horizontal tab
  • \0a - line feed
  • \0c - form feed
  • \0d - carriage return
  • \5c - backslash (\)
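
To see which characters the \## values stand for, you can decode them outside of Relativity. The Python sketch below is illustrative only. The helper name decode_escapes is hypothetical, and the sketch assumes every escape is a backslash followed by exactly two hexadecimal digits, as in the default list above.

```python
import re

# The default Spaces list from this page, kept as a raw string so the
# backslash escapes are preserved exactly as written.
DEFAULT_SPACES = r"""\09\0a\0c\0d !@"#$&'()*+,./:;<=>?[\5c]^`{|}~"""


def decode_escapes(text: str) -> str:
    """Replace each \\## escape (two hex digits, assumed) with its character."""
    return re.sub(
        r"\\([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


decoded = decode_escapes(DEFAULT_SPACES)
for ch in decoded[:4]:
    # Show the code point of each decoded control character.
    print(f"U+{ord(ch):04X}")
```

The first four decoded characters are the tab, line feed, form feed, and carriage return, and \5c becomes a literal backslash.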

For more information, see dtSearch Unicode values for Special Characters. You must log in to the Relativity Community to access the topic.

Note: You must have valid Relativity Community credentials to download any Community file linked from the documentation site. If you are not already logged in, enter those credentials on the Community login screen. If you are already logged in to the Community when you click a link, the file downloads automatically and appears in the bottom left corner of your screen. If you get an error message stating URL No Longer Exists after clicking a Community link, contact your IT department. The error may be caused by a single sign-on problem related to the SAML Assertion Validator.

Ignore

dtSearch defines an ignored character as a character that is not indexed and does not create a word break when processing text. These characters are not searchable. By default, dtSearch ignores the following characters:

\08%

Values listed as \## are hexadecimal Unicode code points. Their definitions are:

  • \08 - backspace character
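
The difference between a space character and an ignored character is easiest to see on sample text. The Python sketch below is a simplified model of the behavior described in the Spaces and Ignore sections, not dtSearch's indexer. The tokenize function is hypothetical, hyphen processing is not modeled, and every character outside the two default lists is treated as a letter.

```python
# Default Spaces and Ignore characters as listed on this page.
SPACES = set("\t\n\x0c\r !@\"#$&'()*+,./:;<=>?[\\]^`{|}~")
IGNORE = set("\x08%")


def tokenize(text: str) -> list[str]:
    """Split text into words using the simplified model described above."""
    words, current = [], []
    for ch in text:
        if ch in IGNORE:
            continue  # dropped without breaking the word
        if ch in SPACES:
            if current:  # a space character ends the current word
                words.append("".join(current).lower())
                current = []
        else:
            current.append(ch)  # everything else is kept (simplification)
    if current:
        words.append("".join(current).lower())
    return words


print(tokenize("100%Pure user@example.com"))
# ['100pure', 'user', 'example', 'com']
```

The % is dropped without splitting 100%Pure, while @ and . each cause a word break. The lowercasing only mirrors the note that dtSearch is case insensitive.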

End

dtSearch has defined ranges for CJK characters, which make each Thai, Chinese, and Japanese character a separate word. For more information, see Setting up CJK document workspaces in Relativity.

Non-ASCII characters

Non-ASCII characters have a Unicode value greater than 0x7F. Many non-ASCII characters are searchable by default. For those that are not, for example £, you can index them by adding their hexadecimal code to the AdditionalLetters section of the alphabet file. For more information, see Searching for a symbol.
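
If you need the hexadecimal code of a character, a short script can compute it. The Python sketch below uses hypothetical helper names and only derives the code point. The exact notation that the AdditionalLetters section expects is not covered here, so check it against Searching for a symbol.

```python
def is_non_ascii(ch: str) -> bool:
    """True if the character's Unicode value is greater than 0x7F."""
    return ord(ch) > 0x7F


def hex_code(ch: str) -> str:
    """Return the character's Unicode code point in lowercase hexadecimal."""
    return format(ord(ch), "x")


print(is_non_ascii("£"), hex_code("£"))  # True a3
```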

Restricted characters

Some characters cannot be queried with standard syntax because of a limitation in dtSearch or because of how Relativity uses the dtSearch API. The following characters require special treatment in your query:

" ( ) * ? % ~ # & =

To search for parentheses, see Search for parentheses.

You can use a regular expression to search for these characters. For an example, see Searching for an asterisk.

Searching for a symbol or character

To search for a symbol or character in Relativity, see Searching for a symbol.

Reserved characters in the alphabet file

If you add a reserved character to the alphabet file and a search then appears to return it, remember that dtSearch treats reserved characters as operators regardless of what you set in the Alphabet file. Consider how those operators behave when you decide whether a solution really works.

For example, suppose you added % to your Alphabet file, removed it from the Ignore list, and a search for apple% then returned the term apple%.

The % is the fuzzy operator, meaning that any one character, or no character, can occupy that position and still produce a match. This is very similar to how the wildcards * and ? (any single character) work. Because % is no longer ignored, it is indexed and appears as part of the term. The word apple% was returned indirectly, because it matched the pattern apple + any indexed character. You cannot search for just % and get correct results.
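
The effect can be imitated with a regular expression. The Python sketch below models only the description given in this section, where % stands for any one character or no character. It does not reproduce dtSearch's actual fuzzy matching, and the function name and the list of indexed terms are made up for illustration.

```python
import re


def percent_query_to_regex(query: str) -> re.Pattern:
    """Translate each % into 'any one character, or none' (simplified model)."""
    parts = [".?" if ch == "%" else re.escape(ch) for ch in query]
    return re.compile("".join(parts), re.DOTALL)


# Hypothetical index contents after % was moved out of the Ignore list.
indexed_terms = ["apple", "apples", "apple%", "applesauce"]

pattern = percent_query_to_regex("apple%")
matches = [term for term in indexed_terms if pattern.fullmatch(term)]
print(matches)  # ['apple', 'apples', 'apple%']
```

In this model, apple% comes back only because % happens to be one of the characters the operator can stand for. The terms apple and apples match as well, and a query of just % would match any one-character term rather than the % symbol specifically.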