Supported languages matrix

This table shows, for each language, which Relativity features support it. The features include OCR, Assisted Review, Structured Analytics, Processing, and the Viewer. Stemming, date recognition, and querying on abbreviations (i.e., a single letter followed by a period) are available only for English text in a dtSearch index. The SQL Server settings determine which languages are available for the word-break characters used in the full text index.

Use the following resources for more information on SQL Server and dtSearch supported languages:

  • See this site for a list of search features for languages supported by dtSearch.
  • See this site for a list of Unicode supported languages also supported by dtSearch.
  • See this site for a list of SQL Server supported languages.

See Command line import for a complete list of alternate language encoding values. See Importing document metadata, files, and extracted text for instructions on importing documents with the Relativity Desktop Client and selecting the appropriate file encoding value.

  • √ - indicates that the language is supported.
  • √* - indicates that the language must be installed in the Microsoft operating system for the viewer to function. Specifically, you must install the language on the web server, the conversion agent server, and the local workstation.
  • If the cell is empty, the feature is not supported.

Special considerations

Note the following details about the supported languages:

  • dtSearch in Relativity is accent-insensitive by default. This means that characters with accent marks and other diacritics are stored the same way as characters without them. If you need to perform a search that distinguishes accents, change the Create Accent Sensitive setting on the dtSearch index to Yes.

  • Indexing in SQL is based on the character set of the language you select. Western languages are grammatically similar, so you should experience no issues when searching for English words with SQL. In addition, SQL tokenization is used only for symbols that mean one thing on their own but something else when combined with other symbols, as in CJK languages.

  • Conceptual Analytics and Classification indexes are language-agnostic and therefore support all languages. Categorization does not properly display Unicode choices in the field tree.

  • The Arabic, Hebrew, Thai, Vietnamese, and Belarusian languages appear as selectable options for the Default OCR language on the Processing Profile, but they are not supported and will not work if you select them. These options will be removed in Server 2023 patch 1.
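
The accent-insensitive behavior described above can be illustrated with a small Python sketch. It is not dtSearch's implementation and does not use any Relativity API. The function names `fold_accents` and `matches` and the `accent_sensitive` flag are hypothetical, chosen only to mirror the idea behind the Create Accent Sensitive setting. The sketch assumes that "accent-insensitive" means diacritics are stripped before terms are compared.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Remove combining marks so accented and unaccented forms compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term: str, document: str, accent_sensitive: bool = False) -> bool:
    """Return True if term appears as a whitespace-delimited word in document.

    Hypothetical illustration only: accent_sensitive=False folds diacritics
    on both sides before comparing; accent_sensitive=True keeps them.
    """
    if accent_sensitive:
        # Keep accents, but normalize so equivalent encodings still compare equal.
        normalize = lambda s: unicodedata.normalize("NFC", s)
    else:
        normalize = fold_accents
    words = normalize(document).casefold().split()
    return normalize(term).casefold() in words


print(matches("resume", "Her résumé arrived"))                         # accent-insensitive
print(matches("resume", "Her résumé arrived", accent_sensitive=True))  # accent-sensitive
```

In this sketch, a search for "resume" finds "résumé" only in the accent-insensitive mode. In the accent-sensitive mode the term must carry the same accents as the indexed word.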

Supported languages

Language | OCR | Processing | Native Imaging | Structured Analytics Language Identification | Viewer
English
Abkhazian    
Afar    
Afrikaans
Akan    
Albanian
Amharic    
Arabic *
Armenian *
Assamese    
Aymara
Azerbaijani    
Bashkir    
Basque
Belarusian   
Bengali    
Bemba 
Bihari    
Bislama    
Blackfoot 
Bosnian    
Breton
Bugotu 
Bulgarian (Cyrillic)
Byelorussian (Cyrillic)
Burmese    
Catalan
Cebuano    
Chamorro 
Chechen 
Cherokee    
Chinese (Simplified)*
Chinese (Traditional) *
Chuana or Tswana 
Corsican
Croatian
Crow 
Czech
Danish
Dhivehi    
Dholuo    
Dutch
Dzongkha    
Eskimo 
Esperanto
Estonian
Ewe    
Faroese
Fijian
Finnish
French
Frisian
Friulian 
Ga    
Gaelic Irish 
Gaelic Scottish 
Galician
Ganda or Luganda
Georgian  *
German
Greek
Greenlandic    
Guarani
Gujarati    
Haitian Creole    
Hani 
Hausa    
Hawaiian
Hebrew *
Hindi     
Hmong    
Hungarian
Icelandic
Ido 
Igbo    
Indic Languages    *
Indonesian
Interlingua
Interlingue    
Inuktitut    
Inupiak    
Irish    
Italian
Japanese*
Javanese    
Kabardian 
Kannada    
Kashmiri    
Kashubian 
Kawa 
Kazakh    
Khasi    
Khmer    
Kikuyu 
Kinyarwanda    
Kongo 
Korean*
Kpelle 
Krio    
Kurdish
Kyrgyz    
Laothian    
Latin
Latvian
Limbu    
Lingala    
Lithuanian
Lozi    
Luba 
Lule Sami 
Luxembourgian
Macedonian (Cyrillic)
Malagasy
Malay
Malayalam    
Malinke 
Maltese
Manx    
Maori
Marathi    
Mauritian Creole    
Mayan 
Miao 
Minankabaw 
Mohawk 
Moldavian (Cyrillic) 
Mongolian    
Montenegrin
Nahuatl 
Nauru    
Nepali    
Newari    
Northern Sami 
Norwegian
Norwegian Nynorsk    
Nyanja
Occidental 
Occitan    
Ojibway 
Oriya    
Oromo    
Ossetian    
Pampanga    
Papiamento 
Pashto    
Pedi    
Persian    
Pidgin English 
Polish
Portuguese
Portuguese (Brazilian)
Provencal 
Punjabi    
Quechua
Rajasthani    
Rhaetic 
Rhaeto - Romance    
Romanian
Romany 
Ruanda 
Rundi
Russian (Cyrillic)
Sami 
Samoan
Sango    
Sanskrit    
Sardinian 
Scots    
Scottish Gaelic    
Serbian (Cyrillic)
Serbian (Latin)
Seselwa    
Sesotho    
Shona
Sindhi    
Sinhalese    
Sioux 
Siswant    
Slovak
Slovenian
Somali
Sotho, Suto, or Sesuto 
Southern Sami 
Spanish
Sudanese
Swahili
Swazi 
Swedish
Syriac    
Tagalog
Tahitian 
Tajik    
Tamil    
Tatar    
Telugu    
Thai *
Tibetan    
Tigrinya    
Tinpo 
Tonga    
Tongan 
Tshiluba    
Tsonga    
Tswana    
Tumbuka    
Tun 
Turkish
Turkmen    
Twi    
Uighur    
Ukrainian (Cyrillic)
Urdu    
Uzbek    
Venda    
Vietnamese  *
Visayan 
Volapuk    
Waray-Waray    
Welsh
Wend or Sorbian 
Wolof
Xhosa
Yiddish    
Yoruba    
Zapotec 
Zhuang    
Zulu

See Command line import for a complete list of supported language encoding values.