Supported languages matrix

This table lists each language and shows which Relativity features support it. The features include OCR, Assisted Review, Structured Analytics, Processing, and the Viewer. Stemming, date recognition, and querying on abbreviations (i.e., a single letter followed by a period) are available only for English text in a dtSearch index. The SQL Server settings determine which languages are available for the word-break characters used in the full-text index.

Use the following resources for more information on SQL Server and dtSearch supported languages:

See Command line import for a complete list of alternate language encoding values. See Importing document metadata, files, and extracted text for instructions on importing documents with the Relativity Desktop Client and selecting the appropriate file encoding value.

  • √ - indicates that the language is supported.
  • √* - indicates that the language must be installed in the Microsoft operating system for the Viewer to function. Specifically, you must install the language on your local workstation.
  • If the cell is empty, the feature is not supported.

Special considerations

Note the following details about the supported languages:

  • dtSearch in Relativity is accent-insensitive by default. This means characters with accent marks and other diacritics are stored the same way as characters without them. If you need searches to distinguish accented characters, change the Create Accent Sensitive setting on the dtSearch index to Yes.
  • Conceptual Analytics and Classification indexes are language-agnostic and therefore support all languages. Categorization does not display Unicode choices in the field tree properly.
  • The Processing column reflects languages supported during OCR in Processing. Processing's text extraction is natively Unicode and supports the full Unicode spectrum.
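The difference between accent-insensitive and accent-sensitive matching can be illustrated with a minimal Python sketch. This is not how dtSearch is implemented; the `fold_accents` and `matches` helpers are hypothetical names introduced here, and the sketch assumes that accent folding amounts to removing Unicode combining marks after decomposition.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining diacritical marks removed.

    Assumption for illustration: accent folding is modeled as Unicode
    NFD decomposition followed by dropping combining characters.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(term: str, document: str, accent_sensitive: bool = False) -> bool:
    """Hypothetical substring search that optionally ignores accents."""
    if accent_sensitive:
        # Compare composed forms so visually identical accented text matches.
        term = unicodedata.normalize("NFC", term)
        document = unicodedata.normalize("NFC", document)
    else:
        # Fold both sides so accented and unaccented spellings are equal.
        term = fold_accents(term)
        document = fold_accents(document)
    return term.casefold() in document.casefold()


if __name__ == "__main__":
    text = "Her résumé arrived today."
    print(matches("resume", text))                         # accent-insensitive
    print(matches("resume", text, accent_sensitive=True))  # accent-sensitive
```

In this sketch, the unaccented term finds the accented word only when matching is accent-insensitive. Letters that have no Unicode decomposition, such as "ø", are not folded by this approach, so a real index may treat them differently.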

See Command line import for a complete list of supported language encoding values.

Language support in aiR products

The underlying large language model (LLM) used by Relativity's aiR products has been evaluated for use with 83 languages. For a list of those languages, see Language support for Azure AI Content Safety on the Microsoft website.

Relativity's aiR products have been tested primarily on English-language documents. Unofficial testing with non-English datasets has resulted in the following recommendations:

  • Rigorously follow best practices for writing and iterating on the Prompt Criteria. For more information, see Writing the Prompt Criteria and Iterating on the Prompt Criteria in the aiR for Review documentation.
  • Analyze the extracted text as-is. You do not need to translate it into English.
  • When possible, write the Prompt Criteria in the same language as the documents being analyzed. This should also be the subject matter expert's native language. If that is not possible, write the Prompt Criteria in English.

When you view the results of the analysis, all citations stay in the same language as the document they cite. By default, the rationales and considerations are in English.

If you want the rationales and considerations to be in a different language, type “Write rationales and considerations in [desired language]” in the Additional Context field of the Prompt Criteria.

For the study used to evaluate Azure OpenAI's GPT-4 model across languages, see MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks on the arXiv website.

Supported languages

Language | OCR | Processing | Native Imaging | Structured Analytics | Language Identification | Viewer
English
Abkhazian      
 
Afar      
 
Afrikaans
Akan      
 
Albanian
Amharic      
 
Arabic
√*
Armenian  
√*
Assamese      
 
Aymara
Azerbaijani      
 
Bashkir      
 
Basque
Belarusian
 
 
Bengali      
 
Bemba
 
Bihari      
 
Bislama      
 
Blackfoot
 
Bosnian      
 
Breton
Bugotu
 
Bulgarian (Cyrillic)
Byelorussian (Cyrillic)
Burmese      
 
Catalan
Cebuano      
 
Chamorro
 
Chechen
 
Cherokee      
 
Chinese (Simplified)
√*
Chinese (Traditional)
√*
Chuana or Tswana
 
Corsican
Croatian
Crow
 
Czech
Danish
Dhivehi      
 
Dholuo      
 
Dutch
Dzongkha      
 
Eskimo
 
Esperanto
Estonian
Ewe      
 
Faroese
Fijian
Finnish
French
Frisian
Friulian
 
Ga      
 
Gaelic Irish
 
Gaelic Scottish
 
Galician
Ganda or Luganda
Georgian    
√*
German
Greek
Greenlandic      
 
Guarani
Gujarati      
 
Haitian Creole      
 
Hani
 
Hausa      
 
Hawaiian
Hebrew
√*
Hindi      
 
Hmong      
 
Hungarian
Icelandic
Ido
 
Igbo      
 
Indic Languages        
√*
Indonesian
Interlingua
Interlingue      
 
Inuktitut      
 
Inupiak      
 
Irish      
 
Italian
Japanese
√*
Javanese      
 
Kabardian
 
Kannada      
 
Kashmiri      
 
Kashubian
 
Kawa
 
Kazakh      
 
Khasi      
 
Khmer      
 
Kikuyu
 
Kinyarwanda      
 
Kongo
 
Korean
√*
Kpelle
 
Krio      
 
Kurdish
Kyrgyz      
 
Laothian      
 
Latin
Latvian
Limbu      
 
Lingala      
 
Lithuanian
Lozi      
 
Luba
 
Lule Sami
 
Luxembourgian
Macedonian (Cyrillic)
Malagasy
Malay
Malayalam      
 
Malinke
 
Maltese
Manx      
 
Maori
Marathi      
 
Mauritian Creole      
 
Mayan
 
Miao
 
Minankabaw
 
Mohawk
 
Moldavian (Cyrillic)
 
Mongolian      
 
Montenegrin
 
Nahuatl
 
Nauru      
 
Nepali      
 
Newari      
 
Northern Sami
 
Norwegian
Norwegian Nynorsk      
 
Nyanja
Occidental
 
Occitan      
 
Ojibway
 
Oriya      
 
Oromo      
 
Ossetian      
 
Pampanga      
 
Papiamento
 
Pashto      
 
Pedi      
 
Persian      
 
Pidgin English
 
Polish
Portuguese
Portuguese (Brazilian)
Provencal
 
Punjabi      
 
Quechua
Rajasthani      
 
Rhaetic
 
Rhaeto - Romance      
 
Romanian
Romany
 
Ruanda
 
Rundi
Russian (Cyrillic)
Sami
 
Samoan
Sango      
 
Sanskrit      
 
Sardinian
 
Scots      
 
Scottish Gaelic      
 
Serbian (Cyrillic)
Serbian (Latin)
Seselwa      
 
Sesotho      
 
Shona
Sindhi      
 
Sinhalese      
 
Sioux
 
Siswant      
 
Slovak
Slovenian
Somali
Sotho, Suto, or Sesuto
 
Southern Sami
 
Spanish
Sudanese
Swahili
Swazi
 
Swedish
Syriac      
 
Tagalog
Tahitian
 
Tajik      
 
Tamil      
 
Tatar      
 
Telugu      
 
Thai
√*
Tibetan      
 
Tigrinya      
 
Tinpo
 
Tonga      
 
Tongan
 
Tshiluba      
 
Tsonga      
 
Tswana      
 
Tumbuka      
 
Tun
 
Turkish
Turkmen      
 
Twi      
 
Uighur      
 
Ukrainian (Cyrillic)
Urdu      
 
Uzbek      
 
Venda      
 
Vietnamese
 
√*
Visayan
 
Volapuk      
 
Waray-Waray      
 
Welsh
Wend or Sorbian
 
Wolof
Xhosa
Yiddish      
 
Yoruba      
 
Zapotec
 
Zhuang      
 
Zulu