Supported languages matrix
This table shows, for each language, which Relativity features support it. The features include OCR, Structured Analytics, Processing, and the Viewer. Stemming, date recognition, and querying on abbreviations (that is, a single letter followed by a period) are available only for English text in a dtSearch index. The SQL Server settings determine the languages available for word-break characters used in the full text index.
Use the following resources for more information on SQL Server and dtSearch supported languages:
Note the following about the table below:
- √ - indicates that the language is supported.
- √* - indicates that the language must be installed in the Microsoft operating system for the viewer to function. Specifically, you must install the language on your local workstation.
- If the cell is empty, the feature is not supported.
Special considerations
Note the following details about the supported languages:
- dtSearch in Relativity is accent-insensitive by default. This means characters with accent marks and other diacritics are indexed the same way as their unaccented equivalents. If you need searches to distinguish accented characters, change the Create Accent Sensitive setting on the dtSearch index to Yes.
- Analytics indexes are language-agnostic and therefore support all languages. However, Categorization does not properly display Unicode choices in the field tree.
- The Processing column reflects languages supported during OCR in Processing. Processing's text extraction is natively Unicode and supports the full Unicode spectrum.
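To illustrate what accent-insensitive matching means in practice, the following Python sketch folds diacritics with Unicode normalization before comparing terms. It is a conceptual illustration only, not dtSearch's actual implementation; the function names `fold_accents` and `terms_match` and the `accent_sensitive` parameter are hypothetical and simply mirror the idea behind the Create Accent Sensitive setting.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Remove combining diacritical marks (hypothetical helper).

    The text is decomposed (NFD) so that a character such as "é" becomes
    "e" followed by a combining acute accent, and the combining marks are
    then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def terms_match(query: str, indexed: str, accent_sensitive: bool = False) -> bool:
    """Compare two terms the way an accent-(in)sensitive index might.

    Assumption: this only models the general idea; a real index applies
    its own rules.
    """
    if accent_sensitive:
        # Keep accents, but normalize so equivalent encodings compare equal.
        left = unicodedata.normalize("NFC", query)
        right = unicodedata.normalize("NFC", indexed)
    else:
        left, right = fold_accents(query), fold_accents(indexed)
    return left.casefold() == right.casefold()


# Accent-insensitive (the default behavior described above): a match.
print(terms_match("resume", "résumé"))
# Accent-sensitive: the accented and unaccented forms are distinct terms.
print(terms_match("resume", "résumé", accent_sensitive=True))
```

With folding enabled, a search for "resume" also finds "résumé"; with accent sensitivity turned on, only the exact accented form matches.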
Language support in aiR products
The underlying large language model (LLM) used by Relativity's aiR products has been evaluated for use with 83 languages. For a list of those languages, see Language support for Azure AI Content Safety on the Microsoft website.
Relativity's aiR products have been tested primarily on English-language documents. Unofficial testing with non-English datasets has resulted in the following recommendations:
- Rigorously follow best practices for writing and iterating on the Prompt Criteria. For more information, see Best practices and Developing prompt criteria in the aiR for Review documentation.
- Analyze the extracted text as-is. You do not need to translate it into English.
- When possible, write the Prompt Criteria in the same language as the documents being analyzed. This should also be the subject matter expert's native language. If that is not possible, write the Prompt Criteria in English.
When you view the results of the analysis, all citations stay in the same language as the document they cite. By default, the rationales and considerations are in English.
If you want the rationales and considerations to be in a different language, type “Write rationales and considerations in [desired language]” in the Additional Context field of the Prompt Criteria.
For the study used to evaluate Azure OpenAI's GPT-4 model across languages, see MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks on the arXiv website.
Language support in AV transcription
The audio-visual (AV) transcription application recognizes audio from over 100 languages and regional variants. For a full list, see AV transcription languages.
Supported languages