

The audio-visual (AV) transcription tool analyzes audio or video files and transcribes the spoken words into a text field. This streamlines the process of reviewing AV files and makes it easier to search for keywords, find correlations, and locate relevant files.
AV transcription is available as a secured application from the Application Library.
After you install the application, the instance-level Transcription Queue tab and the workspace-level Transcription Jobs tab appear.
For detailed installation steps, see the documentation on installing applications.
After you have installed AV transcription, assign permissions according to whether a user will be running the mass operation, viewing transcription jobs, or both.
To use the Transcribe mass operation, you need the corresponding mass operation permission within the workspace.
All instance admins can view the instance-level Transcription Queue tab and use mass actions on it.
To view the workspace-level Transcription Jobs tab and use mass actions on it, you need tab visibility for the Transcription Jobs tab within the workspace.
Any user with access to either tab can cancel or delete transcription jobs. For more information, see Canceling or deleting jobs.
The AV transcription application analyzes the audio contained in audio and video files, then transcribes the words into a text field you choose. Use the Transcribe mass operation to select files and choose the settings for the transcription job.
To transcribe audio:
Enable automatic speaker partitioning—select this option to identify which words were spoken by each person. The identified speakers are labeled as Speaker 1, Speaker 2, and so on. For more information, see How speaker partitioning works.
After you submit the job, you can view job details and status from the Transcription Queue tab. You can also view completed transcriptions for each document in the document Viewer.
When you choose transcription languages, the AI model maps the sounds in the audio track to its list of known words for each language. You can select up to ten languages total: one primary and nine secondary languages. AV transcription does not censor profanity or offensive words in any language.
When selecting languages:
If you are not sure whether the audio track contains a specific language, try adding it as a secondary language. Processing time does not change significantly with extra languages. The main risk is that a mumbled word may be wrongly interpreted as a word in that language.
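As a quick illustration of the selection limits described above, here is a minimal sketch that validates a hypothetical language configuration before a job is submitted. The function name and error messages are invented for illustration and are not part of the AV transcription application.

```python
# Minimal sketch: enforce the documented limit of one primary language
# plus up to nine secondary languages. All names here are hypothetical.

def validate_language_selection(primary: str, secondary: list[str]) -> None:
    if not primary:
        raise ValueError("A primary language is required.")
    if primary in secondary:
        raise ValueError("The primary language cannot also be secondary.")
    if len(set(secondary)) != len(secondary):
        raise ValueError("Secondary languages must be unique.")
    if len(secondary) > 9:
        raise ValueError("Select at most nine secondary languages.")

# One primary plus two secondary languages: well within the ten-language cap.
validate_language_selection("English (UK)", ["Spanish (Spain)", "French (France)"])
```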
When the model maps sounds to words, it tries to find the closest match in any of the selected languages. The primary language acts as a default: If the application can't find a perfect match for a word, it will try to match it to a word in the primary language.
The model also pays attention to the language of surrounding words. If all the surrounding words were identified as one language, the model is more likely to assume that a word in the middle belongs to that language also. For example, if the sound "si" is surrounded by English, the model may assume it's the English word "see." If it's surrounded by Spanish, the model is more likely to assume that it's the Spanish word "sí."
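The matching behavior described above can be pictured as a simple scoring problem. The toy sketch below is conceptual only: the candidate words, scores, and weights are invented to show how a primary-language default and surrounding-language context could tip a close call, not how the model is actually implemented.

```python
# Conceptual toy model of the matching described above. The weights and
# scores are invented; the real model's internals are not documented.

def pick_word(candidates, primary_language, context_language):
    """candidates: list of (word, language, acoustic_score) tuples."""
    def adjusted(candidate):
        _word, language, score = candidate
        if language == context_language:
            score += 0.2  # surrounding words were identified as this language
        if language == primary_language:
            score += 0.1  # the primary language acts as the default
        return score
    return max(candidates, key=adjusted)

# The sound "si" could be English "see" or Spanish "sí".
candidates = [("see", "English (US)", 0.5), ("sí", "Spanish (Spain)", 0.5)]

# Surrounded by Spanish, the tie breaks toward "sí".
print(pick_word(candidates, "English (US)", "Spanish (Spain)"))
```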
In general, if you did not select a language when setting up the job, the model will not identify words spoken in that language. However, a few locales cover languages that are frequently mixed in speech. For example, French (Canada) and Spanish (United States) can both identify words in English.
When you select a language, you also select the locale in which the language is spoken. For example, Portuguese has two options: Portuguese (Brazil) and Portuguese (Portugal).
The amount the locale affects the transcription depends on how much the language itself varies from region to region. For languages with significant regional differences, choosing the right locale makes it easier for the model to match pronunciations and region-specific words. For languages with fewer regional differences, the locale choice may only affect small details such as spelling.
If the audio contains two locales, choose the most frequently spoken. For example, if your audio contains mostly UK-based English speakers, plus one US-based speaker, select English (UK). If you're not sure which locale is more common in the audio, choose the one that the reviewers are most comfortable reading.
Speaker partitioning, also known as diarization, organizes transcribed text according to who said what. When Enable automatic speaker partitioning is turned on, the transcription job attempts to identify and label which words were spoken by each person. The identified speakers are labeled as Speaker 1, Speaker 2, and so on. These labels are independent from file to file, so a person labeled as Speaker 1 in the first file might be labeled as Speaker 3 in another file, and so on.
The speaker partitioning labels do not appear in the long text field that holds the transcript. Instead, they're stored in a separate internal system with a copy of the transcript. When reviewers look at a transcribed file using the document Viewer, the Viewer pulls the internal labels and transcript copy and uses them to display the formatted transcript.
Because the speaker labels are only stored internally, they are not searchable or visible in the long text field. Enabling or disabling speaker partitioning does not affect tools that analyze the long text field, such as Review Center's Prioritized Review. Similarly, editing the long text field does not affect what appears in the Viewer.
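One way to picture this separation is as two records: the long text field holds plain transcript text, while the internal copy pairs each segment with a speaker label. The record shape below is invented for illustration; the actual internal storage format is not documented.

```python
# Hypothetical sketch of why speaker labels never appear in the long text
# field: the plain transcript and the labeled copy are separate records.

segments = [
    {"speaker": "Speaker 1", "text": "Did you review the contract?"},
    {"speaker": "Speaker 2", "text": "Yes, I sent my notes yesterday."},
]

# What the long text field holds: unlabeled text only.
long_text_field = " ".join(seg["text"] for seg in segments)

# What the Viewer can render from the internal copy: labeled lines.
formatted_view = "\n".join(f"{seg['speaker']}: {seg['text']}" for seg in segments)

print(long_text_field)
print(formatted_view)
```

Because the two records are independent, editing long_text_field here would leave formatted_view unchanged, which mirrors the behavior described above.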
Speaker partitioning works on files under 4 hours long that have a mono audio channel. For more information, see Job capacity and size limits.
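If you want to screen files against these limits before submitting a job, a quick check is straightforward. The sketch below uses Python's standard wave module, so it handles .wav files only; other formats would need a media library.

```python
# Pre-check a file against the speaker partitioning limits noted above:
# mono audio and a duration under 4 hours. Works for .wav files only.
import wave

MAX_SECONDS = 4 * 60 * 60  # the 4-hour limit described above

def eligible_for_partitioning(path: str) -> bool:
    with wave.open(path, "rb") as audio:
        mono = audio.getnchannels() == 1
        duration = audio.getnframes() / audio.getframerate()
    return mono and duration < MAX_SECONDS

print(eligible_for_partitioning("interview.wav"))  # placeholder file name
```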
After starting an AV transcription job, you can view details and progress from two tabs. The workspace-level Transcription Jobs tab shows all transcription jobs within the workspace, and the instance-level Transcription Queue tab shows jobs across the entire instance.
The columns on the tabs are:
Job ID—the unique ID of the job.
Submitted by—the name of the user who submitted the transcription job.
The possible job statuses are:
On the Transcription Jobs and Transcription Queue tabs, you can use mass actions to delete jobs or cancel in-progress jobs.
When you click the Details icon for a job, a modal appears with two tabs: Transcription job details and Transcription job exceptions.
The Transcription job details tab shows the workspace name, the name of the user who submitted the job, and the settings they chose at job creation. It also shows the dates and times the job was submitted and completed.
The Transcription job exceptions tab shows a summary of any errors, warnings, or skipped documents in the job, as well as details for the first two exceptions. You can download the full list of exceptions as a CSV file by clicking Download full list of exceptions at the bottom of the details section.
The Exception summary fields are:
The Exception details fields are:
When you download the full list of exceptions, the CSV file contains these columns:
ControlNumber—the document’s Control Number.
ExceptionType—the type of exception that occurred.
ExceptionMessage—a description of the exception.
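If you need to triage a long exception list outside of Relativity, the downloaded CSV is easy to summarize with a short script. This sketch relies only on the three columns listed above; the file name is a placeholder.

```python
# Summarize a downloaded exceptions CSV by exception type. The column
# names come from the list above; "exceptions.csv" is a placeholder.
import csv
from collections import Counter

with open("exceptions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Count how often each exception type occurred.
for exception_type, count in Counter(r["ExceptionType"] for r in rows).most_common():
    print(f"{exception_type}: {count}")

# List the affected documents and messages for manual follow-up.
for row in rows:
    print(row["ControlNumber"], "-", row["ExceptionMessage"])
```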
Common exceptions include:
Unsupported audio format: File cannot be transcribed—the document was not one of the supported file types. For more information, see Supported file types.
File successfully transcribed, but no speech was identified—the document was processed successfully, but the model could not find any human speech.
Multi-channel audio file detected. Transcription completed on channel 0. Remaining channels have not been transcribed—the file contained multiple audio channels, but only one audio channel was transcribed. For more information, see Audio channel support.
Stereo audio is not supported when using diarization—the file contained multiple audio channels, and the job settings had Enable automatic speaker partitioning turned on. Only one audio channel is supported for speaker partitioning. For more information, see Audio channel support.
For help troubleshooting, contact Relativity Support.
When the transcription job completes, you can find the transcript of the file in the long text field you selected during setup.
Ways to use the transcript include:
AV transcription works with audio files and with video files that contain audio. To determine the file type, it checks the document's native type in Relativity.
It supports these file types:
When you select files to transcribe, you can include any mix of file types. The job automatically skips files of unsupported types.
AV transcription processes only the first audio channel, so it works best on mono files, which contain a single audio channel. If you enable speaker partitioning, you must select files with mono audio.
If you run AV transcription without speaker partitioning on a file with multiple channels, we recommend checking the results. Results can vary depending on how the file's channels were set up.
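To find the files worth double-checking, you can scan for multi-channel audio before or after submitting a job. As above, this sketch uses the standard wave module and therefore covers .wav files only; the folder name is a placeholder.

```python
# Flag .wav files with more than one channel, since only channel 0 is
# transcribed. "audio" is a placeholder folder name.
import pathlib
import wave

for path in sorted(pathlib.Path("audio").glob("*.wav")):
    with wave.open(str(path), "rb") as audio:
        channels = audio.getnchannels()
    if channels > 1:
        print(f"{path.name}: {channels} channels; only channel 0 is transcribed")
```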
When you select files to transcribe, the following limits apply:
If you submit a large file and several smaller files, the large file will make the entire job take longer. You may want to bundle smaller files together into the same job and separate out larger files.
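One way to act on that advice is a simple partition: anything over a size threshold becomes its own job, and everything else is bundled together. The threshold, file names, and sizes below are invented for illustration; they are not documented product limits.

```python
# Illustrative batching: large files get their own jobs so they don't
# hold up smaller ones. The 1 GB threshold is an arbitrary example.

LARGE_FILE_BYTES = 1_000_000_000  # hypothetical cutoff, not a product limit

files = {
    "deposition_video.mp4": 4_200_000_000,  # placeholder names and sizes
    "call_01.wav": 30_000_000,
    "call_02.wav": 45_000_000,
}

small_batch = [name for name, size in files.items() if size < LARGE_FILE_BYTES]
jobs = [[name] for name, size in files.items() if size >= LARGE_FILE_BYTES]
if small_batch:
    jobs.append(small_batch)

for number, job in enumerate(jobs, start=1):
    print(f"Job {number}: {job}")
```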
The ARM application currently does not support AV transcription. If you archive and restore a workspace using ARM, you will need to re-run AV transcription.
When a workspace with AV transcription is restored, any text in the long text Transcription field remains, but files will not have a formatted transcript in the Viewer. To restore the formatted Viewer transcript, re-run AV transcription in the new workspace.
For more information on using ARM, see ARM Overview.