

The audio-visual (AV) transcription tool analyzes audio or video files and transcribes the spoken words into a text field. This streamlines the process of reviewing AV files and makes it easier to search for keywords, find correlations, and locate relevant files.
AV transcription is available as a secured application from the Application Library.
After you install the application, the instance-level Transcription Queue tab and the workspace-level Transcription Jobs tab appear.
For detailed installation steps, see the documentation on installing applications.
After you have installed AV transcription, assign permissions according to whether a user will be running the mass operation, viewing transcription jobs, or both.
To use the Transcribe mass operation, you need the corresponding mass operation permission within the workspace.
All instance admins can view the instance-level Transcription Queue tab and use mass actions on it.
To view the workspace-level Transcription Jobs tab and use mass actions on it, you need tab visibility for the Transcription Jobs tab within the workspace.
Any user with access to either tab can cancel or delete transcription jobs. For more information, see Canceling or deleting jobs.
The AV transcription application analyzes the audio contained in audio and video files, then transcribes the words into a text field you choose. Use the Transcribe mass operation to select files and choose the settings for the transcription job.
To transcribe audio:
Enable automatic speaker partitioning—select this option to identify which words were spoken by each person. The identified speakers are labeled as Speaker 1, Speaker 2, and so on. For more information, see How speaker partitioning works.
After you submit the job, you can view job details and status from the Transcription Queue tab. You can also view completed transcriptions for each document in the document Viewer.
When you choose transcription languages, the AI model maps the sounds in the audio track to its list of known words for each language. You can select up to ten languages total: one primary and nine secondary languages. AV transcription does not censor profanity or offensive words in any language.
When selecting languages:
If you are not sure whether the audio track contains a specific language, try adding it as a secondary language. Processing time does not change significantly with extra languages. The main risk is that a mumbled word may be wrongly interpreted as a word in that language.
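As a quick illustration of the selection limits described above, here is a minimal sketch that validates a hypothetical language configuration before a job is submitted. The function name and error messages are invented for illustration and are not part of the AV transcription application.

```python
# Minimal sketch: enforce the documented limit of one primary language
# plus up to nine secondary languages. All names here are hypothetical.

def validate_language_selection(primary: str, secondary: list[str]) -> None:
    if not primary:
        raise ValueError("A primary language is required.")
    if primary in secondary:
        raise ValueError("The primary language cannot also be secondary.")
    if len(set(secondary)) != len(secondary):
        raise ValueError("Secondary languages must be unique.")
    if len(secondary) > 9:
        raise ValueError("Select at most nine secondary languages.")

# One primary plus two secondary languages: well within the ten-language cap.
validate_language_selection("English (UK)", ["Spanish (Spain)", "French (France)"])
```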
When the model maps sounds to words, it tries to find the closest match in any of the selected languages. The primary language acts as a default: If the application can't find a perfect match for a word, it will try to match it to a word in the primary language.
The model also pays attention to the language of surrounding words. If all the surrounding words were identified as one language, the model is more likely to assume that a word in the middle belongs to that language also. For example, if the sound "si" is surrounded by English, the model may assume it's the English word "see." If it's surrounded by Spanish, the model is more likely to assume that it's the Spanish word "sí."
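The matching behavior described above can be pictured as a simple scoring problem. The toy sketch below is conceptual only: the candidate words, scores, and weights are invented to show how a primary-language default and surrounding-language context could tip a close call, not how the model is actually implemented.

```python
# Conceptual toy model of the matching described above. The weights and
# scores are invented; the real model's internals are not documented.

def pick_word(candidates, primary_language, context_language):
    """candidates: list of (word, language, acoustic_score) tuples."""
    def adjusted(candidate):
        _word, language, score = candidate
        if language == context_language:
            score += 0.2  # surrounding words were identified as this language
        if language == primary_language:
            score += 0.1  # the primary language acts as the default
        return score
    return max(candidates, key=adjusted)

# The sound "si" could be English "see" or Spanish "sí".
candidates = [("see", "English (US)", 0.5), ("sí", "Spanish (Spain)", 0.5)]

# Surrounded by Spanish, the tie breaks toward "sí".
print(pick_word(candidates, "English (US)", "Spanish (Spain)"))
```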
In general, if you did not select a language when setting up the job, the model will not identify words spoken in that language. However, a few locales cover languages that are frequently mixed in speech. For example, French (Canada) and Spanish (United States) can both identify words in English.
When you select a language, you also select the locale in which the language is spoken. For example, Portuguese has two options: Portuguese (Brazil) and Portuguese (Portugal).
The amount the locale affects the transcription depends on how much the language itself varies from region to region. For languages with significant regional differences, choosing the right locale makes it easier for the model to match pronunciations and region-specific words. For languages with fewer regional differences, the locale choice may only affect small details such as spelling.
If the audio contains two locales, choose the most frequently spoken. For example, if your audio contains mostly UK-based English speakers, plus one US-based speaker, select English (UK). If you're not sure which locale is more common in the audio, choose the one that the reviewers are most comfortable reading.
Speaker partitioning, also known as diarization, organizes transcribed text according to who said what. When Enable automatic speaker partitioning is turned on, the transcription job attempts to identify and label which words were spoken by each person. The identified speakers are labeled as Speaker 1, Speaker 2, and so on. These labels are independent from file to file, so a person labeled as Speaker 1 in the first file might be labeled as Speaker 3 in another file, and so on.
The speaker partitioning labels do not appear in the long text field that holds the transcript. Instead, they're stored in a separate internal system with a copy of the transcript. When reviewers look at a transcribed file using the document Viewer, the Viewer pulls the internal labels and transcript copy and uses them to display the formatted transcript.
Because the speaker labels are only stored internally, they are not searchable or visible in the long text field. Enabling or disabling speaker partitioning does not affect tools that analyze the long text field, such as Review Center's Prioritized Review. Similarly, editing the long text field does not affect what appears in the Viewer.
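One way to picture this separation is as two records: the long text field holds plain transcript text, while the internal copy pairs each segment with a speaker label. The record shape below is invented for illustration; the actual internal storage format is not documented.

```python
# Hypothetical sketch of why speaker labels never appear in the long text
# field: the plain transcript and the labeled copy are separate records.

segments = [
    {"speaker": "Speaker 1", "text": "Did you review the contract?"},
    {"speaker": "Speaker 2", "text": "Yes, I sent my notes yesterday."},
]

# What the long text field holds: unlabeled text only.
long_text_field = " ".join(seg["text"] for seg in segments)

# What the Viewer can render from the internal copy: labeled lines.
formatted_view = "\n".join(f"{seg['speaker']}: {seg['text']}" for seg in segments)

print(long_text_field)
print(formatted_view)
```

Because the two records are independent, editing long_text_field here would leave formatted_view unchanged, which mirrors the behavior described above.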
Speaker partitioning works on files under 4 hours long that have a mono audio channel. For more information, see Job capacity and size limits.
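If you want to screen files against these limits before submitting a job, a quick check is straightforward. The sketch below uses Python's standard wave module, so it handles .wav files only; other formats would need a media library.

```python
# Pre-check a file against the speaker partitioning limits noted above:
# mono audio and a duration under 4 hours. Works for .wav files only.
import wave

MAX_SECONDS = 4 * 60 * 60  # the 4-hour limit described above

def eligible_for_partitioning(path: str) -> bool:
    with wave.open(path, "rb") as audio:
        mono = audio.getnchannels() == 1
        duration = audio.getnframes() / audio.getframerate()
    return mono and duration < MAX_SECONDS

print(eligible_for_partitioning("interview.wav"))  # placeholder file name
```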
After starting an AV transcription job, you can view details and progress from two tabs. The workspace-level Transcription Jobs tab shows all transcription jobs within the workspace, and the instance-level Transcription Queue tab shows jobs across the entire instance.
The columns on the tabs are:
Job ID—the unique ID of the job.
Submitted by—the name of the user who submitted the transcription job.
The possible job statuses are:
On the Transcription Jobs and Transcription Queue tabs, you can use mass actions to delete jobs or cancel in-progress jobs.
When you click the Details icon for a job, a modal appears with two tabs: Transcription job details and Transcription job exceptions.
The Transcription job details tab shows the workspace name, the name of the user who submitted the job, and the settings they chose at job creation. It also shows the dates and times the job was submitted and completed.
The Transcription job exceptions tab shows a summary of any errors, warnings, or skipped documents in the job, as well as details for the first two exceptions. You can download the full list of exceptions as a CSV file by clicking Download full list of exceptions at the bottom of the details section.
The Exception summary fields are:
The Exception details fields are:
When you download the full list of exceptions, the CSV file contains these columns:
ControlNumber—the document’s Control Number.
ExceptionType—the type of exception that occurred.
ExceptionMessage—a description of the exception.
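If you need to triage a long exception list outside of Relativity, the downloaded CSV is easy to summarize with a short script. This sketch relies only on the three columns listed above; the file name is a placeholder.

```python
# Summarize a downloaded exceptions CSV by exception type. The column
# names come from the list above; "exceptions.csv" is a placeholder.
import csv
from collections import Counter

with open("exceptions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Count how often each exception type occurred.
for exception_type, count in Counter(r["ExceptionType"] for r in rows).most_common():
    print(f"{exception_type}: {count}")

# List the affected documents and messages for manual follow-up.
for row in rows:
    print(row["ControlNumber"], "-", row["ExceptionMessage"])
```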
Common exceptions include:
Unsupported audio format: File cannot be transcribed—the document was not one of the supported file types. For more information, see Supported file types.
File successfully transcribed, but no speech was identified—the document was processed successfully, but the model could not find any human speech.
Multi-channel audio file detected. Transcription completed on channel 0. Remaining channels have not been transcribed—the file contained multiple audio channels, but only one audio channel was transcribed. For more information, see Audio channel support.
Stereo audio is not supported when using diarization—the file contained multiple audio channels, and the job settings had Enable automatic speaker partitioning turned on. Only one audio channel is supported for speaker partitioning. For more information, see Audio channel support.
For help troubleshooting, contact Relativity Support.
When the transcription job completes, you can find the transcript of the file in the long text field you selected during setup.
Ways to use the transcript include:
AV transcription works with audio files and with video files that contain audio. To determine the file type, it checks the document's native type in Relativity.
It supports these file types:
When you select files to transcribe, you can include any mix of file types. The job automatically skips files of unsupported types.
AV transcription processes only the first audio channel, so it works best on mono files, which contain a single audio channel. If you enable speaker partitioning, you must select files with mono audio.
If you run AV transcription without speaker partitioning on a file with multiple channels, we recommend checking the results. Results can vary depending on how the file's channels were set up.
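To find the files worth double-checking, you can scan for multi-channel audio before or after submitting a job. As above, this sketch uses the standard wave module and therefore covers .wav files only; the folder name is a placeholder.

```python
# Flag .wav files with more than one channel, since only channel 0 is
# transcribed. "audio" is a placeholder folder name.
import pathlib
import wave

for path in sorted(pathlib.Path("audio").glob("*.wav")):
    with wave.open(str(path), "rb") as audio:
        channels = audio.getnchannels()
    if channels > 1:
        print(f"{path.name}: {channels} channels; only channel 0 is transcribed")
```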
When you select files to transcribe, the following limits apply:
If you submit a large file and several smaller files, the large file will make the entire job take longer. You may want to bundle smaller files together into the same job and separate out larger files.
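One way to act on that advice is a simple partition: anything over a size threshold becomes its own job, and everything else is bundled together. The threshold, file names, and sizes below are invented for illustration; they are not documented product limits.

```python
# Illustrative batching: large files get their own jobs so they don't
# hold up smaller ones. The 1 GB threshold is an arbitrary example.

LARGE_FILE_BYTES = 1_000_000_000  # hypothetical cutoff, not a product limit

files = {
    "deposition_video.mp4": 4_200_000_000,  # placeholder names and sizes
    "call_01.wav": 30_000_000,
    "call_02.wav": 45_000_000,
}

small_batch = [name for name, size in files.items() if size < LARGE_FILE_BYTES]
jobs = [[name] for name, size in files.items() if size >= LARGE_FILE_BYTES]
if small_batch:
    jobs.append(small_batch)

for number, job in enumerate(jobs, start=1):
    print(f"Job {number}: {job}")
```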
The ARM application currently does not support AV transcription. If you archive and restore a workspace using ARM, you will need to re-run AV transcription.
When a workspace with AV transcription is restored, any text in the long text Transcription field remains, but files will not have a formatted transcript in the Viewer. To restore the formatted Viewer transcript, re-run AV transcription in the new workspace.
For more information on using ARM, see ARM Overview.