AssemblyAI | Overview

Overview


Introduction

The AssemblyAI API can be used to quickly convert pre-recorded audio files, as well as real-time audio streams, into accurate text transcriptions.

In addition to transcription, you can use the API's Audio Intelligence features to understand your audio data, with features like Sentiment Analysis, Summarization, Entity Detection, Topic Detection, and more.

Our team of developers is online nearly 24x7 to answer any questions you might have about our API, Documentation, or any of our features.


Processing Times

Asynchronous Transcription

Asynchronous transcription refers to transcription of pre-recorded audio/video files.

When you submit an audio file for transcription, it will typically complete within 15-30% of the audio file's duration. For example, a 10-minute file would usually complete in around 1.5 minutes, but could take up to 3 minutes.
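As a rough planning heuristic, the 15-30% figure above can be expressed as a small helper. This is a sketch for estimating turnaround, not part of the API itself:

```python
def estimated_processing_window(audio_seconds: float) -> tuple[float, float]:
    """Estimate (best case, worst case) transcription time in seconds,
    using the documented 15-30% of the audio file's duration."""
    return (audio_seconds * 0.15, audio_seconds * 0.30)

# A 10-minute (600-second) file: roughly 90 to 180 seconds.
low, high = estimated_processing_window(600)
```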

Real-Time Streaming Transcription

Our Real-Time Streaming WebSocket API streams text transcriptions back to clients within a few hundred milliseconds.


Throttle Limits

Your account has a "Throttle Limit," which controls how many audio files or real-time audio streams you can process in parallel.

If you need a higher limit than what we list below, please reach out to us to have these limits increased.

Asynchronous Limits

Below are the limits for how many audio files you can have processing in parallel when submitting jobs via the /v2/transcript endpoint. If you exceed your limit, your jobs will begin to queue.

Account Type | Limit
Free         | 5
Paid         | 32


Real-Time Streaming Limits

Below are the limits for how many real-time audio streams you can have open in parallel.

Account Type | Limit
Free         | 0
Paid         | 32


More on Concurrency

Your Throttle Limit is the same for Asynchronous and Real-Time Transcription, but the number of jobs processing in parallel is calculated separately for each. For example, if your Throttle Limit is 32, you can have up to 32 Asynchronous files processing and up to 32 Real-Time streams open at the same time without exceeding your Throttle Limit.
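One way to stay under the limit client-side is to cap the number of in-flight submissions with a bounded worker pool. This is a sketch: `submit_fn` is a hypothetical stand-in for whatever function actually submits one file to the API, and the limit shown is the Free-tier value from the table above.

```python
from concurrent.futures import ThreadPoolExecutor

THROTTLE_LIMIT = 5  # Free-tier asynchronous limit; use your account's value

def transcribe_all(audio_urls, submit_fn):
    """Submit every URL, keeping at most THROTTLE_LIMIT jobs in
    flight at once so work is not needlessly queued server-side."""
    with ThreadPoolExecutor(max_workers=THROTTLE_LIMIT) as pool:
        return list(pool.map(submit_fn, audio_urls))
```

Paid accounts would raise `THROTTLE_LIMIT` to 32 (or whatever higher limit has been arranged).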


Error Handling and Failed Transcripts

Errors Making Requests to the API

The API will always return a JSON response when there is an error.

Invalid API Token

API requests made with an invalid API token will always return a 401 status code and a JSON response like:

{
  "error": "Authentication error, API token missing/invalid"
}

Invalid API Request

When something is wrong with your API request, the API will return with a status code 400:

{
  "error": "format_text must be a Boolean"
}

The error key will always contain more information about what was wrong with your request.

Server Errors

When something is wrong on our side, the API will return with a status code 500:

{
  "error": "Server error, developers have been alerted."
}
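Putting the three cases together, a client might branch on the status code and the error key like this. This is a sketch, and `classify_api_error` is a hypothetical helper, not part of the API:

```python
def classify_api_error(status_code: int, body: dict) -> tuple[str, str]:
    """Turn an error response into an (action, message) pair.
    `body` is the parsed JSON error response from the API."""
    message = body.get("error", "unknown error")
    if status_code == 401:
        return ("check_token", message)   # invalid or missing API token
    if status_code == 400:
        return ("fix_request", message)   # something wrong with the request
    if status_code >= 500:
        return ("retry_later", message)   # server error; safe to retry
    return ("unknown", message)
```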

Failed Transcription Jobs

A transcription job can fail because something was wrong with your audio file, or because of an error on our side.

Whenever a transcription job fails, the status of the transcription will be set to error, and the JSON response will include an error key when you fetch the transcription with a GET request.

The error key will describe the error in more detail. For example:

{
    // the status is shown as error here
    "status": "error",
    // the error is described in detail here
    "error": "Download error to https://foo.bar, 403 Client Error: Forbidden for url: https://foo.bar",
    ...
}

Transcripts usually fail because of one of the following reasons:

  • Unsupported audio file format
  • Audio file did not contain audio data
  • Audio file was too short (<200 milliseconds)
  • URL of audio file is unreachable
  • An error on our side

When a transcription job fails due to an error on our side (a server error), we always recommend resubmitting the file for transcription. When you resubmit the file, usually a different server in our cluster will be able to process your audio file successfully.
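That advice can be folded into a small retry wrapper. Here `submit` and `wait_for_result` are hypothetical stand-ins for your own code that POSTs the file and polls the GET endpoint until the job reaches a terminal status:

```python
def transcribe_with_resubmit(submit, wait_for_result, max_attempts=3):
    """Resubmit the file when a job fails with a server-side error.
    Client-side failures (unreachable URL, unsupported format) are not
    retried, since resubmitting will not fix them."""
    transcript = None
    for _ in range(max_attempts):
        transcript = wait_for_result(submit())
        if transcript["status"] != "error":
            return transcript
        if "Server error" not in transcript.get("error", ""):
            break  # a problem with the file or URL; retrying won't help
    return transcript
```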


Supported Languages

The table below shows which languages are supported by the AssemblyAI API, their language_code values, and the features available for that language.

See the Specifying a Language documentation for more information on using the language_code parameter to specify the language of the file you are submitting for transcription.


Supported languages - Table


Supported languages - Warning

Pro tip

If you try to use a feature that is not supported for the language_code included in your POST request, you will receive a 400 status code and a "The following addons are not available in this language: <feature name(s)>" error.


Supported File Types

The AssemblyAI API can transcribe a large number of audio and video file formats.

Supported Audio Files

If you don't see your audio file format in the list below, please let us know and we can look into adding support for it.

  • .3ga
  • .aac
  • .ac3
  • .aif
  • .aiff
  • .alac
  • .amr
  • .ape
  • .au
  • .dss
  • .flac
  • .flv
  • .m4a
  • .m4b
  • .m4p
  • .mp3
  • .mpga
  • .ogg, .oga, .mogg
  • .opus
  • .qcp
  • .tta
  • .voc
  • .wav
  • .wma
  • .wv

Supported Video Files

The AssemblyAI API can also transcribe video files, automatically stripping the audio out of the video file. If you don't see your video file format in the list below, please let us know and we can look into adding support for it.

  • .webm
  • .MTS, .M2TS, .TS
  • .mov
  • .mp4, .m4p (with DRM), .m4v
  • .mxf
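A quick client-side pre-check against the two lists above can catch unsupported files before you upload them. This is a sketch; the sets simply mirror the extensions documented in this section:

```python
from pathlib import Path

SUPPORTED_AUDIO = {
    ".3ga", ".aac", ".ac3", ".aif", ".aiff", ".alac", ".amr", ".ape", ".au",
    ".dss", ".flac", ".flv", ".m4a", ".m4b", ".m4p", ".mp3", ".mpga",
    ".ogg", ".oga", ".mogg", ".opus", ".qcp", ".tta", ".voc", ".wav",
    ".wma", ".wv",
}
SUPPORTED_VIDEO = {".webm", ".mts", ".m2ts", ".ts", ".mov", ".mp4", ".m4v", ".mxf"}

def is_supported(filename: str) -> bool:
    """True if the file extension appears in either supported-format list."""
    return Path(filename).suffix.lower() in (SUPPORTED_AUDIO | SUPPORTED_VIDEO)
```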