The AssemblyAI API can automatically detect the number of speakers in your audio file, and each word in the transcription text can be associated with its speaker. Simply include the speaker_labels parameter in your POST request and set it to true. Speakers will be labeled as "Speaker A", "Speaker B", etc.
In order to be reliably identified and enrolled as a unique speaker, a person will need to speak for approximately 30 seconds over the course of the audio file.
Once your transcript is complete and the status key shows "completed" as its value, you'll get a JSON response that includes an utterances key. This key contains a list of "turn-by-turn" utterances, as they appeared in the audio recording. A "turn" refers to a change in speakers during the conversation.
Speaker Labels is not supported when Dual Channel Transcription is turned on. You can have either Speaker Labels or Dual Channel enabled when submitting a file for transcription, but not both.
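The flow above can be sketched in Python. The audio URL is a placeholder and the sample response is abbreviated for illustration; in practice you would POST the payload to the /v2/transcript endpoint and poll for completion:

```python
# Minimal sketch: enable speaker labels and format the resulting
# turn-by-turn utterances. Nothing is sent over the network here.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical URL
    "speaker_labels": True,
}

# Abbreviated example of a completed transcript's JSON response:
response = {
    "status": "completed",
    "utterances": [
        {"speaker": "A", "text": "Hi, how can I help you today?"},
        {"speaker": "B", "text": "I'd like to check on my order."},
    ],
}

def format_turns(response):
    """Render each turn as 'Speaker X: text', one line per turn."""
    return [
        f"Speaker {u['speaker']}: {u['text']}"
        for u in response.get("utterances", [])
    ]

for line in format_turns(response):
    print(line)
```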
On the right, we show how to include the word_boost parameter in your POST request to add support for Custom Vocabulary. You can include words, phrases, or both in the word_boost parameter; any term included will have its likelihood of being transcribed boosted.
You can also include the optional boost_param parameter in your POST request to control how much weight is applied to your keywords/phrases. This value can be either low, default, or high.
Sometimes your word boost list may contain a unique character that the model is not expecting, such as the accented é in Andrés. In these cases, our model will still accept the word, convert the special character to its ASCII equivalent if there is one (in this case, Andres), and return the word in the transcript (if detected) without the accented/unique character.
You can pass a maximum of 1,000 unique keywords/phrases in your word_boost list. Each keyword/phrase in the list must be 6 words or less.
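The constraints above (at most 1,000 unique terms, each 6 words or less) can be checked client-side before submitting. A minimal sketch, with a hypothetical audio URL:

```python
# Sketch: build a transcription payload with Custom Vocabulary and
# validate the documented word_boost constraints before submitting.
def build_word_boost_payload(audio_url, terms, boost="default"):
    unique_terms = list(dict.fromkeys(terms))  # de-duplicate, keep order
    if len(unique_terms) > 1000:
        raise ValueError("word_boost is limited to 1,000 unique terms")
    for term in unique_terms:
        if len(term.split()) > 6:
            raise ValueError(f"{term!r} exceeds 6 words")
    return {
        "audio_url": audio_url,
        "word_boost": unique_terms,
        "boost_param": boost,  # "low", "default", or "high"
    }

payload = build_word_boost_payload(
    "https://example.com/call.mp3",  # hypothetical URL
    ["aws", "azure", "google cloud", "aws"],
)
```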
If you have a dual channel audio file, for example a phone call recording with the agent on one channel and the customer on the other, the API supports transcribing each channel separately.
Simply include the dual_channel parameter in your POST request when submitting files for transcription, and set this parameter to true.
Dual channel transcriptions take ~25% longer to complete than normal, since we need to transcribe each channel separately, which adds a little extra overhead!
Once your transcription is complete, there will be an additional utterances key in the API's JSON response. The utterances key will contain a list of turn-by-turn utterances, as they appeared in the audio recording, identified by each audio channel.
Each JSON object in the utterances list contains the channel information (this will be either "1" or "2"), so you can easily tell which channel each utterance is from. Each word in the words array will also contain the channel key.
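As a sketch, the utterances of a dual-channel transcript can be grouped by channel. The sample response below is abbreviated; channel values are strings, as described above:

```python
# Sketch: group a dual-channel transcript's utterances by channel.
# The sample response is abbreviated for illustration.
response = {
    "utterances": [
        {"channel": "1", "text": "Thanks for calling support."},
        {"channel": "2", "text": "Hi, I have a billing question."},
        {"channel": "1", "text": "Sure, let me pull up your account."},
    ],
}

def by_channel(response):
    """Map each channel to the list of utterance texts spoken on it."""
    grouped = {}
    for u in response.get("utterances", []):
        grouped.setdefault(u["channel"], []).append(u["text"])
    return grouped

grouped = by_channel(response)
```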
By default, the API will remove Filler Words, like "uhh", from transcripts. To include Filler Words in your transcripts, set the disfluencies parameter to true in your POST request when submitting files for processing to the /v2/transcript endpoint, as shown on the right.
The Filler Words the API will transcribe are:
Once the transcription has been completed, you will get a response from the API as per usual, but Filler Words will be present in the transcription text and words array just like any other spoken word.
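A sketch of the corresponding request, built with Python's standard library. The token and audio URL are placeholders, and nothing is actually sent here:

```python
import json
import urllib.request

# Sketch: build (but do not send) the POST request that keeps
# Filler Words in the transcript.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical URL
    "disfluencies": True,  # keep Filler Words like "uhh"
}
req = urllib.request.Request(
    "https://api.assemblyai.com/v2/transcript",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "authorization": "YOUR-ASSEMBLYAI-TOKEN",
        "content-type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would submit it for processing.
```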
By default, the API will punctuate the transcription text, automatically case proper nouns, and convert written numbers to their numerical format. For example, i ate ten hamburgers at burger king will be converted to I ate 10 hamburgers at Burger King. If you want to turn these features off, you can disable either, or both, of them by including a few additional parameters in your API request.
By setting the punctuate parameter to false, you can disable the punctuation and text formatting features; in the above example, the transcript returned to you would then be i ate ten hamburgers at burger king.
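Sketched as a payload, disabling punctuation and formatting is just an extra boolean flag (the audio URL is a placeholder):

```python
# Sketch: disable the punctuation and text formatting features.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical URL
    "punctuate": False,  # e.g. returns "i ate ten hamburgers at burger king"
}
```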
The transcript must be completed before using these API endpoints.
You can use either of the following endpoints to retrieve a completed transcript automatically broken down into paragraphs or sentences. Using these endpoints, the API will attempt to semantically segment your transcript into paragraphs/sentences to create more reader-friendly transcripts.
The JSON response for these endpoints is shown on the right.
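As a sketch, the segments from a paragraphs-style response can be joined back into reader-friendly text. The response shape below is abbreviated and assumed for illustration:

```python
# Sketch: reassemble readable text from a paragraphs-style response.
# The response shape below is abbreviated for illustration.
response = {
    "paragraphs": [
        {"text": "Hello and welcome to the show."},
        {"text": "Today we are talking about transcription."},
    ],
}

def to_text(response, key="paragraphs"):
    """Join each segment's text with blank lines between paragraphs."""
    return "\n\n".join(p["text"] for p in response.get(key, []))

text = to_text(response)
```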
By default, the API will return a verbatim transcription of the audio, meaning profanity will be present in the transcript if spoken in the audio. To replace profanity with asterisks, as shown below, include the additional filter_profanity parameter in your request when submitting files for transcription, and set this to true.
It was some tough s*** that they had to go through. But they did it. I mean, it blows my f****** mind every time I hear the story.
The JSON for your completed transcript will come back as per usual, but the text will contain asterisks where profanity was spoken.
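Sketched as a payload below; the API does the masking server-side, and the mask helper only illustrates the pattern visible in the filtered example above (first letter kept, the rest replaced with asterisks):

```python
# Sketch: request profanity filtering when submitting a file.
payload = {
    "audio_url": "https://example.com/audio.mp3",  # hypothetical URL
    "filter_profanity": True,
}

def mask(word):
    """Illustrate the masking pattern seen in filtered transcripts."""
    return word[0] + "*" * (len(word) - 1)

masked = mask("severe")  # neutral stand-in word
```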
Once a transcript has been completed, you can search through the transcript for a specific set of keywords. You can search for individual words, two word phrases, and numbers.
This request returns a JSON response with the following keys:
|Key|Description|
|---|---|
|id|The id of the transcript|
|total_count|The total of all matched instances. For example, if word 1 matched 2 times and word 2 matched 3 times, total_count would be 5|
|matches|Contains a list/array of all matched words and associated data|

Within matches, we will see all matched words with the keys of:

|Key|Description|
|---|---|
|text|The word itself|
|count|The total number of times the word appears in the transcript|
|timestamps|An array of timestamps structured as [start_time, end_time]|
|indexes|An array of all index locations for that word within the words array of the completed transcript|
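The relationship between these keys can be sketched with a sample response (all values below are illustrative, including the transcript id):

```python
# Sketch: tally a word-search response. The keys mirror the tables
# above; the values, including the id, are illustrative.
response = {
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",  # hypothetical id
    "total_count": 5,
    "matches": [
        {"text": "hello", "count": 2,
         "timestamps": [[100, 500], [7200, 7580]], "indexes": [0, 12]},
        {"text": "world", "count": 3,
         "timestamps": [[600, 980], [3000, 3300], [8100, 8400]],
         "indexes": [1, 5, 13]},
    ],
}

def recount(response):
    """total_count should equal the sum of each match's count."""
    return sum(m["count"] for m in response["matches"])
```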
With the list endpoint, you can retrieve a list of all the transcripts you have created. This list can also be filtered by the transcript status.
Simply make a GET request, as shown to the right, with the following query parameters. In the cURL statement to the right, for example, we are querying for the most recent 200 transcripts with the status of completed.
|Parameter|Description|Constraints|Optional?|
|---|---|---|---|
|limit|Max results to return in a single response|Between 1 and 200 (defaults to 10)|Yes|
|status|Filter by transcript status|Must be "queued", "processing", "completed", or "error"|Yes|
The API response will contain two top-level keys: page_details and transcripts. The transcripts key will contain an array of objects (your list of transcripts), with each object containing the following information:

|Key|Description|
|---|---|
|id|ID of the transcript|
|status|The current status of the transcript|
|created|The date and time the transcript was created|
|completed|The date and time your transcript finished processing|
Since the API only returns a maximum of 200 transcripts per response, it treats each response as a "page" of results. The page_details key will give you information about the current "page" you are on, and how to navigate to the next "page" of results.
To navigate to the next "page" of results, you will want to grab the value of prev_url in the page_details object from your initial GET request. You can then make the same API call as before, replacing the endpoint with the value of prev_url. You can continue to do this until prev_url is null, meaning you have pulled all your transcripts from the API!
Transcripts are listed from newest to oldest, so prev_url will always point to the prior "page" of older transcripts.
Here is the cURL request from earlier, for example:
curl --request GET \
  --url "https://api.assemblyai.com/v2/transcript?limit=200&status=completed" \
  --header 'authorization: YOUR-ASSEMBLYAI-TOKEN' \
  --header 'content-type: application/json'
Once we have the response, we can make a subsequent request using the value of prev_url to get the next "page" of results:
curl --request GET \
  --url "https://api.assemblyai.com/v2/transcript?limit=200&status=completed&before_id=8w5chxgaz-dcf5-4647-8cb4-cdfeaccdaa7d" \
  --header 'authorization: YOUR-ASSEMBLYAI-TOKEN' \
  --header 'content-type: application/json'
You can continue to do this until the value of prev_url is null, meaning you have successfully retrieved all transcripts in your account!
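The pagination loop above can be sketched as follows. Here, fetch_page is a stand-in for your real authenticated GET request and returns canned pages so the loop itself is visible:

```python
# Sketch of the pagination loop: follow page_details["prev_url"]
# until it is null/None. fetch_page stands in for a real GET request.
FIRST = "https://api.assemblyai.com/v2/transcript?limit=200&status=completed"
PAGES = {
    FIRST: {
        "transcripts": [{"id": "t3"}, {"id": "t2"}],
        "page_details": {"prev_url": FIRST + "&before_id=t2"},
    },
    FIRST + "&before_id=t2": {
        "transcripts": [{"id": "t1"}],
        "page_details": {"prev_url": None},
    },
}

def fetch_page(url):
    return PAGES[url]  # in practice: an authenticated GET request

def all_transcripts(start_url):
    """Walk prev_url page by page, newest to oldest, until it is None."""
    url, results = start_url, []
    while url is not None:
        page = fetch_page(url)
        results.extend(page["transcripts"])
        url = page["page_details"]["prev_url"]
    return results

ids = [t["id"] for t in all_transcripts(FIRST)]
```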
When making a GET request to list transcripts, you can include any of the following parameters with your request to further filter the results you get back.
|Parameter|Description|Constraints|Optional?|
|---|---|---|---|
|limit|Max results to return in a single response|Between 1 and 200 (inclusive, with a default value of 10)|Yes|
|status|Filter by transcript status|Must be queued, processing, completed, or error|Yes|
|created_on|Only return transcripts created on this date|Format: YYYY-MM-DD|Yes|
|before_id|Return transcripts that were created before this id|Valid transcript id|Yes|
|after_id|Return transcripts that were created after this id|Valid transcript id|Yes|
|throttled_only|Only return throttled transcripts; overrides the status filter|Boolean: true or false|Yes|
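Assembling these query parameters into a list-endpoint URL can be sketched with the standard library (the date value is a hypothetical example):

```python
from urllib.parse import urlencode

# Sketch: assemble the list-endpoint URL from the parameters above.
params = {
    "limit": 200,
    "status": "completed",
    "created_on": "2023-01-15",  # hypothetical date
}
url = "https://api.assemblyai.com/v2/transcript?" + urlencode(params)
```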
By default, AssemblyAI never stores a copy of the files you submit to the API for transcription. The transcription itself, however, is stored in our database, encrypted at rest, so that we can serve it to you and your application. If you'd like to permanently delete the transcription from our database once you've retrieved it, you can do so by making a DELETE request to the API, as shown in cURL on the right.
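The deletion request can be sketched with the standard library, assuming the transcript id is appended as a path segment; the id and token below are placeholders, and nothing is sent here:

```python
import urllib.request

# Sketch: build (but do not send) the DELETE request for a transcript.
transcript_id = "5551722-f677-48a6-9287-39c0aafd9ac1"  # hypothetical id
req = urllib.request.Request(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
    headers={"authorization": "YOUR-ASSEMBLYAI-TOKEN"},
    method="DELETE",
)
# urllib.request.urlopen(req) would permanently delete the transcript.
```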