AssemblyAI | Reference

Reference

Transcript

A Transcript object represents a transcription. You can create transcripts, retrieve them to check their status and results, and delete them.

The Transcript Object

Attribute Type Description Required
id string The unique identifier of your transcription Yes
status string The status of your transcription. queued, processing, completed, or error No
language_code string The language of your audio file. Currently one of "en_us", "en_au", or "en_uk" No
audio_url string The URL of your media file to transcribe No
text string The text transcription of your media file No
words array A list of all the individual words transcribed No
utterances array When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterances No
confidence float The confidence our model has in the transcribed text, between 0.0 and 1.0 No
audio_duration float The duration of your media file, in seconds No
punctuate boolean Enable Automatic Punctuation, can be true or false No
format_text boolean Enable Text Formatting, can be true or false No
dual_channel boolean Enable Dual Channel transcription, can be true or false No
webhook_url string The URL we should send webhooks to when your transcript is complete No
webhook_status_code string The status code we received from your server when delivering your webhook No
auto_highlights_result array The list of results when enabling Automatic Transcript Highlights No
audio_start_from integer The point in time, in milliseconds, to begin transcription from in your media file No
audio_end_at integer The point in time, in milliseconds, to stop transcribing in your media file No
word_boost array A list of custom vocabulary to boost accuracy for No
boost_param string The weight to apply to words/phrases in the word_boost array; can be "low", "default", or "high" No
filter_profanity boolean Filter profanity from the transcribed text, can be true or false No
redact_pii boolean Redact PII from the transcribed text, can be true or false No
redact_pii_audio boolean Generate a copy of the original media file with spoken PII "beeped" out, can be true or false No
redact_pii_policies array The list of PII Redaction policies to enable No
redact_pii_sub string The replacement logic for detected PII, can be "entity_type" or "hash" No
speaker_labels boolean Enable Speaker Diarization, can be true or false No
content_safety boolean Enable Content Safety Detection, can be true or false No
iab_categories boolean Enable Topic Detection, can be true or false No
content_safety_labels array The list of results when content_safety is true No
iab_categories_result array The list of results when iab_categories is true No
disfluencies boolean Transcribe Filler Words, like "umm", in your media file; can be true or false No
sentiment_analysis boolean Enable Sentiment Analysis, can be true or false No
auto_chapters boolean Enable Auto Chapters, can be true or false No
chapters array When Auto Chapters is enabled, the list of Auto Chapters results No
sentiment_analysis_results array When Sentiment Analysis is enabled, the list of Sentiment Analysis results No
entity_detection boolean Enable Entity Detection, can be true or false No
entities array When Entity Detection is enabled, the list of detected Entities No

Create a Transcript

Create a transcription.

Parameters

Attribute Type Description
audio_url string required The URL of your media file to transcribe
language_code string The language of your audio file. Currently one of "en_us" (default), "en_au", or "en_uk"
punctuate boolean Enable Automatic Punctuation, can be true or false
format_text boolean Enable Text Formatting, can be true or false
dual_channel boolean Enable Dual Channel transcription, can be true or false
webhook_url string The URL we should send webhooks to when your transcript is complete
audio_start_from integer The point in time, in milliseconds, to begin transcription from in your media file
audio_end_at integer The point in time, in milliseconds, to stop transcribing in your media file
word_boost array A list of custom vocabulary to boost accuracy for
boost_param string The weight to apply to words/phrases in the word_boost array; can be "low", "default", or "high"
filter_profanity boolean Filter profanity from the transcribed text, can be true or false
redact_pii boolean Redact PII from the transcribed text, can be true or false
redact_pii_audio boolean Generate a copy of the original media file with spoken PII "beeped" out, can be true or false
redact_pii_policies array The list of PII Redaction policies to enable
redact_pii_sub string The replacement logic for detected PII, can be "entity_type" or "hash"
speaker_labels boolean Enable Speaker Diarization, can be true or false
content_safety boolean Enable Content Safety Detection, can be true or false
iab_categories boolean Enable Topic Detection, can be true or false
disfluencies boolean Transcribe Filler Words, like "umm", in your media file; can be true or false
sentiment_analysis boolean Enable Sentiment Analysis, can be true or false
auto_chapters boolean Enable Auto Chapters, can be true or false
entity_detection boolean Enable Entity Detection, can be true or false
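
As a rough illustration of how the parameters above fit together, the following Python sketch builds (but does not send) a request that creates a transcript. The base URL, the `/transcript` path, the `authorization` header name, and the helper `build_create_request` are assumptions made for this example; this section does not state them. Only the parameter names come from the table above.

```python
import json
import urllib.request

# Assumption: the base URL and header names are not given in this section.
API_BASE = "https://api.assemblyai.com/v2"

# Optional parameter names copied from the table above.
OPTIONAL_PARAMS = {
    "language_code", "punctuate", "format_text", "dual_channel",
    "webhook_url", "audio_start_from", "audio_end_at", "word_boost",
    "boost_param", "filter_profanity", "redact_pii", "redact_pii_audio",
    "redact_pii_policies", "redact_pii_sub", "speaker_labels",
    "content_safety", "iab_categories", "disfluencies",
    "sentiment_analysis", "auto_chapters", "entity_detection",
}


def build_create_request(api_key, audio_url, **options):
    """Build (but do not send) a POST request that creates a transcript."""
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    # audio_url is the only required parameter.
    body = {"audio_url": audio_url, **options}
    return urllib.request.Request(
        f"{API_BASE}/transcript",
        data=json.dumps(body).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "YOUR_API_KEY",
    "https://example.com/audio.mp3",
    speaker_labels=True,
    word_boost=["AssemblyAI"],
    boost_param="high",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Rejecting unknown parameter names locally is a design choice of this sketch, intended to catch typos before a request is made.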

Get a Transcript

Get the detailed information of a specific transcript by id.
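
Because a transcript moves through the statuses queued, processing, and then completed or error, a client typically retrieves it repeatedly until it reaches a final status. The sketch below shows one way to do that in Python. The `fetch` callable is hypothetical: it stands in for whatever code performs the actual retrieval by id, and the polling interval and attempt limit are arbitrary choices for the example.

```python
import time

# Final statuses, taken from the status attribute of the Transcript object.
TERMINAL_STATUSES = {"completed", "error"}


def wait_for_transcript(fetch, transcript_id, interval=3.0,
                        max_attempts=100, sleep=time.sleep):
    """Call fetch(transcript_id) until the transcript reaches a final status.

    `fetch` is a hypothetical callable that returns the transcript as a dict.
    """
    for _ in range(max_attempts):
        transcript = fetch(transcript_id)
        if transcript["status"] in TERMINAL_STATUSES:
            return transcript
        sleep(interval)
    raise TimeoutError(f"Transcript {transcript_id} did not finish in time")


# Demo with a fake fetcher that walks through the documented statuses.
_statuses = iter(["queued", "processing", "completed"])


def fake_fetch(transcript_id):
    return {"id": transcript_id, "status": next(_statuses)}


result = wait_for_transcript(fake_fetch, "abc123", interval=0,
                             sleep=lambda seconds: None)
print(result["status"])
```

If you set a webhook_url when creating the transcript, polling like this may be unnecessary.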

Get all Sentences of a Transcript

Query for just the sentences of a transcript by id.

Get all Paragraphs of a Transcript

Query for just the paragraphs of a transcript by id.

Get All Transcripts

List all your transcripts.

Parameters

All of the below parameters are optional.

Attribute Type Description
limit integer Max results to return in a single response, between 1 and 200 inclusive
status string Filter by transcript status: "processing", "queued", "completed", or "error"
created_on string Only return transcripts created on this date; format: "YYYY-MM-DD"
before_id string Return transcripts that were created before this id
after_id string Return transcripts that were created after this id
throttled_only boolean Only return throttled transcripts, overrides status filter
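
The constraints in the table above can be checked on the client before a request is made. The following Python sketch validates the filters and encodes them as a query string. The helper name `build_list_query` and the choice to serialize the boolean as "true"/"false" are assumptions made for this example.

```python
import re
from urllib.parse import urlencode

VALID_STATUSES = {"queued", "processing", "completed", "error"}


def build_list_query(limit=None, status=None, created_on=None,
                     before_id=None, after_id=None, throttled_only=None):
    """Validate the optional filters and return them as a query string."""
    params = {}
    if limit is not None:
        if not 1 <= limit <= 200:
            raise ValueError("limit must be between 1 and 200 inclusive")
        params["limit"] = limit
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
        params["status"] = status
    if created_on is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", created_on):
            raise ValueError('created_on must look like "YYYY-MM-DD"')
        params["created_on"] = created_on
    if before_id is not None:
        params["before_id"] = before_id
    if after_id is not None:
        params["after_id"] = after_id
    if throttled_only is not None:
        # Assumption: booleans are sent as lowercase strings.
        params["throttled_only"] = "true" if throttled_only else "false"
    return urlencode(params)


query = build_list_query(limit=50, status="completed", created_on="2021-06-01")
print(query)  # limit=50&status=completed&created_on=2021-06-01
```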

Delete a Transcript

Permanently delete a transcript by id. The transcript record still exists and remains queryable, but all fields containing sensitive data (such as the text transcription) are permanently deleted.

Upload

Use uploads to send media files directly to the AssemblyAI API for transcription.

The Upload Object

Attribute Type Description
upload_url string A URL that points to your audio file, accessible only by AssemblyAI's servers

Creating an Upload

Upload a file to our servers for transcription. Learn more in "Uploading Local Files for Transcription".

Headers

Attribute Value
Transfer-Encoding chunked

Body

The contents of your media file.
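
Since the request uses chunked transfer encoding, the media file does not need to be loaded into memory all at once. The Python sketch below reads a file-like object in fixed-size pieces. The helper name `read_in_chunks` and the 5 MB default chunk size are assumptions for this example. Many HTTP clients (for instance, `requests` when given a generator as `data`) send such an iterable body with `Transfer-Encoding: chunked`, but check your client's documentation.

```python
import io


def read_in_chunks(fileobj, chunk_size=5 * 1024 * 1024):
    """Yield successive chunks from a binary file-like object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Demo with an in-memory "file" of 10 bytes, read 4 bytes at a time.
media = io.BytesIO(b"x" * 10)
chunks = list(read_in_chunks(media, chunk_size=4))
print([len(chunk) for chunk in chunks])  # [4, 4, 2]
```

With a real file, you would open it in binary mode, for example `open(path, "rb")`, and pass the generator as the request body.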

Stream

If you're working with short bursts of audio (less than 15 seconds), you can send the audio data directly to the /v2/stream endpoint. It returns a transcript within a few hundred milliseconds, directly in the request-response loop.

Audio Requirements

The audio data you send to this endpoint must comply with a strict format, because we don't transcode your data; we send it directly to the model for transcription. You can send the contents of a .wav file to this endpoint, or raw data read directly from a microphone. Either way, you must record your audio in the following format to use this endpoint:

  • 16-bit signed integer PCM encoding (i.e., a .wav file)
  • 8 kHz sampling rate
  • 128 kbps bitrate
  • 16-bit precision
  • Single channel
  • Headless (i.e., strip any headers from .wav files)
  • 15 seconds or less of audio per request
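
The requirements above can be checked locally before sending audio. The following Python sketch uses the standard-library `wave` module to verify a .wav file and return its headerless PCM data. The helper names `extract_pcm` and `make_test_wav` are hypothetical and exist only for this example. Note that the listed bitrate follows from the other values: 8,000 samples/s × 16 bits × 1 channel = 128,000 bits/s.

```python
import io
import wave

REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit
REQUIRED_SAMPLE_RATE = 8000  # Hz
MAX_SECONDS = 15


def extract_pcm(wav_file):
    """Check a WAV file against the list above and return headerless PCM bytes."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != REQUIRED_CHANNELS:
            raise ValueError("audio must be single channel")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            raise ValueError("audio must be 16-bit")
        if wav.getframerate() != REQUIRED_SAMPLE_RATE:
            raise ValueError("audio must be sampled at 8 kHz")
        frames = wav.getnframes()
        if frames / wav.getframerate() > MAX_SECONDS:
            raise ValueError("audio must be 15 seconds or less")
        # readframes returns only the sample data, without the WAV header.
        return wav.readframes(frames)


def make_test_wav(seconds, rate=8000, channels=1):
    """Build an in-memory WAV file of silence for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds * channels)
    buf.seek(0)
    return buf


pcm = extract_pcm(make_test_wav(1))
print(len(pcm))  # 16000 bytes: 8000 samples x 2 bytes
bitrate = REQUIRED_SAMPLE_RATE * REQUIRED_SAMPLE_WIDTH * 8 * REQUIRED_CHANNELS
print(bitrate)  # 128000 bits per second
```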

POST Params

When making a POST request to this endpoint, you should include the following parameters.

Param Example Info Required
audio_data UklGRtjIAABXQVZFZ… Raw audio data, base64 encoded. This can be the raw data recorded directly from a microphone or read from a wav file. Yes
format_text true Defaults to false. Set to true to apply automatic text formatting. No
punctuate true Defaults to false. Set to true to apply automatic punctuation. No

Base64 encoding:

Base64 encoding is a simple way to encode your raw audio data so that it can be included as a JSON parameter in your POST request. Most programming languages have built-in functions for encoding binary data to base64.
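
In Python, for example, the standard-library `base64` module does the encoding. The sketch below builds a JSON body from raw PCM bytes. The helper name `build_stream_payload` is hypothetical, and sending the two flags as JSON booleans is an assumption based on the example values in the table.

```python
import base64
import json


def build_stream_payload(pcm_bytes, format_text=False, punctuate=False):
    """Return a JSON string with the audio base64 encoded."""
    return json.dumps({
        # b64encode returns bytes; decode to str so it can go into JSON.
        "audio_data": base64.b64encode(pcm_bytes).decode("ascii"),
        "format_text": format_text,
        "punctuate": punctuate,
    })


payload = build_stream_payload(b"\x00\x01" * 4, punctuate=True)
print(payload)
```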

POST Response

Depending on how much audio data you send, the API responds within 100 to 750 milliseconds. The JSON response contains the following keys.

Param Example Info
id 5551722-f677-48a6-9287-39c0aafd9ac1 The unique id of your transcription.
status completed The status of your transcription.
confidence 0.956 The confidence score of the entire transcription, between 0 and 1.
text You know Demons on TV like... The complete transcription for your audio.
words [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}, ...] An array of objects, one for each word in the transcription text. Each object includes the word's start and end time (in milliseconds) and its confidence score.
created 2019-06-27 22:26:47.048512 The timestamp for your request
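
As an illustration of working with the `words` array, the Python sketch below parses a response and computes how long each word lasts. The `Word` class and `parse_words` helper are hypothetical, and the sample response contains only a subset of the keys, with values taken from the table above.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: int  # milliseconds
    end: int    # milliseconds
    confidence: float

    @property
    def duration_ms(self):
        return self.end - self.start


def parse_words(response_json):
    """Turn the `words` array of a response into Word objects."""
    data = json.loads(response_json)
    return [
        Word(w["text"], w["start"], w["end"], w["confidence"])
        for w in data["words"]
    ]


# Partial sample response built from the example values in the table.
sample = json.dumps({
    "status": "completed",
    "confidence": 0.956,
    "words": [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}],
})
words = parse_words(sample)
print(words[0].text, words[0].duration_ms)  # You 440
```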