Real-time streaming transcription

    If you're working with live audio, you can stream your audio data in real time to our secure WebSocket API at wss://api.assemblyai.com/v2/realtime/ws. We stream transcripts back to you within a few hundred milliseconds, then revise those transcripts with higher accuracy within seconds.

    Open Source Example Code

    Here are some open-source examples of our real-time endpoint.

    Establishing a Websocket Connection

    Websocat is an easy-to-use CLI for testing WebSocket APIs. We use this tool in our examples. You can find more info on Websocat here.

    To connect with the real-time endpoint, you must use a WebSocket client and establish a connection with wss://api.assemblyai.com/v2/realtime/ws.

    Authentication

    If you would like to create a temporary token for in-browser authentication, you can learn more about that here.

    Authentication is handled via the "authorization" header. The value of this header should be your API token. For example, in websocat:

    $ websocat wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000 -H Authorization:<API_TOKEN> 
    {
            "message_type": "SessionBegins", 
            "session_id": "d3e8c537-2f11-494b-b497-e59a434588bd", 
            "expires_at": "2021-04-07T11:32:25.300329"
    }

    Required Query Params

    This endpoint also requires a sample_rate query parameter that defines the sample rate of the WAV audio sent in this stream. For example, in websocat:

    $ websocat wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000 -H Authorization:<API_TOKEN> 
    {
            "message_type": "SessionBegins", 
            "session_id": "d3e8c537-2f11-494b-b497-e59a434588bd", 
            "expires_at": "2021-04-07T11:32:25.300329"
    }
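    The same connection details can be assembled in Python. This is a minimal sketch that only builds the URL and the header; the helper name build_connection is hypothetical, and the result would be passed to whichever WebSocket client library you use.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def build_connection(api_token: str, sample_rate: int = 16000):
    """Return the (url, headers) pair needed to open a real-time session.

    build_connection is a hypothetical helper, not part of any SDK.
    """
    # sample_rate is the required query parameter described above.
    url = f"{REALTIME_URL}?{urlencode({'sample_rate': sample_rate})}"
    # Authentication is carried in the authorization header.
    headers = {"Authorization": api_token}
    return url, headers


url, headers = build_connection("<API_TOKEN>")
print(url)
```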

    Session Descriptor Message

    Once your request is authorized and connection established, your client will receive a SessionBegins message with the following JSON data:

    Parameter Example Info
    message_type SessionBegins Describes the message type.
    session_id d3e8c537-2f11-494b-b497-e59a434588bd Unique identifier for the established session. Can be used to re-establish the session.
    expires_at 2021-04-07T11:32:25.300329 Timestamp when this session will expire.
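    A client will usually want to keep the session_id and expires_at values from this message. The sketch below parses the example payload shown earlier; SessionInfo and parse_session_begins are hypothetical names, and it assumes expires_at is an ISO 8601 timestamp without a timezone, as in the example.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SessionInfo:
    session_id: str
    expires_at: datetime


def parse_session_begins(raw: str) -> SessionInfo:
    """Parse a SessionBegins message into a small record (hypothetical helper)."""
    msg = json.loads(raw)
    if msg.get("message_type") != "SessionBegins":
        raise ValueError(f"unexpected message type: {msg.get('message_type')!r}")
    # Assumption: the timestamp has no timezone suffix, as in the example.
    return SessionInfo(
        session_id=msg["session_id"],
        expires_at=datetime.fromisoformat(msg["expires_at"]),
    )


raw = (
    '{"message_type": "SessionBegins", '
    '"session_id": "d3e8c537-2f11-494b-b497-e59a434588bd", '
    '"expires_at": "2021-04-07T11:32:25.300329"}'
)
info = parse_session_begins(raw)
print(info.session_id, info.expires_at)
```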

    Sending Audio

    Input Message

    When sending audio over the WebSocket connection, you should send a JSON payload with the following parameters.

    Parameter Example Info
    audio_data UklGRtjIAABXQVZFZ… Raw audio data, base64 encoded. This can be the raw data recorded directly from a microphone, or read from an audio file.
    base64 encoding: base64 encoding is a simple way to encode your raw audio data so that it can be included as a JSON parameter in your websocket message. Most programming languages have very simple built-in functions for encoding binary data to base64.

    For example, a message payload would look like this:

    {"audio_data": "UklGRtjIAABXQVZFZ..."}
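    As a rough illustration of building that payload in Python, the sketch below base64-encodes a chunk of raw bytes and wraps it in the audio_data field. The function name audio_message is hypothetical, and the four-byte chunk is a placeholder rather than real audio.

```python
import base64
import json


def audio_message(chunk: bytes) -> str:
    """Wrap a chunk of raw audio bytes in the JSON payload described above."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"audio_data": encoded})


# Placeholder bytes only; a real client would read from a microphone or file.
message = audio_message(b"RIFF")
print(message)
```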

    Audio Requirements

    The raw audio data in the audio_data field above must comply with a strict encoding. This is because we don't transcode your data; we send it directly to the model for transcription to reduce latency. The encoding of your audio must be in:

    Transcription Response Types

    Our real-time transcription pipeline uses a two-phase transcription strategy, broken into partial and final results.

    Partial Results

    As you send audio data to the API, the API will immediately start responding with transcriptions. The following keys will be in the JSON response from the WebSocket API.

    Parameter Example Info
    message_type PartialTranscript Describes the type of message.
    session_id "5551722-f677-48a6-9287-39c0aafd9ac1" The unique id of your transcription.
    audio_start 1200 Start time of audio sample relative to session start, in milliseconds.
    audio_end 1850 End time of audio sample relative to session start, in milliseconds.
    confidence 0.956 The confidence score of the entire transcription, between 0 and 1.
    text "You know Demons on TV like..." The complete transcription for your audio.
    words [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}, ...] An array of objects, with the information for each word in the transcription text. Will include the start/end time (in milliseconds) of the word, and the confidence score of the word.
    created "2019-06-27 22:26:47.048512" The timestamp for your request.

    Final Results

    After you've received your partial results, our model will continue to analyze incoming audio and, when it detects the end of an "utterance", it will finalize the results sent to you so far with higher accuracy, as well as add punctuation and casing to the transcription text.

    The following keys will be in the JSON response from the WebSocket API when Final Results are sent:

    Parameter Example Info
    message_type FinalTranscript Describes the type of message.
    session_id "5551722-f677-48a6-9287-39c0aafd9ac1" The unique id of your transcription.
    audio_start 1200 Start time of audio sample relative to session start, in milliseconds.
    audio_end 1850 End time of audio sample relative to session start, in milliseconds.
    confidence 0.956 The confidence score of the entire transcription, between 0 and 1.
    text "You know Demons on TV like..." The complete transcription for your audio.
    words [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}, ...] An array of objects, with the information for each word in the transcription text. Will include the start/end time (in milliseconds) of the word, and the confidence score of the word.
    created "2019-06-27 22:26:47.048512" The timestamp for your request.
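    One way a client might combine the two response types is to display the latest partial result and replace it once the final result arrives. The sketch below does that by dispatching on message_type; TranscriptBuffer is a hypothetical class, and it assumes each final result supersedes the partial results received before it.

```python
import json


class TranscriptBuffer:
    """Keeps finalized text plus the most recent partial (hypothetical helper)."""

    def __init__(self):
        self.finals = []   # text of every FinalTranscript received so far
        self.partial = ""  # text of the latest PartialTranscript, if any

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("message_type")
        if kind == "PartialTranscript":
            self.partial = msg["text"]
        elif kind == "FinalTranscript":
            # Assumption: the final result replaces the preceding partials.
            self.finals.append(msg["text"])
            self.partial = ""

    def text(self) -> str:
        parts = self.finals + ([self.partial] if self.partial else [])
        return " ".join(parts)


buf = TranscriptBuffer()
buf.handle('{"message_type": "PartialTranscript", "text": "you know"}')
buf.handle('{"message_type": "FinalTranscript", "text": "You know."}')
print(buf.text())
```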

    Reconnecting to an existing session

    Sometimes unforeseen outages can cause your client to lose connection with our real-time servers. To help maintain continuity of your transcript stream, at the beginning of a connection we've provided you with a session descriptor message containing a session identifier. Using this identifier you can reconnect to your session and resume processing from where you left off. Simply reconnect to the following URL wss://api.assemblyai.com/v2/realtime/ws/{session_id}. The standard authorization scheme applies here as well.

    Example: Reconnecting to an existing session

    $ websocat wss://api.assemblyai.com/v2/realtime/ws/d3e8c537-2f11-494b-b497-e59a434588bd -H authorization:<API_TOKEN> 
    {"message_type": "SessionResumed", "session_id": "d3e8c537-2f11-494b-b497-e59a434588bd"}
    This is an optional flow. In the event that you lose connection, you can alternatively start a new session by reconnecting to wss://api.assemblyai.com/v2/realtime/ws without the session_id. This is a viable option, but you may lose some transcription accuracy for a short period of time.
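    The two reconnect options can be expressed as a single URL choice. In this sketch, connection_url is a hypothetical helper: it returns the resume URL when a session_id is known and otherwise falls back to the URL for a new session.

```python
from typing import Optional

REALTIME_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def connection_url(session_id: Optional[str] = None, sample_rate: int = 16000) -> str:
    """Pick the URL for resuming a session or starting a new one."""
    if session_id:
        # Resume: the session identifier is appended to the path.
        return f"{REALTIME_URL}/{session_id}"
    # New session: sample_rate is a required query parameter.
    return f"{REALTIME_URL}?sample_rate={sample_rate}"


print(connection_url("d3e8c537-2f11-494b-b497-e59a434588bd"))
print(connection_url())
```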

    Ending a Session

    When you've completed your session, your client should send a JSON message with the following field.

    Parameter Example Info
    terminate_session true A boolean value to communicate that you wish to end your real-time session forever.


    $ websocat wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000 -H authorization:<API_TOKEN> 
    {"message_type": "SessionBegins", "session_id": "d3e8c537-2f11-494b-b497-e59a434588bd"}
    ...send audio...
    ...receive results...
    {"terminate_session": true} <-- Sent by client
    {"message_type": "FinalTranscript", ...}
    {"message_type": "SessionTerminated", "session_id": "d3e8c537-2f11-494b-b497-e59a434588bd"}

    If you have outstanding final transcripts, they will be sent to you. To finalize the session, a SessionTerminated message is sent to confirm our API has terminated your session. A terminated session cannot be reused.
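    The shutdown sequence can be sketched without a live connection by treating incoming messages as an iterable of JSON strings. TERMINATE_MESSAGE and drain_until_terminated are hypothetical names; a real client would send the first over its WebSocket and feed received messages to the second.

```python
import json

# The message the client sends to end the session.
TERMINATE_MESSAGE = json.dumps({"terminate_session": True})


def drain_until_terminated(incoming):
    """Collect outstanding final transcripts until SessionTerminated arrives.

    `incoming` is any iterable of raw JSON strings (an assumption standing
    in for messages read from the socket). Returns (finals, terminated).
    """
    finals = []
    for raw in incoming:
        msg = json.loads(raw)
        kind = msg.get("message_type")
        if kind == "FinalTranscript":
            finals.append(msg)
        elif kind == "SessionTerminated":
            return finals, True
    # The stream ended without a confirmation from the API.
    return finals, False


print(TERMINATE_MESSAGE)
```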

    The WebSocket specification provides standard errors. You can find a brief breakdown of them here.

    Our API provides application-level WebSocket errors for well-known scenarios. Here's a breakdown of them.

    Closing and Status Codes

    Error Condition Status Code Message
    auth failed 4001 "Not Authorized"
    insufficient funds 4002 "Insufficient Funds"
    free tier user 4002 "This feature is paid-only and requires you add a credit card. Please visit http://www.assemblyai.com/account/ to add a credit card to your account"
    attempt to connect to nonexistent session id 4004 "Session not found"
    attempt to connect to closed session 4010 "Session previously closed"
    session expires 4008 "Session Expired"
    attempt to connect to expired session id 4008 "Session Expired"
    rate limited 4029 "Client sent audio too fast"
    session times out 4031 "Session idle for too long"
    audio too short 4032 "Audio duration is too short"
    audio too long 4033 "Audio duration is too long"
    bad json 4100 "Endpoint received invalid JSON"
    bad schema 4101 "Endpoint received a message with an invalid schema"
    reconnect attempts exhausted 1013 "Temporary server condition forced blocking client's request"
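
    A client can keep this table as a lookup to produce readable errors when the socket closes. The sketch below copies the codes and messages from the table; the SESSION_GONE grouping and both function names are assumptions for illustration, not behavior documented by the API.

```python
# Close codes and messages copied from the table above.
CLOSE_CODES = {
    4001: "Not Authorized",
    4002: "Insufficient Funds",  # also used for the free-tier (paid-only) case
    4004: "Session not found",
    4008: "Session Expired",
    4010: "Session previously closed",
    4029: "Client sent audio too fast",
    4031: "Session idle for too long",
    4032: "Audio duration is too short",
    4033: "Audio duration is too long",
    4100: "Endpoint received invalid JSON",
    4101: "Endpoint received a message with an invalid schema",
    1013: "Temporary server condition forced blocking client's request",
}

# Assumption: these codes mean the old session_id can no longer be used,
# so a client would start a new session instead of reconnecting to it.
SESSION_GONE = {4004, 4008, 4010}


def describe_close(code: int) -> str:
    """Return the documented message for a close code, if there is one."""
    return CLOSE_CODES.get(code, f"Unrecognized close code {code}")


def needs_new_session(code: int) -> bool:
    """True when reconnecting with the same session_id looks pointless."""
    return code in SESSION_GONE


print(describe_close(4029))
```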

    Quotas and Limits

    The following limits are imposed to ensure performance and service quality. Please contact us if you'd like to increase these limits.

    Adding Custom Vocabulary

    Developers can also add custom vocabulary to their real-time session by adding the optional query parameter word_boost to the URL. The parameter should map to a JSON-encoded list of strings, as shown in this Python example:

    import json
    from urllib.parse import urlencode
    
    sample_rate = 16000
    word_boost = ["foo", "bar"]
    # word_boost is sent as a JSON-encoded list of strings
    params = {"sample_rate": sample_rate, "word_boost": json.dumps(word_boost)}
    
    url = f"wss://api.assemblyai.com/v2/realtime/ws?{urlencode(params)}"

    Creating Temporary Authentication Tokens

    In some cases, a developer will need to authenticate on the client-side and won't want to expose their AssemblyAI token. You can do this by sending a POST request to https://api.assemblyai.com/v2/realtime/token with the parameter expires_in: {TTL in seconds}. Below is a quick example in curl.

    The "expires_in" parameter must be greater than or equal to 60 seconds.
    curl --request POST \
    --url https://api.assemblyai.com/v2/realtime/token \
    --header 'authorization: YOUR_AAI_TOKEN' \
    --header 'content-type: application/json' \
    --data '{"expires_in": 60}'

    In response you will receive the following JSON output:

    { "token": "b2e3c6c71d450589b2f4f0bb1ac4efd2d5e55b1f926e552e02fc0cc070eaedbd" }
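
    For server-side code written in Python, the same request might look like the sketch below, which uses only the standard library. build_token_request and fetch_token are hypothetical helper names; the 60-second check mirrors the minimum stated above, and the network call happens only when fetch_token is invoked.

```python
import json
import urllib.request

TOKEN_URL = "https://api.assemblyai.com/v2/realtime/token"


def build_token_request(api_token: str, expires_in: int = 60) -> urllib.request.Request:
    """Build the POST request for a temporary token without sending it."""
    if expires_in < 60:
        raise ValueError("expires_in must be greater than or equal to 60 seconds")
    body = json.dumps({"expires_in": expires_in}).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"authorization": api_token, "content-type": "application/json"},
        method="POST",
    )


def fetch_token(api_token: str, expires_in: int = 60) -> str:
    """Send the request and return the "token" field of the JSON response."""
    with urllib.request.urlopen(build_token_request(api_token, expires_in)) as resp:
        return json.loads(resp.read())["token"]


request = build_token_request("YOUR_AAI_TOKEN")
print(request.get_method(), request.full_url)
```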

    A developer can now use this temporary token in the browser to authenticate a new WebSocket session with the following endpoint wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token={New Temp Token}. An example of JavaScript in the browser would be as follows.

    let socket;
    const token = 'b2e3c6c71d450589b2f4f0bb1ac4efd2d5e55b1f926e552e02fc0cc070eaedbd';
    
    socket = new WebSocket(`wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${token}`);