Synchronous transcription for short audio files

    If you're working with short audio files (less than 15 seconds), you can send the audio data directly to the /v2/stream endpoint, which returns a transcript within a few hundred milliseconds, directly in the request-response loop.

    Prerequisites

    The audio data you send to this endpoint has to comply with a strict format. This is because we don't do any transcoding of your data; it's sent directly to the model for transcription. You can send the contents of a WAV file to this endpoint, or raw data read directly from a microphone. Either way, you must record your audio in the following format to use this endpoint:

    A handy tool called SoX can be used to inspect audio files and make sure they're in the proper format. You can install SoX with Homebrew using brew install sox, or on Ubuntu with apt-get install sox. Once installed, you can run soxi /path/to/audio.wav to inspect the format of your audio file and make sure it complies with our requirements.

    If your audio doesn't meet these requirements, transcription accuracy from the API will be very poor!
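    If you'd rather check a file from Python, the standard library's wave module reports the same properties that soxi does. This is just a small inspection sketch; compare the printed values against the format requirements above (the file name is a placeholder).

    import wave

    # print the properties of a wav file so they can be
    # checked against the required audio format
    with wave.open('stream_test.wav', 'rb') as wav:
        print('channels:     ', wav.getnchannels())
        print('sample width: ', wav.getsampwidth() * 8, 'bits')
        print('sample rate:  ', wav.getframerate(), 'Hz')
        print('duration:     ', wav.getnframes() / wav.getframerate(), 'seconds')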

    Making the API Request

    The /v2/stream endpoint expects a single JSON parameter, audio_data, which should contain your raw audio data encoded as base64. Most programming languages have built-in libraries for encoding binary data to base64. For example, here is how to make a request to the /v2/stream endpoint using Python:

    import requests
    import base64
    
    api_token = 'your-secret-api-token'
    headers = {'authorization': api_token}
    
    # read the binary data from a wav file, skipping the
    # 44-byte wav header so only raw audio samples are sent
    with open('stream_test.wav', 'rb') as _in:
        data = _in.read()[44:]
    
    # base64 encode the binary data and decode the result to a
    # string so it can be included as a JSON parameter
    data = base64.b64encode(data).decode('utf-8')
    
    # send the data to the /v2/stream endpoint
    json_data = {'audio_data': data}
    
    response = requests.post('https://api.assemblyai.com/v2/stream', json=json_data, headers=headers)
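
    If you're capturing raw audio from a microphone rather than reading a WAV file, the request is exactly the same; there's just no 44-byte header to strip. Below is a rough sketch that continues from the example above (it reuses the headers dict) and records audio with the third-party PyAudio library. PyAudio isn't required by the API, and the sample rate, chunk size, and recording length shown here are placeholders you should set to match the format requirements above.

    import base64

    import pyaudio
    import requests

    SAMPLE_RATE = 8000   # placeholder - use the sample rate required above
    CHUNK = 1024         # frames read from the microphone per call
    SECONDS = 5          # keep recordings under 15 seconds

    # open a mono, 16-bit signed integer input stream
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                    input=True, frames_per_buffer=CHUNK)

    # read raw PCM frames from the microphone
    frames = []
    for _ in range(int(SAMPLE_RATE / CHUNK * SECONDS)):
        frames.append(stream.read(CHUNK))

    stream.stop_stream()
    stream.close()
    p.terminate()

    # base64 encode the raw audio and send it to the endpoint,
    # reusing the headers dict from the example above
    data = base64.b64encode(b''.join(frames)).decode('utf-8')
    response = requests.post('https://api.assemblyai.com/v2/stream',
                             json={'audio_data': data}, headers=headers)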

    API Response

    Depending on the duration of your audio, the API will respond within 100-750 milliseconds. The response will look like this:

    {
        "status": "completed", 
        "confidence": 0.97, 
        "created": "2019-06-27 22:26:47.048512", 
        "text": "set the temperature in the office to nine degrees", 
        "words": [
            {
                "text": "set", 
                "confidence": 0.99, 
                "end": 660, 
                "start": 0
            }, 
            {
                "text": "the", 
                "confidence": 0.98, 
                "end": 800, 
                "start": 620
            }, 
            {
                "text": "temperature", 
                "confidence": 0.96, 
                "end": 1340, 
                "start": 800
            }, 
            {
                "text": "in", 
                "confidence": 1.0, 
                "end": 1520, 
                "start": 1320
            }, 
            {
                "text": "the", 
                "confidence": 0.9, 
                "end": 1660, 
                "start": 1500
            }, 
            {
                "text": "office", 
                "confidence": 0.98, 
                "end": 2100, 
                "start": 1640
            }, 
            {
                "text": "to", 
                "confidence": 0.96, 
                "end": 2440, 
                "start": 2080
            }, 
            {
                "text": "nine", 
                "confidence": 0.98, 
                "end": 2880, 
                "start": 2420
            }, 
            {
                "text": "degrees", 
                "confidence": 0.96, 
                "end": 3260, 
                "start": 2840
            }
        ], 
        "id": "86tut5kj-7487-4c1a-94a6-06c2a88fb3bd"
    }
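
    The transcript text, overall confidence, and per-word confidence scores and timestamps (in milliseconds) can be pulled straight out of this JSON. A short sketch, continuing from the request example above:

    result = response.json()

    if result['status'] == 'completed':
        # the full transcript text
        print(result['text'])

        # each word comes with its own confidence score and
        # start/end timestamps, in milliseconds
        for word in result['words']:
            print(word['text'], word['confidence'], word['start'], word['end'])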