If you're working with short audio files (less than 15 seconds), you can send the audio data directly to the /v2/stream endpoint, which returns a transcript within a few hundred milliseconds, directly in the request-response loop.
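
Since this endpoint is meant for clips shorter than 15 seconds, it can help to check a recording's length before sending it. The following is a minimal sketch using Python's standard `wave` module; the helper names `wav_duration_seconds` and `is_short_enough` are hypothetical and not part of the API, and the check only works for audio stored in a wav container.

```python
import io
import wave

MAX_SECONDS = 15.0  # limit stated for the /v2/stream endpoint


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration, in seconds, of a wav file given its raw bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # duration = number of frames / frames per second
        return wav.getnframes() / wav.getframerate()


def is_short_enough(wav_bytes: bytes, limit: float = MAX_SECONDS) -> bool:
    """True if the clip is shorter than the limit (hypothetical helper)."""
    return wav_duration_seconds(wav_bytes) < limit
```

Longer recordings would fail this check and are not suited to this endpoint.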

    Audio Requirements

    The audio data you send to this endpoint must comply with a strict format, because we don't transcode your data; we send it directly to the model for transcription. You can send the contents of a .wav file to this endpoint, or raw data read directly from a microphone. Either way, you must record your audio in the following format to use this endpoint:

    POST Params

    When making a POST request to this endpoint, you should include the following parameters.

    Param | Example | Info | Required
    audio_data | UklGRtjIAABXQVZFZ… | Raw audio data, base64 encoded. This can be the raw data recorded directly from a microphone, or read from a wav file. | Yes
    punctuate | True | This is set to False by default; however, a developer can add auto punctuation by setting it to True. | No

    base64 encoding: base64 encoding is a simple way to encode your raw audio data so that it can be included as a JSON parameter in your POST request. Most programming languages have very simple built-in functions for encoding binary data to base64.
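
As an illustration of the base64 step, here is a minimal Python sketch that encodes raw audio bytes and builds the JSON body with the two parameters listed above. The function name `build_stream_payload` is hypothetical, and sending `punctuate` as a JSON boolean is an assumption. Actually sending the request is left out, because the base URL and authentication are not covered in this section.

```python
import base64
import json


def build_stream_payload(raw_audio: bytes, punctuate: bool = False) -> str:
    """Return a JSON body for a POST to /v2/stream (hypothetical helper)."""
    payload = {
        # b64encode returns ASCII bytes; decode so the value is a JSON string
        "audio_data": base64.b64encode(raw_audio).decode("ascii"),
        # assumption: the API accepts a JSON boolean for this parameter
        "punctuate": punctuate,
    }
    return json.dumps(payload)
```

For example, `build_stream_payload(open("clip.wav", "rb").read(), punctuate=True)` would produce a body whose `audio_data` value begins with `UklGR`, which is how the `RIFF` header at the start of a wav file encodes in base64 (`clip.wav` is a placeholder file name).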

    POST Response

    Depending on how much audio data you send, the API will respond within 100-750 milliseconds. The following keys will be in the JSON response.

    Param | Example | Info
    id | "5551722-f677-48a6-9287-39c0aafd9ac1" | The unique id of your transcription.
    status | "completed" | The status of your transcription.
    confidence | 0.956 | The confidence score of the entire transcription, between 0 and 1.
    text | "You know Demons on TV like..." | The complete transcription for your audio.
    words | [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}, ...] | An array of objects, with the information for each word in the transcription text. Will include the start/end time (in milliseconds) of the word, and the confidence score of the word.
    created | "2019-06-27 22:26:47.048512" | The timestamp for your request.
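
To show how the response keys above might be consumed, here is a minimal Python sketch that parses the JSON body and turns the `words` array into typed records. The `Word` class and the helpers `parse_words` and `low_confidence_words` are hypothetical names introduced for illustration; only the keys `words`, `text`, `start`, `end`, and `confidence` come from the table above.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start_ms: int      # start time of the word, in milliseconds
    end_ms: int        # end time of the word, in milliseconds
    confidence: float  # confidence score of the word


def parse_words(response_body: str) -> List[Word]:
    """Extract the per-word entries from a JSON response body."""
    data = json.loads(response_body)
    return [
        Word(w["text"], w["start"], w["end"], w["confidence"])
        for w in data.get("words", [])
    ]


def low_confidence_words(words: List[Word], threshold: float = 0.5) -> List[str]:
    """Return the text of words scored below the threshold.

    The 0.5 default is an arbitrary illustrative value, not an API setting.
    """
    return [w.text for w in words if w.confidence < threshold]
```

A caller could use `low_confidence_words` to flag parts of a transcript for manual review.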