If you're working with short audio files (less than 15 seconds), you can send the audio data directly to the /v2/stream endpoint, which returns a transcript within a few hundred milliseconds, directly in the request-response loop.

    For a complete walkthrough of this endpoint, see the guide on Synchronous Transcription for Short Audio Files.

    Audio Requirements

    The audio data you send to this endpoint must comply with a strict format. We don't transcode your data; we send it directly to the model for transcription. You can send the contents of a .wav file to this endpoint, or raw data read directly from a microphone. Either way, you must record your audio in the following format to use this endpoint:

    POST Params

    When making a POST request to this endpoint, you should include the following parameters.

    | Param | Example | Info | Required |
    | --- | --- | --- | --- |
    | audio_data | UklGRtjIAABXQVZFZ… | Raw audio data, base64 encoded. This can be the raw data recorded directly from a microphone, or read from a wav file. | Yes |
    Base64 encoding: base64 is a simple way to encode your raw audio data so that it can be included as a JSON parameter in your POST request. Most programming languages have built-in functions for encoding binary data to base64.
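As a rough illustration of the encoding step, the following Python sketch base64-encodes raw audio bytes and wraps them in a JSON body with the `audio_data` parameter. The host in `API_URL`, the `Authorization` header, and the helper name `build_stream_request` are assumptions made for this example; they are not specified in this section, so substitute the values from your account and the full guide.

```python
import base64
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
API_URL = "https://api.example.com/v2/stream"


def build_stream_request(raw_audio: bytes, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying base64-encoded audio."""
    # base64-encode the raw bytes so they can travel as a JSON string.
    audio_b64 = base64.b64encode(raw_audio).decode("ascii")
    body = json.dumps({"audio_data": audio_b64}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: the auth header name and format are not given here.
            "Authorization": api_token,
        },
        method="POST",
    )


# Stand-in bytes; in practice, read a .wav file or a microphone buffer.
request = build_stream_request(b"RIFF\x00\x00\x00\x00WAVE", "YOUR_API_TOKEN")
payload = json.loads(request.data)
```

To actually send the request, you could pass `request` to `urllib.request.urlopen` and decode the JSON body of the reply. The sketch stops short of that so it runs without network access.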

    POST Response

    Depending on how much audio data you send, the API will respond within 100-750 milliseconds. The following keys will be in the JSON response.

    | Param | Example | Info |
    | --- | --- | --- |
    | id | "5551722-f677-48a6-9287-39c0aafd9ac1" | The unique id of your transcription. |
    | status | "completed" | The status of your transcription. |
    | confidence | 0.956 | The confidence score of the entire transcription, between 0 and 1. |
    | text | "You know Demons on TV like..." | The complete transcription for your audio. |
    | words | [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}, ...] | An array of objects with the information for each word in the transcription text. Each object includes the start/end time of the word (in milliseconds) and the confidence score of the word. |
    | created | "2019-06-27 22:26:47.048512" | The timestamp for your request. |
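The response keys above can be consumed along these lines. This is a minimal sketch: `word_timings` is a hypothetical helper, and `sample_response` is assembled from the example values in the table (with the `words` array cut down to its first entry), not captured from a real request.

```python
def word_timings(response: dict) -> list:
    """Return (word, start_seconds, end_seconds) for each entry in "words"."""
    return [
        # Start/end times arrive in milliseconds; convert them to seconds.
        (word["text"], word["start"] / 1000, word["end"] / 1000)
        for word in response.get("words", [])
    ]


# Illustrative response built from the example values listed above.
sample_response = {
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    "status": "completed",
    "confidence": 0.956,
    "text": "You know Demons on TV like...",
    "words": [{"confidence": 1.0, "end": 440, "start": 0, "text": "You"}],
    "created": "2019-06-27 22:26:47.048512",
}

timings = word_timings(sample_response)
for text, start_s, end_s in timings:
    print(f"{text}: {start_s:.2f}s to {end_s:.2f}s")
```

Converting to seconds is only a convenience for display; keep the millisecond values if you need to align words with audio samples.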