Synchronous transcription for short audio files

    Send up to 15 seconds of audio, and receive a transcript in a few hundred milliseconds.

    Audio requirements

    The audio data you send to this endpoint has to be in the following format. You can send the content of a wav file to this endpoint, or raw data read directly from a microphone.

    A tool called SoX can be used to check that an audio file is in the above format. You can install SoX on Mac with brew install sox, or on Ubuntu with apt-get install sox. Once it is installed, run soxi /path/to/audio.wav to print the format of your audio file and compare it against the above requirements.
    If your audio doesn't match these requirements, transcription accuracy will be very poor!
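
    If you would rather check a file from code than with soxi, the sketch below reads the same kind of header information using Python's standard wave module. The function names describe_wav and is_short_enough are made up for this example, and the only limit it checks is the 15-second maximum mentioned at the top of this page; compare the other values against the format requirements yourself.

```python
import wave

MAX_SECONDS = 15  # upper bound on audio length stated at the top of this page


def describe_wav(source):
    """Return basic format facts for a WAV file (a path or a binary file object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


def is_short_enough(info, limit=MAX_SECONDS):
    """True when the clip fits within the endpoint's duration limit."""
    return info["duration_seconds"] <= limit
```

    For example, describe_wav("/path/to/audio.wav") returns a dictionary you can print or compare field by field.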

    Making the API request

    The /v2/stream endpoint expects a single JSON parameter audio_data, which should contain your raw audio data, base64 encoded. Most programming languages have very simple built-in libraries for encoding binary data to base64.
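
    As a minimal sketch of the encoding step, the Python snippet below builds the JSON body described above from raw audio bytes. The helper name build_request_body is hypothetical. Sending the body is left out on purpose, because the host name and any authentication details are not covered in this section.

```python
import base64
import json


def build_request_body(raw_audio: bytes) -> str:
    """Base64-encode raw audio and wrap it in the JSON object the endpoint expects."""
    # b64encode returns bytes; JSON needs a string, so decode to ASCII text.
    encoded = base64.b64encode(raw_audio).decode("ascii")
    return json.dumps({"audio_data": encoded})
```

    You would typically read the bytes with open(path, "rb").read() and POST the returned string to /v2/stream with your HTTP client of choice.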

    API response

    Depending on the duration of your audio, the API will respond within 100-750 milliseconds. The response will look like this:

    {
        "status": "completed", 
        "confidence": 0.97, 
        "created": "2019-06-27 22:26:47.048512", 
        "text": "set the temperature in the office to nine degrees", 
        "words": [
            {
                "text": "set", 
                "confidence": 0.99, 
                "end": 660, 
                "start": 0
            }, 
            {
                "text": "the", 
                "confidence": 0.98, 
                "end": 800, 
                "start": 620
            }, 
            {
                "text": "temperature", 
                "confidence": 0.96, 
                "end": 1340, 
                "start": 800
            }, 
            {
                "text": "in", 
                "confidence": 1.0, 
                "end": 1520, 
                "start": 1320
            }, 
            {
                "text": "the", 
                "confidence": 0.9, 
                "end": 1660, 
                "start": 1500
            }, 
            {
                "text": "office", 
                "confidence": 0.98, 
                "end": 2100, 
                "start": 1640
            }, 
            ...
        ], 
        "id": "86tut5kj-7487-4c1a-94a6-06c2a88fb3bd"
    }
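
    A response like the one above can be unpacked with a few lines of Python. This is only a sketch: the function names are invented for the example, the 0.95 threshold is arbitrary, and the start and end values are passed through unchanged because their unit is not stated here.

```python
import json


def summarize(response_text):
    """Return (status, transcript, per-word tuples) from a response body."""
    data = json.loads(response_text)
    words = [
        (w["text"], w["start"], w["end"], w["confidence"])
        for w in data.get("words", [])
    ]
    return data["status"], data["text"], words


def low_confidence_words(response_text, threshold=0.95):
    """List the words whose confidence falls below the chosen threshold."""
    _, _, words = summarize(response_text)
    return [text for text, _, _, confidence in words if confidence < threshold]
```

    Flagging low-confidence words this way can help decide when to ask the user to repeat a command.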

    Boost accuracy for keywords and phrases

    Include the "word_boost": ["foo", "cancel my plan"] parameter in the JSON object you send to the API to boost the likelihood of keywords/phrases important for your application.

    You can also set the parameter "boost_param" to "low", "default", or "high" to control the weight applied to the words/phrases in your "word_boost" list.

    Each string in your array must contain between 1 and 6 words, and the array can contain at most 150 words/phrases for now. Enabling this feature may have a small impact on latency, which grows with the size of the list.
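
    Because these limits are easy to exceed by accident, it may help to check them on the client before sending a request. The Python sketch below does that. The function name boost_options and the constants are hypothetical, and counting words by splitting on whitespace is an assumption about how the API counts them.

```python
ALLOWED_BOOST_PARAMS = {"low", "default", "high"}
MAX_ENTRIES = 150          # maximum number of words/phrases in the list
MAX_WORDS_PER_ENTRY = 6    # each entry must contain 1 to 6 words


def boost_options(phrases, boost_param="default"):
    """Validate a boost list and return the fields to merge into the request JSON."""
    if boost_param not in ALLOWED_BOOST_PARAMS:
        raise ValueError(f"boost_param must be one of {sorted(ALLOWED_BOOST_PARAMS)}")
    if len(phrases) > MAX_ENTRIES:
        raise ValueError(f"word_boost accepts at most {MAX_ENTRIES} entries")
    for phrase in phrases:
        # Assumption: words are separated by whitespace.
        word_count = len(phrase.split())
        if not 1 <= word_count <= MAX_WORDS_PER_ENTRY:
            raise ValueError(f"{phrase!r} has {word_count} words; expected 1 to 6")
    return {"word_boost": list(phrases), "boost_param": boost_param}
```

    The returned dictionary can be merged into the same JSON object that carries audio_data.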

    Enable text formatting

    By default, the API will return text in its spoken form, for example one hundred seventy five. To enable text formatting, set the "format_text": true parameter in your API request -- this will cause the above example to be returned as 175.
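
    In Python, the JSON value true is written as the boolean True and converted when the body is serialized. A short sketch, with a hypothetical helper name and a placeholder audio string:

```python
import json


def with_text_formatting(request_fields):
    """Return a copy of the request fields with text formatting switched on."""
    fields = dict(request_fields)
    fields["format_text"] = True  # serialized as JSON true by json.dumps
    return fields


# "AAEC" is placeholder base64 data, not real audio.
body = json.dumps(with_text_formatting({"audio_data": "AAEC"}))
```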