Transcribing dual channel/stereo recordings

    If you have a dual channel audio file, for example a phone call recording with the agent on one channel and the customer on the other, the API can transcribe each channel separately. To enable this, add one extra parameter when submitting your transcription job.

    Making the Request

    When making your request to /v2/transcript, include the "dual_channel" parameter and set it to true. The API will transcribe each channel separately and label each word in the transcript as channel 1 or 2.

    curl --request POST \
      --url https://api.assemblyai.com/v2/transcript \
      --header 'authorization: YOUR-API-TOKEN' \
      --header 'content-type: application/json' \
      --data '
        {
            "audio_url": "https://www.assemblyai.com/static/media/phone_demo_clip_1.wav",
            "dual_channel": true
        }'

    Dual channel transcriptions take ~25% longer to complete than normal, because each channel is transcribed separately, which adds a little extra overhead.
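
    The same request can be sketched in Python. This is a minimal illustration using only the standard library, not an official client: the helper names build_transcript_request and submit_transcript are hypothetical, while the URL, headers, and the "dual_channel" field are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with dual channel transcription enabled."""
    payload = {"audio_url": audio_url, "dual_channel": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "authorization": api_token,
            "content-type": "application/json",
        },
        method="POST",
    )


def submit_transcript(audio_url: str, api_token: str) -> dict:
    """Send the request and return the decoded JSON response (hypothetical helper)."""
    request = build_transcript_request(audio_url, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

    Calling submit_transcript("https://www.assemblyai.com/static/media/phone_demo_clip_1.wav", "YOUR-API-TOKEN") would submit the same job as the curl command. Building the request separately from sending it makes the payload easy to inspect before any network call is made.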

    Viewing the Response

    Once your transcription is complete, you can GET the result as usual:

    curl --request GET \
      --url https://api.assemblyai.com/v2/transcript/5552830-d8b1-4e60-a2b4-bdfefb3130b3 \
      --header 'authorization: YOUR-API-TOKEN'
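
    Since a job may not be finished the first time you ask for it, a client typically repeats this GET until the "status" field changes. The sketch below is one way to do that in Python under stated assumptions: the helper names fetch_transcript and wait_for_transcript are hypothetical, the polling interval and attempt limit are arbitrary, and "error" is assumed to be the status of a failed job (only "completed" appears in this document).

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_transcript(transcript_id: str, api_token: str) -> dict:
    """GET a transcript by id and return the decoded JSON response."""
    url = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"
    request = urllib.request.Request(url, headers={"authorization": api_token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcript(
    fetch: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
) -> dict:
    """Call fetch() repeatedly until the transcript reaches a final status."""
    for _ in range(max_attempts):
        transcript = fetch()
        # "error" is an assumed failure status; "completed" is shown in the response.
        if transcript.get("status") in ("completed", "error"):
            return transcript
        time.sleep(interval_seconds)
    raise TimeoutError("transcript did not reach a final status in time")
```

    A call such as wait_for_transcript(lambda: fetch_transcript(transcript_id, "YOUR-API-TOKEN")) would block until the job finishes. Passing the fetch step as a callable keeps the polling loop independent of the HTTP code.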

    This will return the usual JSON response, but take note of a few special keys that are only returned when transcribing dual channel files. The "utterances" key contains a list of turn-by-turn utterances, in the order they appeared in the audio recording. Each object in the "utterances" list contains a "channel" key (either "1" or "2"). Each word in the "words" array also contains the "channel" key, so you can easily tell which channel each utterance or word came from.

    {
        "acoustic_model": "assemblyai_default",
        "audio_duration": 150.766167800454,
        "audio_url": "https://www.assemblyai.com/static/media/phone_demo_clip_1.wav",
        "confidence": 0.922175805047867,
        "dual_channel": true,
        "format_text": true,
        "id": "5552830-d8b1-4e60-a2b4-bdfefb3130b3",
        "language_model": "assemblyai_default",
        "punctuate": true,
        "status": "completed",
        "text": "Hi, I'm joy. Hi, I'm sharon. Do you have kids in school. ...",
        "utterances": [
            {
                "channel": "1",
                "confidence": 0.97,
                "end": 1380,
                "speaker": "1",
                "start": 0,
                "text": "Hi, I'm joy.",
                "words": [
                    {
                        "channel": "1",
                        "confidence": 1.0,
                        "end": 320,
                        "speaker": "1",
                        "start": 0,
                        "text": "Hi,"
                    },
                    ...
                ]
            },
            {
                "channel": "2",
                "confidence": 0.94,
                "end": 3260,
                "speaker": "2",
                "start": 0,
                "text": "Hi, I'm sharon.",
                "words": [
                    {
                        "channel": "2",
                        "confidence": 1.0,
                        "end": 480,
                        "speaker": "2",
                        "start": 0,
                        "text": "Hi,"
                    },
                    ...
                ]
            },
            {
                "channel": "1",
                "confidence": 0.94,
                "end": 5420,
                "speaker": "1",
                "start": 2820,
                "text": "Do you have kids in school.",
                "words": [
                    {
                        "channel": "1",
                        "confidence": 1.0,
                        "end": 4300,
                        "speaker": "1",
                        "start": 2820,
                        "text": "Do"
                    },
                    ...
                ]
            },
            {
                "channel": "2",
                "confidence": 0.94,
                "end": 7380,
                "speaker": "2",
                "start": 3600,
                "text": "I have grandchildren in school.",
                "words": [
                    {
                        "channel": "2",
                        "confidence": 1.0,
                        "end": 3680,
                        "speaker": "2",
                        "start": 3600,
                        "text": "I"
                    },
                    ...
                ]
            }
        ],
        "webhook_status_code": null,
        "webhook_url": null,
        "words": [
            {
                "channel": "1",
                "confidence": 1.0,
                "end": 320,
                "speaker": "1",
                "start": 0,
                "text": "Hi,"
            },
            {
                "channel": "2",
                "confidence": 1.0,
                "end": 480,
                "speaker": "2",
                "start": 0,
                "text": "Hi,"
            },
            ...
        ]
    }
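
    To illustrate how the "channel" key can be used, the following Python sketch turns a response like the one above into a turn-by-turn listing and groups words by channel. The function names format_turns and words_by_channel are hypothetical, and the code assumes that "start" values are in milliseconds, which this document does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, List


def format_turns(transcript: dict) -> List[str]:
    """Return one line per utterance, labelled with its channel and start time."""
    lines = []
    for utterance in transcript.get("utterances", []):
        start_s = utterance["start"] / 1000  # assumption: timestamps are milliseconds
        lines.append(
            f'[{start_s:.2f}s] channel {utterance["channel"]}: {utterance["text"]}'
        )
    return lines


def words_by_channel(transcript: dict) -> Dict[str, List[str]]:
    """Group the text of each word under the channel it was spoken on."""
    grouped = defaultdict(list)
    for word in transcript.get("words", []):
        grouped[word["channel"]].append(word["text"])
    return dict(grouped)
```

    Because each channel is transcribed separately, utterances on different channels can overlap in time, as the second and third utterances in the example response do, so a listing like this is best read as a sequence of turns rather than a strict timeline.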