Topic Detection

    Using the IAB Classification feature, AssemblyAI can label your transcription text with up to 20 of the 698 possible IAB content categories. Below are all of the possible IAB labels and their associated topics.

    Enabling IAB Categorization when submitting files for transcription

    Include the iab_categories parameter in your POST request and set it to true, as shown in the cURL request below.

    curl --request POST \
      --url https://api.assemblyai.com/v2/transcript \
      --header 'authorization: YOUR-API-TOKEN' \
      --header 'content-type: application/json' \
      --data '{"audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav", "iab_categories": true}'
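
    For readers calling the API from code rather than the command line, the same request could be sketched in Python roughly as follows. This is a minimal sketch using only the standard library; the helper names build_transcript_request and submit_transcript are hypothetical, and error handling is left out.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, api_token):
    """Build the POST request with IAB categorization enabled.

    Hypothetical helper: it mirrors the cURL example and does not send anything.
    """
    payload = {"audio_url": audio_url, "iab_categories": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "authorization": api_token,
            "content-type": "application/json",
        },
        method="POST",
    )


def submit_transcript(audio_url, api_token):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_transcript_request(audio_url, api_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

    Calling submit_transcript("https://app.assemblyai.com/static/media/phone_demo_clip_1.wav", "YOUR-API-TOKEN") would send the same body as the cURL request.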

    Getting IAB Categories in the response

    Once the transcription is complete, a GET request to /v2/transcript/<id> returns the transcription with an additional iab_categories_result key in the JSON response, as shown below:

    {
        # some keys have been hidden for readability
        ...
        "id": "tfos1wi4g-9f86-4465-9d7f-3740ae1668d2",
        "status": "completed",
        "text": "...",
        "iab_categories_result": {
            # 'status' will be "unavailable" in the rare case that IAB results
            # could not be produced for this transcription
            "status": "success", 
            # 'results' contains a list of each paragraph of text in the transcription
            # along with the IAB labels that were predicted for that paragraph of text,
            # the confidence score for each label, and the timestamp for where the paragraph
            # of text occurred in the source audio file
            "results": [
                {
                    "text": "Last year, I showed these two slides that...", 
                    "labels": [
                        {
                            "relevance": 1.0, 
                            "label": "Science>Environment"
                        }
                    ], 
                    "timestamp": {
                        "start": 12350, 
                        "end": 164740
                    }
                }, 
                {
                    "text": "In the Andy's, this glacier is the source...", 
                    "labels": [
                        {
                            "relevance": 1.0, 
                            "label": "BusinessAndFinance>Industries"
                        }, 
                        {
                            "relevance": 0.33, 
                            "label": "NewsAndPolitics>Politics"
                        }, 
                        {
                            "relevance": 0.33, 
                            "label": "Science>Environment"
                        }
                    ], 
                    "timestamp": {
                        "start": 164950, 
                        "end": 319890
                    }
                }
            ],
            # for each unique IAB label detected in the 'results' array above,
            # the 'summary' key will show the relevancy for that label across
            # the entire transcription text; for example, if the "Science>Environment"
            # label is detected only 1 time in a 60 minute audio file, the 'summary'
            # key will show a low relevancy score for that label, since the entire
            # transcription was not found to be consistently about "Science>Environment"
            "summary": {
                "Science>Environment": 0.865, 
                "BusinessAndFinance>Industries": 1.0, 
                "NewsAndPolitics>Politics": 0.165, 
                "Technology&Computing": 0.15
            }
        }
    }
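
    As a rough illustration of how the response above might be consumed, the Python sketch below fetches a transcript, ranks the labels in 'summary', and finds the paragraphs tagged with a given label. The helper names (fetch_transcript, top_topics, paragraphs_with_label) and the 0.5 relevance threshold are assumptions made for this example, not part of the API.

```python
import json
import urllib.request


def fetch_transcript(transcript_id, api_token):
    """GET /v2/transcript/<id> and return the decoded JSON response.

    The caller should check that "status" is "completed" before using it.
    """
    request = urllib.request.Request(
        "https://api.assemblyai.com/v2/transcript/" + transcript_id,
        headers={"authorization": api_token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def top_topics(transcript, limit=3):
    """Return up to `limit` (label, relevance) pairs from 'summary', highest first.

    Returns an empty list when IAB results are missing or "unavailable".
    """
    iab = transcript.get("iab_categories_result") or {}
    if iab.get("status") != "success":
        return []
    summary = iab.get("summary", {})
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def paragraphs_with_label(transcript, label, min_relevance=0.5):
    """Return (start, end, relevance) for each paragraph tagged with `label`.

    `min_relevance` is an arbitrary cutoff chosen for this sketch.
    """
    iab = transcript.get("iab_categories_result") or {}
    matches = []
    for paragraph in iab.get("results", []):
        for entry in paragraph.get("labels", []):
            if entry["label"] == label and entry["relevance"] >= min_relevance:
                timestamp = paragraph["timestamp"]
                matches.append((timestamp["start"], timestamp["end"], entry["relevance"]))
    return matches
```

    With the example response above, top_topics would rank "BusinessAndFinance>Industries" first, and paragraphs_with_label for "Science>Environment" would keep only the first paragraph, since the second has a relevance of 0.33.
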

    You can learn more about the IAB taxonomy here.