IAB Categorization

    This feature is only available for Enterprise accounts. If you are not sure whether your account has Enterprise features enabled, please contact your account manager.

    With the IAB Categorization feature, AssemblyAI can classify your transcription text into up to 5 of 370 possible content categories. The top-level categories are:

    Automotive 
    Books and Literature 
    Business and Finance 
    Careers 
    Education 
    Events and Attractions 
    Family and Relationships 
    Fine Art 
    Food and Drink 
    Healthy Living 
    Hobbies and Interests 
    Home and Garden 
    Medical Health 
    Movies 
    Music and Audio 
    News and Politics 
    Personal Finance 
    Pets 
    Pop Culture 
    Real Estate 
    Religion and Spirituality 
    Science 
    Shopping 
    Sports 
    Style and Fashion 
    Technology and Computing 
    Television 
    Travel 
    Video Gaming 

    Enabling IAB Categorization when submitting files for transcription

    Include the iab_categories parameter in your POST request and set it to true, as shown in the cURL request below.

    curl --request POST \
      --url https://api.assemblyai.com/v2/transcript \
      --header 'authorization: YOUR-API-TOKEN' \
      --header 'content-type: application/json' \
      --data '{"audio_url": "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav", "iab_categories": true}'
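    The same request can also be built from Python using only the standard library. This is a minimal sketch, assuming the endpoint, headers, and body from the cURL example above; build_transcript_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the cURL example
API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with IAB Categorization enabled.

    build_transcript_request is a hypothetical helper mirroring the cURL example.
    """
    body = {
        "audio_url": audio_url,
        "iab_categories": True,  # serialized as JSON true
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "authorization": api_token,
            "content-type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcript_request(
        "https://app.assemblyai.com/static/media/phone_demo_clip_1.wav",
        "YOUR-API-TOKEN",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

    To actually submit the job, pass the request to urllib.request.urlopen and decode the JSON body; the id in that response is the <id> used in the GET request to /v2/transcript/<id>.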

    Getting IAB Categories in the response

    Once the transcription is complete, make a GET request to /v2/transcript/<id> to retrieve it. The JSON response will contain an additional key, iab_categories_result, as shown below:

    {
        # some keys have been hidden for readability
        ...
        "id": "tfos1wi4g-9f86-4465-9d7f-3740ae1668d2",
        "status": "completed",
        "text": "...",
        "iab_categories_result": {
            # 'status' will be "unavailable" in the rare case that IAB results
            # are unavailable for this transcription
            "status": "success", 
            # 'results' contains one entry per paragraph of text in the transcription,
            # with the IAB labels predicted for that paragraph, the relevance score
            # for each label, and the start and end timestamps of the paragraph
            # in the source audio file
            "results": [
                {
                    "text": "Last year, I showed these two slides that...", 
                    "labels": [
                        {
                            "relevance": 1.0, 
                            "label": "Science>Environment"
                        }
                    ], 
                    "timestamp": {
                        "start": 12350, 
                        "end": 164740
                    }
                }, 
                {
                    "text": "In the Andy's, this glacier is the source...", 
                    "labels": [
                        {
                            "relevance": 1.0, 
                            "label": "BusinessAndFinance>Industries"
                        }, 
                        {
                            "relevance": 0.33, 
                            "label": "NewsAndPolitics>Politics"
                        }, 
                        {
                            "relevance": 0.33, 
                            "label": "Science>Environment"
                        }
                    ], 
                    "timestamp": {
                        "start": 164950, 
                        "end": 319890
                    }
            }
            ],
            # for each unique IAB label detected in the 'results' array above,
            # the 'summary' key will show the relevancy for that label across
            # the entire transcription text; for example, if the "Science>Environment"
            # label is detected only once in a 60-minute audio file, the 'summary'
            # key will show a low relevancy score for that label, since the entire
            # transcription was not found to be consistently about "Science>Environment"
            "summary": {
                "Science>Environment": 0.865, 
                "BusinessAndFinance>Industries": 1.0, 
                "NewsAndPolitics>Politics": 0.165, 
                "Technology&Computing": 0.15
            }
        }
    }
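    Once the response has been parsed into a Python dict (for example with json.loads), the iab_categories_result object can be post-processed directly. The sketch below is illustrative and assumes only the keys shown in the example response: labels_by_paragraph, ranked_summary, and top_level_scores are hypothetical helper names, the 0.5 relevance threshold is an arbitrary default, and reading timestamp values as milliseconds is an assumption consistent with the example values. Splitting a label on its first ">" to recover the top-level category is likewise inferred from the label format in the example.

```python
from __future__ import annotations

from typing import Any


def labels_by_paragraph(iab_result: dict[str, Any], min_relevance: float = 0.5) -> list[dict[str, Any]]:
    """List each paragraph's time span (in seconds) and its labels at or above min_relevance."""
    if iab_result.get("status") != "success":
        # e.g. "unavailable": no IAB results were produced for this transcription
        return []
    paragraphs = []
    for item in iab_result["results"]:
        kept = [lab["label"] for lab in item["labels"] if lab["relevance"] >= min_relevance]
        paragraphs.append({
            # Assumption: timestamps are in milliseconds, as the example values suggest
            "start_s": item["timestamp"]["start"] / 1000,
            "end_s": item["timestamp"]["end"] / 1000,
            "labels": kept,
        })
    return paragraphs


def ranked_summary(iab_result: dict[str, Any], top_n: int = 5) -> list[tuple[str, float]]:
    """Return up to top_n (label, relevance) pairs from 'summary', most relevant first."""
    summary = iab_result.get("summary", {})
    return sorted(summary.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


def top_level_scores(iab_result: dict[str, Any]) -> dict[str, float]:
    """Roll 'summary' up to top-level categories, keeping the highest relevance seen.

    Labels are split on the first '>' (e.g. "Science>Environment" -> "Science");
    a label without '>' is treated as already top-level.
    """
    scores: dict[str, float] = {}
    for label, relevance in iab_result.get("summary", {}).items():
        top = label.split(">", 1)[0]
        scores[top] = max(scores.get(top, 0.0), relevance)
    return scores


if __name__ == "__main__":
    # Abridged copy of the example iab_categories_result above
    example = {
        "status": "success",
        "results": [
            {
                "text": "Last year, I showed these two slides that...",
                "labels": [{"relevance": 1.0, "label": "Science>Environment"}],
                "timestamp": {"start": 12350, "end": 164740},
            },
            {
                "text": "In the Andy's, this glacier is the source...",
                "labels": [
                    {"relevance": 1.0, "label": "BusinessAndFinance>Industries"},
                    {"relevance": 0.33, "label": "NewsAndPolitics>Politics"},
                    {"relevance": 0.33, "label": "Science>Environment"},
                ],
                "timestamp": {"start": 164950, "end": 319890},
            },
        ],
        "summary": {
            "Science>Environment": 0.865,
            "BusinessAndFinance>Industries": 1.0,
            "NewsAndPolitics>Politics": 0.165,
            "Technology&Computing": 0.15,
        },
    }
    for para in labels_by_paragraph(example):
        print(f"{para['start_s']:.2f}s-{para['end_s']:.2f}s: {para['labels']}")
    print(ranked_summary(example, top_n=2))
    print(top_level_scores(example))
```

    The rolled-up keys come straight from the label strings (for example BusinessAndFinance), so they are spelled differently from the top-level category names listed at the start of this page.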

    You can learn more about the IAB Content Taxonomy from the IAB Tech Lab.