Word Search

    Search a completed transcript for specific words

    The word search endpoint lets you search a completed transcript for a set of keywords. Once a transcript has completed, send a GET request to the following endpoint:

    /v2/transcript/{TRANSCRIPT_ID}/word-search?words={WORDS_TO_SEARCH}

    A simple request in Python looks like this:

    import requests
    
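    # Replace YOUR-TRANSCRIPT-ID-HERE with the ID of a completed transcript;
    # the words to search for are passed as a comma-separated list.
    endpoint = "https://api.assemblyai.com/v2/transcript/YOUR-TRANSCRIPT-ID-HERE/word-search?words=water,air,earth,fire"
    
    # Replace YOUR-API-TOKEN with your own API token.
    headers = {
        "authorization": "YOUR-API-TOKEN",
    }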
    
    response = requests.get(endpoint, headers=headers)
    
    print(response.json())
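
    When the list of words comes from user input or another part of a program, it can be convenient to assemble the URL in a small helper instead of writing it by hand. The sketch below is one way to do that; `build_word_search_url` and `BASE_URL` are hypothetical names introduced for illustration, and the only assumption about the API is the comma-separated `words` parameter shown above.

```python
from urllib.parse import quote

# Base path taken from the example request above.
BASE_URL = "https://api.assemblyai.com/v2/transcript"


def build_word_search_url(transcript_id, words):
    """Return the word-search URL for a transcript ID and a list of words.

    Hypothetical helper: each word is percent-encoded on its own so that
    the commas joining the words remain plain separators.
    """
    encoded = ",".join(quote(word.strip(), safe="") for word in words)
    return f"{BASE_URL}/{transcript_id}/word-search?words={encoded}"


url = build_word_search_url(
    "YOUR-TRANSCRIPT-ID-HERE", ["water", "air", "earth", "fire"]
)
print(url)
```

    The resulting string can be passed to requests.get together with the same authorization header as in the request above.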
    

    Reviewing the response

    This request returns a response like the example below, with the following top-level keys:

    Key          Value
    id           The ID of the transcript
    total_count  The total number of matched instances across all searched words.
                 For example, if word1 matched 2 times and word2 matched 3 times, total_count equals 5
    matches      A list/array of all matched words and their associated data


    {
        "id": "TRANSCRIPT-ID",
        "total_count": 6,
        "matches": [
            { 
                "text": "air", 
                "count": 5, 
                "timestamps": [[1000,10350], [...], [...]], 
                "indexes": [0, 4, 6, 8, 11]
            },
            {
                "text": "water",
                "count": 1,
                "timestamps": [[2410, 2700]],
                "indexes": [2]
            }
        ]
    }
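
    The relationship between total_count and the per-word count values can be checked with a few lines of Python. This is a minimal sketch that works on an already-parsed response (for example, the result of response.json()); the function name `counts_are_consistent` is hypothetical, and the sample values are made up for illustration rather than taken from a real transcript.

```python
def counts_are_consistent(result):
    """Return True if total_count equals the sum of the per-word counts."""
    return result["total_count"] == sum(
        match["count"] for match in result["matches"]
    )


# Hypothetical response with invented values, shaped like the example above.
sample = {
    "id": "TRANSCRIPT-ID",
    "total_count": 3,
    "matches": [
        {"text": "fire", "count": 2,
         "timestamps": [[500, 900], [4200, 4600]], "indexes": [1, 9]},
        {"text": "water", "count": 1,
         "timestamps": [[2410, 2700]], "indexes": [2]},
    ],
}

# Map each matched word to the number of times it was found.
per_word = {match["text"]: match["count"] for match in sample["matches"]}

print(per_word)
print(counts_are_consistent(sample))
```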

    Each entry in matches has the following keys:

    Key         Value
    text        The matched word
    count       The number of times the word appears in the transcript
    timestamps  An array of timestamps, each structured as [start_time, end_time]
    indexes     An array of the word's index locations within the words array of the completed transcript
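
    To work with individual occurrences rather than per-word summaries, the timestamps and indexes arrays can be flattened into one row per match. The sketch below assumes that the two arrays are parallel, meaning the n-th timestamp belongs to the n-th index; that pairing is an assumption, not something this page states. The name `list_occurrences` and the sample values are hypothetical.

```python
def list_occurrences(result):
    """Return one (word, index, start_time, end_time) tuple per occurrence.

    Assumes timestamps[i] and indexes[i] describe the same occurrence.
    Rows are sorted by their position in the transcript's words array.
    """
    rows = []
    for match in result["matches"]:
        for (start, end), index in zip(match["timestamps"], match["indexes"]):
            rows.append((match["text"], index, start, end))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical response with invented values.
sample_result = {
    "id": "TRANSCRIPT-ID",
    "total_count": 3,
    "matches": [
        {"text": "fire", "count": 2,
         "timestamps": [[500, 900], [4200, 4600]], "indexes": [1, 9]},
        {"text": "water", "count": 1,
         "timestamps": [[2410, 2700]], "indexes": [2]},
    ],
}

occurrences = list_occurrences(sample_result)
for word, index, start, end in occurrences:
    print(f"{word!r} at words[{index}]: {start} to {end}")
```

    Each index can then be used to look up the corresponding entry in the words array of the completed transcript.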

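    Important considerations

    If none of the searched words are found, the response has a total_count of 0 and an empty matches array:
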
    {
        "id": "TRANSCRIPT-ID",
        "total_count": 0,
        "matches": []
    }
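
    Since matches can be empty, as in the response above, calling code should not assume that every requested word has an entry. A minimal sketch of how to find the words that produced no match follows. It assumes that words without matches are simply left out of matches and that the text field uses the same lowercase spelling as the request; `missing_words` is a hypothetical helper name.

```python
def missing_words(requested, result):
    """Return the requested words that have no entry in result["matches"].

    Assumes unmatched words are omitted from the matches list and that
    comparison by lowercase text is sufficient.
    """
    found = {match["text"].lower() for match in result["matches"]}
    return [word for word in requested if word.lower() not in found]


# The empty response shown above.
empty_result = {"id": "TRANSCRIPT-ID", "total_count": 0, "matches": []}

requested = ["water", "air", "earth", "fire"]
if empty_result["total_count"] == 0:
    print("No matches found for:", ", ".join(requested))

print(missing_words(requested, empty_result))
```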