Redact Personally Identifiable Information (PII) from transcriptions

    Redact PII from transcripts

    Working with audio data that has sensitive information? The example below shows how to request a transcription with Personally Identifiable Information (PII), such as phone numbers and social security numbers, redacted:
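A minimal Python sketch of the request body, using only the standard library. It assumes the body is sent as JSON in a POST to https://api.assemblyai.com/v2/transcript, and that the fields are named audio_url and redact_pii. This page does not name them, so treat both as assumptions to check against the API reference.

```python
import json


def redacted_transcript_payload(audio_url: str) -> bytes:
    """JSON body for a PII-redacted transcription request (field names assumed)."""
    payload = {
        "audio_url": audio_url,  # assumed field: URL of the audio to transcribe
        "redact_pii": True,      # assumed field: replace PII numbers with '#'
    }
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    print(redacted_transcript_payload("https://example.com/call.mp3").decode("utf-8"))
```

Send this body in a POST to the transcript endpoint together with your API key.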

    What info is redacted?

    At this time, only strings of numbers, such as phone numbers and social security numbers, are redacted.

    Single digits, such as the "7" in "I want 7 hamburgers", are not redacted.

    These numbers will be replaced with # characters in the transcription text. For example, 609-217-5555 will become ###-###-####.
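The function below reproduces the masking format on a plain string. Every digit in a run of two or more digits (hyphens allowed between them) becomes #, and lone digits are left unchanged. It only illustrates the output format described above. It is not how the service detects PII, and the helper name mask_digit_runs is hypothetical.

```python
import re

# Illustration only: a run that starts and ends with a digit, with only digits
# or hyphens in between, so it always contains at least two digits.
_DIGIT_RUN = re.compile(r"\d[\d-]*\d")


def mask_digit_runs(text: str) -> str:
    """Replace each digit in multi-digit runs with '#', keeping the hyphens."""
    return _DIGIT_RUN.sub(lambda m: re.sub(r"\d", "#", m.group()), text)


if __name__ == "__main__":
    print(mask_digit_runs("Call 609-217-5555 and I want 7 hamburgers"))
```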

    Redact PII from audio

    When you request a transcription with PII redacted, you can also request audio redaction. In that case, we mute the portions of your audio where PII numbers are spoken and make a URL available from which you can download the redacted audio file.

    Important Considerations

    Submit an audio file for transcription and enable audio redaction
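A minimal Python sketch of such a request, built with the standard library but not sent. The base URL matches the redacted-audio endpoint described further down this page. The fields audio_url, redact_pii, and redact_pii_audio, and the authorization header, are assumptions not confirmed on this page. The optional webhook_url is the field this page describes for receiving a notification when the redacted audio is ready.

```python
import json
import urllib.request
from typing import Optional

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_audio_redaction_request(
    api_key: str, audio_url: str, webhook_url: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a transcript request with audio redaction enabled."""
    payload = {
        "audio_url": audio_url,    # assumed field: URL of the audio to transcribe
        "redact_pii": True,        # assumed field: redact PII in the transcript text
        "redact_pii_audio": True,  # assumed field: also produce muted audio
    }
    if webhook_url is not None:
        # The service POSTs here once the redacted audio is ready.
        payload["webhook_url"] = webhook_url
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key is sent in an "authorization" header.
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_audio_redaction_request("YOUR_API_KEY", "https://example.com/call.mp3")
    print(req.get_method(), req.full_url)  # send with urllib.request.urlopen(req)
```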

    Get the redacted audio URL

    If a webhook_url was provided in your API request, we will send a POST to your webhook_url when the redacted audio is ready. The POST request headers and JSON body will look like this:

    headers
    ---
    content-length: 79
    accept-encoding: gzip, deflate
    accept: */*
    user-agent: python-requests/2.21.0
    content-type: application/json
    
    params
    ---
    status: 'redacted_audio_ready'
    redacted_audio_url: 'https://link-to-redacted-audio'

    The redacted_audio_url link is only valid for 30 minutes!
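A minimal sketch of a receiver for this webhook, using Python's built-in http.server. The names WebhookHandler and redacted_audio_url_from_webhook are hypothetical. The helper returns None for any payload whose status is not redacted_audio_ready, in case the same endpoint also receives other notifications or malformed requests.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Optional


def redacted_audio_url_from_webhook(body: bytes) -> Optional[str]:
    """Return redacted_audio_url from a webhook body, or None for anything else."""
    try:
        payload = json.loads(body)
    except ValueError:  # body is not valid JSON
        return None
    if isinstance(payload, dict) and payload.get("status") == "redacted_audio_ready":
        return payload.get("redacted_audio_url")
    return None


class WebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that logs the redacted audio URL when it arrives."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        url = redacted_audio_url_from_webhook(self.rfile.read(length))
        if url is not None:
            # The link expires after 30 minutes, so download the file promptly.
            print("Redacted audio ready:", url)
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


# To serve (blocks until interrupted):
# from http.server import HTTPServer
# HTTPServer(("", 8000), WebhookHandler).serve_forever()
```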

    Retrieving the redacted audio URL directly from the API

    If you can't receive a webhook, you can also make a GET request to the following endpoint to retrieve a URL for your redacted audio file:

    https://api.assemblyai.com/v2/transcript/<your transcript id>/redacted-audio
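For example, the request can be built with Python's standard library as below. The transcript ID is URL-quoted before it is placed in the path. The API key is assumed to go in an authorization header, which this page does not specify. The function name build_redacted_audio_request is hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.assemblyai.com/v2/transcript"


def build_redacted_audio_request(api_key: str, transcript_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a transcript's redacted audio."""
    url = f"{API_BASE}/{urllib.parse.quote(transcript_id, safe='')}/redacted-audio"
    # Assumption: the API key is sent in an "authorization" header.
    return urllib.request.Request(url, headers={"authorization": api_key}, method="GET")


if __name__ == "__main__":
    print(build_redacted_audio_request("YOUR_API_KEY", "your-transcript-id").full_url)
```

To send it, pass the request to urllib.request.urlopen. Note that urlopen returns 200 and 202 responses normally but raises urllib.error.HTTPError for a 400.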

    This will return the following status codes and responses:

    200 status code (successful)

    {
        "status": "redacted_audio_ready",
        "redacted_audio_url": "https://link-to-redacted-audio"
    }

    Note that the redacted_audio_url link is only valid for 30 minutes. If you need to access the file after that, request the endpoint again to get a new link.

    Although you can request a new link, the redacted audio file itself is purged from our servers after 24 hours. Be sure to download the file and store it on your own server, in an S3 bucket, or elsewhere within that window.
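One way to keep a copy is to stream the file to disk as soon as you have a link, as in this sketch. It assumes the redacted_audio_url can be fetched with a plain GET and no extra headers. The function name download_redacted_audio is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_redacted_audio(redacted_audio_url: str, destination: str) -> Path:
    """Stream the redacted audio file to a local path and return that path."""
    dest = Path(destination)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(redacted_audio_url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks rather than reading it all at once
    return dest
```

Run it within the 30-minute validity window of the link. If the link has expired, request a new one from the endpoint first. From local disk, you can then copy the file to long-term storage.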

    202 status code (pending)

    A 202 status code is returned if audio redaction is still in progress. Depending on the length of the file, it can take several minutes after transcription finishes for the redacted audio file to be created.

    400 status code

    A 400 status code is returned if something is wrong with your request or if the redacted audio file is unavailable. See the docs for more on how to interpret and handle 400 errors.
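Putting the three status codes together, a polling loop might look like the following sketch. It retries on 202, returns the URL on 200, and stops on anything else. The fetch parameter (hypothetical) lets you swap in your own HTTP client. The default http_get assumes the API key goes in an authorization header. The 5-second interval and the attempt limit are arbitrary choices, not values from this page.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Tuple

API_BASE = "https://api.assemblyai.com/v2/transcript"
Fetch = Callable[[str, str], Tuple[int, bytes]]


def http_get(url: str, api_key: str) -> Tuple[int, bytes]:
    """GET url and return (status code, body), including 4xx responses."""
    req = urllib.request.Request(url, headers={"authorization": api_key})  # assumed header
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # urlopen raises for 400
        return err.code, err.read()


def wait_for_redacted_audio(
    transcript_id: str,
    api_key: str,
    fetch: Fetch = http_get,
    poll_seconds: float = 5.0,
    max_attempts: int = 120,
) -> str:
    """Poll the redacted-audio endpoint until the URL is ready."""
    url = f"{API_BASE}/{urllib.parse.quote(transcript_id, safe='')}/redacted-audio"
    for _ in range(max_attempts):
        status, body = fetch(url, api_key)
        if status == 200:  # ready: the body carries redacted_audio_url
            return json.loads(body)["redacted_audio_url"]
        if status == 202:  # still being created: wait and try again
            time.sleep(poll_seconds)
            continue
        # 400 or anything unexpected: stop and surface the response body.
        raise RuntimeError(f"redacted audio request failed ({status}): {body!r}")
    raise TimeoutError("redacted audio was not ready in time")
```

Because http_get catches HTTPError, a 400 reaches the loop through the same return path as 200 and 202 instead of escaping as an exception.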