Giving temporary access to private S3 or GCS audio files

    Audio files are often stored in a private bucket in AWS S3 or Google Cloud Storage. We need to access these files in order to download and transcribe them. Both AWS and Google Cloud support "Pre-Signed URLs". A Pre-Signed URL for an object (i.e., an audio file) in your private bucket embeds a token that we can use to access the file, and you can specify when the token expires.

    This doesn't make the object public. Without the token, the object is still inaccessible.

    For example, this is what a Pre-Signed URL to a file in Google Cloud Storage looks like:

    https://storage.googleapis.com/google-testbucket/testdata.txt?GoogleAccessId=
    1234567890123@developer.gserviceaccount.com&Expires=1331155464&Signature=BCl
    z9e4UA2MRRDX62TPd8sNpUCxVsqUDG3YGPWvPcwN%2BmWBPqwgUYcOSszCPlgWREeF7oPGowkeKk
    7J4WApzkzxERdOQmAdrvshKSzUHg8Jqp1lw9tbiJfE2ExdOOIoJVmGLoDeAGnfzCd4fTsWcLbal9
    sFpqXsQI8IQi1493mw%3D

    A request for the bare object URL would be rejected, but the signed query parameters in a Pre-Signed URL grant temporary access to the file. You set the expiration time explicitly when you create the URL; we recommend having URLs expire in 10-15 minutes.
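
    To make the expiration concrete, here is a minimal sketch that reads the expiry time out of a signed URL shaped like the sample above. It assumes the URL carries an `Expires` query parameter holding Unix epoch seconds; URLs signed with other schemes (for example, ones that use `X-Amz-Date` and `X-Amz-Expires`) encode expiry differently and are not handled. The helper names `signed_url_expiry` and `is_expired` are hypothetical, and the signature in the sample is a placeholder.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url):
    """Return the expiry as an aware UTC datetime, or None if absent.

    Assumption: expiry is stored in an `Expires` query parameter as
    Unix epoch seconds, as in the sample Google Cloud Storage URL.
    """
    query = parse_qs(urlsplit(url).query)
    values = query.get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """Return True if the URL's expiry time has already passed."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        raise ValueError("URL has no Expires parameter")
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


# Same bucket, object, and Expires value as the sample URL;
# the Signature value here is a placeholder, not a real signature.
sample = (
    "https://storage.googleapis.com/google-testbucket/testdata.txt"
    "?GoogleAccessId=1234567890123@developer.gserviceaccount.com"
    "&Expires=1331155464"
    "&Signature=PLACEHOLDER"
)

print(signed_url_expiry(sample).isoformat())
print(is_expired(sample))
```

    A check like this can be useful before submitting a URL, since a URL that expires before the file is downloaded will be rejected.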

    To learn more, see the AWS and Google Cloud documentation on Pre-Signed URLs (Google Cloud calls them signed URLs).

    We'll be adding more code samples soon. For now, here is an example that creates a Pre-Signed URL for a file in an AWS S3 bucket using boto3 in Python.

    import boto3
    import requests
    
    # Get the service client.
    s3 = boto3.client('s3')
    
    # Generate the URL to get 'key-name' from 'bucket-name'.
    # ExpiresIn is in seconds: 900 is 15 minutes (the boto3 default is 3600).
    url = s3.generate_presigned_url(
        ClientMethod='get_object',
        Params={
            'Bucket': 'bucket-name',
            'Key': 'key-name'
        },
        ExpiresIn=900
    )
    
    # Use the URL to perform the GET operation. You can use any method you like
    # to send the GET, but we will use requests here to keep things simple.
    response = requests.get(url)