When you submit an audio file for transcription, it should complete in 20-30% of the audio file's duration. So a 1-hour file would complete in roughly 12 to 18 minutes. When our infrastructure is congested and we're scaling to meet demand, processing time can rise to as much as 75% of the audio duration (about 45 minutes for a 1-hour file), although we rarely see processing speeds this slow.
The one exception is short audio files: anything under 60 seconds in duration takes ~1 minute to complete.
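To make the arithmetic concrete, here is a minimal Python sketch that turns the percentages above into time estimates. The function name `estimate_processing_seconds` and its return shape are illustrative assumptions, not part of any API, and treating files under 60 seconds as a flat 60 seconds even under congestion is also an assumption.

```python
def estimate_processing_seconds(audio_seconds: float) -> tuple[float, float, float]:
    """Return (typical_low, typical_high, congested_max) estimates in seconds.

    Hypothetical helper: it only encodes the figures quoted in the text
    (20-30% of duration normally, up to 75% when congested, and ~1 minute
    for anything shorter than 60 seconds).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")

    # Files under 60 seconds take about a minute regardless of length.
    # Assumption: the same flat figure is used for the congested case.
    if audio_seconds < 60:
        return (60.0, 60.0, 60.0)

    typical_low = audio_seconds * 20 / 100    # 20% of the audio duration
    typical_high = audio_seconds * 30 / 100   # 30% of the audio duration
    congested_max = audio_seconds * 75 / 100  # rare worst case
    return (typical_low, typical_high, congested_max)


if __name__ == "__main__":
    low, high, worst = estimate_processing_seconds(3600)  # a 1-hour file
    print(f"typical: {low / 60:.0f}-{high / 60:.0f} min, worst case: {worst / 60:.0f} min")
```

These values are useful mainly for sizing timeouts or polling intervals on the client side; they are estimates, not guarantees.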
Our synchronous API is best suited for very short audio files (under 15 seconds in duration) or for short "utterances" of speech collected via an app such as a smart speaker or an interactive, voice-controlled bot.
Synchronous requests complete in 100-700 milliseconds, depending on the audio duration.
If you need transcripts as you speak, for example to transcribe a live event or a phone call in real time, you'll want to use the Real-Time WebSocket API. It keeps a session open that you can stream audio data to for as long as you need.
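The guidance above amounts to a simple decision rule, which the following Python sketch spells out. The names `TranscriptionMode` and `choose_mode` are hypothetical and exist only for illustration; the 15-second threshold comes from the text, and no real client library or endpoint is used here.

```python
from enum import Enum
from typing import Optional


class TranscriptionMode(Enum):
    """Hypothetical labels for the three options described in the text."""
    FILE_SUBMISSION = "submit a file and wait for the transcript"
    SYNCHRONOUS = "synchronous API"
    REAL_TIME = "Real-Time WebSocket API"


def choose_mode(audio_seconds: Optional[float] = None, live: bool = False) -> TranscriptionMode:
    """Pick an option based on whether the audio is live and how long it is."""
    # Live audio (events, phone calls) needs an open streaming session.
    if live:
        return TranscriptionMode.REAL_TIME

    if audio_seconds is None or audio_seconds < 0:
        raise ValueError("a non-negative duration is required for pre-recorded audio")

    # Very short clips and utterances (< 15 seconds) fit the synchronous API.
    if audio_seconds < 15:
        return TranscriptionMode.SYNCHRONOUS

    # Everything else goes through regular file submission.
    return TranscriptionMode.FILE_SUBMISSION


if __name__ == "__main__":
    print(choose_mode(audio_seconds=8).value)
    print(choose_mode(audio_seconds=3600).value)
    print(choose_mode(live=True).value)
```

Checking `live` first reflects the assumption that a live source has no known duration up front, so the length thresholds only apply to pre-recorded audio.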