How To Use Google's Cloud Speech API to Transcribe a Large Audio File

Here at ProgrammableWeb, we’re always on the lookout for great “Hello World” API tutorials: the ones that just about anybody, at any level of expertise, can complete to experience both the process and the benefits of accessing APIs.

Our latest discovery is a wonderful walkthrough for beginning and experienced Python developers who are interested in using Google’s Cloud Speech API to transcribe an audio file. In fact, we like it so much that we’re going to try it ourselves to transcribe the audio files that are the source of ProgrammableWeb’s podcasts (an experimental offering in 2017 that’s soon to resume now that 2018 is underway).

As the author, Alex Kras, points out, one limitation of Google's Cloud Speech API is that it only works with audio segments of 60 seconds or less. Kras surmounts that challenge by using ffmpeg to break a longer file into a series of smaller chunks that his Python script cycles through, one at a time. When the script is done, the transcriptions of the individual chunks are reassembled into one master transcription.
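
The split-then-reassemble flow can be sketched roughly as follows. This is a minimal illustration rather than Kras's actual script: the function names, the `out%09d.wav` file pattern, and the 30-second chunk length are our own assumptions, and the `transcribe` argument stands in for whatever speech-to-text call you use. The ffmpeg options shown (`-f segment`, `-segment_time`, `-c copy`) belong to ffmpeg's segment muxer.

```python
from pathlib import Path


def build_split_command(source, out_dir, chunk_seconds=30):
    """Return an ffmpeg argument list that cuts `source` into fixed-length chunks.

    chunk_seconds is an assumed value chosen to stay under the 60-second limit.
    """
    # Zero-padded numbering keeps the chunks in order when sorted by name.
    pattern = str(Path(out_dir) / "out%09d.wav")
    return [
        "ffmpeg", "-i", str(source),
        "-f", "segment",                      # use ffmpeg's segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each chunk
        "-c", "copy",                         # copy the audio without re-encoding
        pattern,
    ]


def ordered_chunks(out_dir):
    """List the chunk files in the order they were cut."""
    return sorted(Path(out_dir).glob("out*.wav"))


def transcribe_all(chunk_paths, transcribe):
    """Transcribe chunks one at a time and join the pieces into one transcript.

    `transcribe` is a hypothetical callable that takes a path and returns text.
    """
    return " ".join(transcribe(path) for path in chunk_paths)
```

To actually perform the split, the command list could be passed to `subprocess.run(build_split_command("source.wav", "parts"), check=True)`, assuming ffmpeg is installed and the `parts` directory already exists.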

His script relies on a third-party Python Speech Recognition SDK, a multiplexing SDK that works with a variety of speech-to-text APIs (e.g., those from Google, IBM, and Microsoft Bing). The developer picks which one to use, as evidenced by line 19 of his source code (at the time we published this), which calls the Google API with the following:

text = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)

To call Microsoft Bing's speech-to-text API instead, that line would be edited to say the following:

text = r.recognize_bing(audio, key=BING_KEY)

One could imagine using the SDK to run a bake-off between the supported APIs using the same audio files.
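
A bake-off of that kind might look something like the sketch below. The `bake_off` helper and the `engines` dictionary are hypothetical names of our own; the code simply runs each recognizer on the same audio and records either the text or the error, so that one failing service does not stop the comparison.

```python
def bake_off(audio, engines):
    """Run every engine on the same audio and collect its text or its error.

    `engines` maps a label to a callable that takes the audio and returns text.
    """
    results = {}
    for name, recognize in engines.items():
        try:
            results[name] = recognize(audio)
        except Exception as exc:
            # Record the failure and keep going with the remaining engines.
            results[name] = f"<error: {exc}>"
    return results


# Assumed wiring with the SDK, reusing the two calls quoted above:
#
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.AudioFile("chunk.wav") as source:   # "chunk.wav" is a placeholder
#       audio = r.record(source)
#   engines = {
#       "google_cloud": lambda a: r.recognize_google_cloud(
#           a, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS),
#       "bing": lambda a: r.recognize_bing(a, key=BING_KEY),
#   }
#   for name, text in bake_off(audio, engines).items():
#       print(name, "->", text)
```

Comparing the outputs side by side is left to the reader; a fair comparison would also need a trusted reference transcript to score against.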

But wait, there’s more. As an extra little gift, Kras teaches us how to speed up the recognition process by relying on Python’s threading capabilities, a nice extra for those of you interested in threading your API consumption.
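
As a rough idea of what threaded API consumption can look like, here is a sketch built on the standard library's `ThreadPoolExecutor`. It is not necessarily the approach Kras takes; the function name, the `transcribe` callable, and the default of eight workers are assumptions for illustration. Because each chunk's request spends most of its time waiting on the network, several requests can be in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_in_parallel(chunk_paths, transcribe, max_workers=8):
    """Transcribe chunks concurrently and join the results in chunk order.

    `transcribe` is a hypothetical callable that takes a path and returns text.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when later
        # chunks finish first, so the transcript stays in sequence.
        texts = list(pool.map(transcribe, chunk_paths))
    return " ".join(texts)
```

Keeping the results in input order matters here: the chunks are slices of one recording, so the master transcription only makes sense if the pieces are joined in the order they were cut.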

Original Article

Transcribing an Audio File to Text with Google Cloud Speech API and Python