Google's Cloud Speech API will allow developers to convert audio to text within their own apps. The offering brings Google's neural network smarts to apps large and small, and opens up a wide range of interesting new possibilities. It also brings the fight to Nuance Communications' front door.
Google is providing access to the limited preview of the Cloud Speech API through its developer website. Developers can take advantage of the API for free, for now, though presumably Google will start charging for access at some point. The API includes a number of key functions.
The automatic speech recognition is powered by machine-learning neural networks. Google claims unparalleled accuracy and says the system becomes more accurate over time as more people use the API. At launch, the Speech API recognizes 80 languages with some regional variants. Google didn't say how big its vocabulary is other than to call it "extensive." Nuance's mobile SDKs, by way of comparison, only cover about 40 languages.
The API can accept audio captured live from a microphone or supplied as pre-recorded audio files in encodings such as PCMU, FLAC, and AMR. Voice recordings are sent to Google's servers, where they are transcribed into text, which is then streamed back to the app in real time. Google didn't say if or how the API handles voice recognition in an offline environment. The API can recognize spoken language even in noisy environments without hardware or software noise cancellation. Google says developers can set parameters to filter out inappropriate content if so desired. Developers can upload and store audio files. A future release of the API will allow developers to integrate those files with Google Cloud Storage.
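To make that workflow concrete, here is a minimal Python sketch of how an app might package a pre-recorded clip for transcription. Google has not published the request format for the limited preview, so everything here is an assumption: the field names (config, encoding, sampleRateHertz, languageCode, profanityFilter, audio, content) are modeled on the REST interface Google later documented for the generally available Cloud Speech API, and the helper name build_recognize_request is purely illustrative. The sketch only builds the request body and does not contact any server.

```python
import base64
import json


def build_recognize_request(audio_bytes, encoding="FLAC", sample_rate_hertz=16000,
                            language_code="en-US", profanity_filter=True):
    """Build a JSON-serializable body for a speech-to-text request.

    Hypothetical helper: the field names are assumptions modeled on the
    later public REST interface, not on the preview described here.
    """
    return {
        "config": {
            "encoding": encoding,                  # audio encoding, e.g. FLAC or AMR
            "sampleRateHertz": sample_rate_hertz,  # sample rate of the recording
            "languageCode": language_code,         # language and regional variant
            "profanityFilter": profanity_filter,   # ask the service to mask inappropriate words
        },
        "audio": {
            # Raw audio bytes must be base64-encoded to travel inside JSON.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real recording.
    body = build_recognize_request(b"\x00\x01placeholder-audio")
    print(json.dumps(body, indent=2))
```

In a real app the resulting body would be sent to the service with the developer's credentials, and the transcript would come back in the response. How the preview handles authentication and real-time microphone streaming has not been disclosed, so both are left out of the sketch.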
The Cloud Speech API accesses the same toolset that Google uses for its own speech-recognition and voice-command tools in Google Search, Google Now, and the Google Keyboard. Anyone who's used Google voice search knows how quick and accurate its performance is. Developers can take advantage of this API not only to capture spoken words as text, but also to add support for voice-based commands.
Google's description of the workflow says the API can accept either real-time speech (to which it will respond with a text stream as the speech is recognized) or a complete audio file. That makes it possible that the API will be a streaming API instead of a RESTful one. If that's the case, it's also possible that it will rely on Google's Pub/Sub streaming API technology. As for abstracting the API, Google has yet to indicate whether it will also provide SDKs for Android, iOS, or other mobile or server-side platforms.
Nuance has individual SDKs for Android, iOS, and Windows. (Google's API differs from the Alexa Voice Services API from Amazon. That API allows hardware makers to add the Alexa AI to their devices and nothing more.)
Access to the Google Cloud Speech API is limited, but Google didn't say how limited. Developers can sign up to test the preview at no cost.