Google has announced the general availability (GA) of Cloud Text-to-Speech, along with updates to Cloud Speech-to-Text. The GA release of Cloud Text-to-Speech extends WaveNet voices beyond US English: seventeen new WaveNet voices are now available, and the release supports 13 new languages and variants in addition to the original English (US). The newly supported languages and variants include English (GB), English (AU), French (FR and CA), German (DE), Italian (IT), and Spanish (ES).
Cloud Text-to-Speech Audio Profiles are also available as a beta release. Developers can use an audio profile to optimize the synthetic speech generated by the Cloud Text-to-Speech API for a particular class of playback hardware, such as smartphone speakers, headphones, or laptop speakers, so that the audio sounds as good as possible on that device. For example, if most users of an application will listen on a smartphone, the synthetic speech can be optimized specifically for smartphone playback.
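As a rough illustration of how an audio profile is selected, the sketch below builds the JSON body for a synthesis request without sending it. The field names (`input`, `voice`, `audioConfig`, `effectsProfileId`) and the profile IDs (`handset-class-device`, `headphone-class-device`) follow the public REST reference as best recalled and are not taken from the announcement; the helper name `build_synthesize_request` and the device-to-profile mapping are hypothetical. Check the current API reference before relying on any of them.

```python
import json

# Assumed mapping from a playback device to an audio profile ID.
# The IDs are recalled from the public documentation, not from the announcement.
PROFILE_FOR_DEVICE = {
    "smartphone": "handset-class-device",
    "headphones": "headphone-class-device",
}


def build_synthesize_request(text, device, language_code="en-US"):
    """Return a JSON-serializable request body for a text synthesis call.

    Hypothetical helper: it only assembles the payload and makes no network call.
    """
    if device not in PROFILE_FOR_DEVICE:
        raise ValueError(f"no audio profile known for device: {device!r}")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            "audioEncoding": "MP3",
            # Profiles are passed as a list; a single entry is used here.
            "effectsProfileId": [PROFILE_FOR_DEVICE[device]],
        },
    }


if __name__ == "__main__":
    body = build_synthesize_request("Your order has shipped.", "smartphone")
    print(json.dumps(body, indent=2))
```

Keeping the device-to-profile choice in one table makes it straightforward to switch profiles if the application's audience later moves to different hardware.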
Earlier this year, Google announced a major overhaul of its Cloud Speech-to-Text product. The company has now announced several beta updates to the product, including multi-channel recognition, speaker diarization, and language auto-detect. Multi-channel recognition transcribes each channel of a multi-channel recording separately, so that when each participant is recorded on their own channel it is clear which words were said by which person. With speaker diarization, developers pass the number of speakers as an API parameter, and machine learning is used to tag each word with a speaker number; the speaker tags are updated as more data is received. With language auto-detect, developers supply up to four language codes with each query, and the API automatically identifies which of those languages was spoken and returns the transcript in that language.
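The three options above can be sketched as a single recognition configuration. The example below only assembles a dictionary and makes no API call. The field names (`alternativeLanguageCodes`, `audioChannelCount`, `enableSeparateRecognitionPerChannel`, `enableSpeakerDiarization`, `diarizationSpeakerCount`, `speakerTag`) are recalled from the beta REST reference and may differ in later API versions; the helper names and the sample words in the usage section are hypothetical and hand-written, not real API output.

```python
from collections import defaultdict

MAX_LANGUAGE_CODES = 4  # limit stated in the announcement


def build_recognition_config(language_codes, channel_count=None, speaker_count=None):
    """Assemble a recognition config dict (hypothetical helper, no network call)."""
    if not 1 <= len(language_codes) <= MAX_LANGUAGE_CODES:
        raise ValueError(f"expected 1 to {MAX_LANGUAGE_CODES} language codes")

    # Language auto-detect: the first code is primary, the rest are alternatives.
    config = {"languageCode": language_codes[0]}
    if len(language_codes) > 1:
        config["alternativeLanguageCodes"] = list(language_codes[1:])

    # Multi-channel recognition: transcribe each audio channel separately.
    if channel_count is not None:
        config["audioChannelCount"] = channel_count
        config["enableSeparateRecognitionPerChannel"] = True

    # Speaker diarization: the caller states how many speakers to expect.
    if speaker_count is not None:
        config["enableSpeakerDiarization"] = True
        config["diarizationSpeakerCount"] = speaker_count

    return config


def words_by_speaker(words):
    """Group word strings by their speaker tag, preserving word order."""
    grouped = defaultdict(list)
    for entry in words:
        grouped[entry["speakerTag"]].append(entry["word"])
    return dict(grouped)


if __name__ == "__main__":
    print(build_recognition_config(["en-US", "es-ES"], speaker_count=2))
    # Hand-written sample shaped like a diarized word list.
    sample = [
        {"word": "hello", "speakerTag": 1},
        {"word": "hi", "speakerTag": 2},
        {"word": "there", "speakerTag": 1},
    ]
    print(words_by_speaker(sample))
```

Because speaker tags can change as more audio arrives, a grouping step like `words_by_speaker` is best applied to the most recent word list rather than accumulated across partial results.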