Microsoft Edge to Support W3C's Speech Synthesis APIs

Microsoft announced that the next Windows 10 update will add Microsoft Edge support for the Speech Synthesis APIs defined in the W3C Web Speech API specification. The APIs allow websites to convert text to speech with customized voice and language settings, so developers can add and manage text-to-speech features tailored to specific page content. To give developers further control, Microsoft will also support Speech Synthesis Markup Language (SSML).
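
As a rough illustration (not Microsoft's own sample code), the basic pattern looks like this: a page creates a SpeechSynthesisUtterance, optionally sets a language, and hands it to the browser's speechSynthesis object. The greeting text and language tag below are placeholders.

```typescript
// Minimal text-to-speech sketch using the W3C Speech Synthesis APIs.
function speakGreeting(): void {
  if (!("speechSynthesis" in window)) {
    console.warn("Speech synthesis is not supported in this browser.");
    return;
  }

  const utterance = new SpeechSynthesisUtterance("Hello from Microsoft Edge!");
  utterance.lang = "en-US"; // request an English (US) voice if one is available

  window.speechSynthesis.speak(utterance);
}

speakGreeting();
```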

Microsoft Edge includes four speech synthesis interfaces: SpeechSynthesis, SpeechSynthesisUtterance, SpeechSynthesisEvent, and SpeechSynthesisVoice. SpeechSynthesis provides playback control and state. SpeechSynthesisUtterance describes the content to be spoken along with its voice and pronunciation settings. SpeechSynthesisEvent reports state information about the current utterance. SpeechSynthesisVoice represents a speech service voice available on the system.
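
The snippet below is a hedged sketch of how the four interfaces typically work together: SpeechSynthesisVoice objects are enumerated with getVoices(), a SpeechSynthesisUtterance carries the text and chosen voice, SpeechSynthesisEvent callbacks report progress, and the SpeechSynthesis object controls playback. The voice-matching logic is an assumption and the available voices differ per machine.

```typescript
// Sketch: wiring the four speech synthesis interfaces together.
function speakWithVoice(text: string): void {
  const synth: SpeechSynthesis = window.speechSynthesis;

  // SpeechSynthesisVoice: enumerate installed voices.
  // Note: getVoices() may return an empty list until the voiceschanged event fires.
  const voices: SpeechSynthesisVoice[] = synth.getVoices();
  const preferred = voices.find((v) => v.lang === "en-US") ?? voices[0];

  // SpeechSynthesisUtterance: the content, voice, and delivery settings.
  const utterance = new SpeechSynthesisUtterance(text);
  if (preferred) {
    utterance.voice = preferred;
  }

  // SpeechSynthesisEvent: progress and state callbacks for this utterance.
  utterance.onstart = (event: SpeechSynthesisEvent) =>
    console.log("Utterance started", event.elapsedTime);
  utterance.onend = (event: SpeechSynthesisEvent) =>
    console.log("Utterance finished", event.elapsedTime);

  // SpeechSynthesis: playback control and queue state.
  synth.cancel(); // clear anything already queued
  synth.speak(utterance);
}

speakWithVoice("This sentence is spoken by the selected system voice.");
```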

Microsoft's support for the Speech Synthesis APIs is built on the WinRT Windows.Media.SpeechSynthesis APIs, which cover most of the W3C speech synthesis interfaces. A few W3C-defined features are not yet supported: pitch (varies the voice pitch during playback), onmark (fires when an SSML mark tag is reached), and onboundary (fires at word or sentence boundaries in the spoken text). Microsoft is evaluating these features for future releases.
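
Because pitch, onmark, and onboundary may be ignored or never fire, a page that uses them should treat them as optional. The defensive sketch below (my own, not from Microsoft's documentation) sets pitch and registers an onboundary handler, but the speech still plays correctly if neither is honored.

```typescript
// Defensive sketch: use pitch and onboundary where available, without relying on them.
function speakWithOptionalFeatures(text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);

  // pitch: may be silently ignored by engines that do not support it.
  utterance.pitch = 1.2;

  // onboundary: reports progress word by word where the event is fired;
  // in engines that never fire it, the utterance still plays normally.
  utterance.onboundary = (event: SpeechSynthesisEvent) => {
    console.log(`Boundary at character index ${event.charIndex}`);
  };

  window.speechSynthesis.speak(utterance);
}
```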

Microsoft published a Speech Synthesis Demo to showcase the new speech features. The demo lets a user type arbitrary text and exposes parameters such as voice, language, rate, and volume for tuning the resulting speech. The demo works with any voice language pack installed in Windows 10; a primary language is installed by default, and users can add more languages by following the language instructions for further testing.
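
A page similar to the demo might read those parameters from form controls before speaking. The sketch below assumes hypothetical input elements with the ids text-input, rate-input, and volume-input; it is not the demo's actual code.

```typescript
// Sketch of a demo-style page: tune rate and volume from form controls.
// The element ids are hypothetical placeholders.
function speakFromForm(): void {
  const text = (document.getElementById("text-input") as HTMLTextAreaElement).value;
  const rate = Number((document.getElementById("rate-input") as HTMLInputElement).value);
  const volume = Number((document.getElementById("volume-input") as HTMLInputElement).value);

  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = rate;     // typically 0.1 to 10, with 1 as the default speed
  utterance.volume = volume; // 0 to 1, with 1 as full volume

  window.speechSynthesis.speak(utterance);
}
```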
