Voice is considered the next great user interface, but the challenge is finding a cost-effective way to enable its use. At the APIcon 2014 conference this week, Alex Lebrun, co-founder and CEO of Wit.AI, showed how developers can make use of an API to add voice-recognition capabilities to their applications.
Advances in speech-recognition algorithms in the last two years make it practical to invoke a cloud service to endow applications with voice-recognition capabilities similar to what Apple provides using Siri, Lebrun says.
As mobile computing devices get smaller and in many instances no longer have keyboards, Lebrun says end users need a new way to input data. The most practical alternative is to imbue the application with speech-recognition capabilities.
Lebrun isn’t suggesting that every application needs speech recognition. There are plenty of mobile applications for which a touch screen interface is sufficient. But at the same time, Lebrun notes that there is a whole range of new applications involving everything from cooking to driving where the hands of the end user are required to perform another function.
In addition, Lebrun envisions developers using the Wit.AI service in a variety of Internet of Things scenarios, such as issuing verbal commands to thermometers as part of a home automation system.
Lebrun says voice-enabled applications are not yet well suited for noisy environments, so it’s important that developers set some expectations with users before releasing their applications.
In general, Lebrun says it takes about 10 hours to train an application to recognize voice commands. But at a cognitive application level, Lebrun says the quality of experience improves over time as the Wit.AI service learns more inflections and idioms. To accomplish that, the Wit.AI service makes use of four types of speech engines that the service runs in parallel.
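To make the idea of running several speech engines in parallel concrete, here is a minimal sketch in Python. It is not Wit.AI's implementation: the article does not say how the service combines its engines, so the stub engines, the confidence scores, and the "keep the most confident transcript" rule are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical engine interface: raw audio bytes in, (transcript, confidence) out.
Engine = Callable[[bytes], Tuple[str, float]]


def make_stub_engine(transcript: str, confidence: float) -> Engine:
    """Build a stand-in engine that always returns the same result (illustration only)."""
    def engine(audio: bytes) -> Tuple[str, float]:
        return transcript, confidence
    return engine


def recognize_in_parallel(audio: bytes, engines: List[Engine]) -> Tuple[str, float]:
    """Run every engine on the same audio concurrently and keep the most confident result.

    Picking the highest confidence is an assumed selection rule, not one
    described in the article.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        results = list(pool.map(lambda engine: engine(audio), engines))
    return max(results, key=lambda result: result[1])


if __name__ == "__main__":
    engines = [
        make_stub_engine("turn on the lights", 0.62),
        make_stub_engine("turn on the light", 0.48),
        make_stub_engine("turn on the lights", 0.91),
        make_stub_engine("turn off the lights", 0.30),
    ]
    print(recognize_in_parallel(b"\x00\x01", engines))
```

In a real application each stub would be replaced by a call to an actual recognizer, and the selection rule could just as well be a vote among the transcripts rather than a single maximum.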
Wit.AI uses streaming technology to provide a higher-quality voice experience than Siri, Lebrun says. But in the future, Wit.AI plans to develop a run-time library that will take advantage of the local processing power of the device to improve the quality of the speech-recognition experience. That run-time library will also provide users with an offline capability for processing speech.
Speech recognition has been one of the holy grails of user interface design for almost as long as anyone can remember. Thanks to the convergence of the cloud and mobile computing, along with some advances in the algorithms that enable speech recognition, developers are a step closer to making voice interfaces an everyday part of the end user experience.