At DataWeek/API World this week in San Francisco, Timothy Tuttle, CEO and founder of Expect Labs, creator of the MindMeld API, argued that businesses need to start thinking about how to incorporate speech-recognition artificial intelligence into apps and business systems. With around 1,000 businesses already using the API, Tuttle is confident that the technology is advanced enough to enable a whole new approach to using speech recognition in the workplace, at home and in the car.
The MindMeld API uses advanced machine learning algorithms to create a contextual knowledge base (the “knowledge graph”) around a specific application use case, and from there, developers integrate via mobile SDKs to enable speech recognition in apps and products.
Tuttle says there are just two steps to working with the MindMeld API:
1. “Create a knowledge base by crawling a website.” For the demonstration at API World, Tuttle pointed to the Crackle movie database website and began crawling all data from the site. The crawling tool may take hours to build up the contextual database. In many business cases, this would be done against the business’s own data assets: for example, a customer service database containing the business’s products, technical information and repair FAQs.
2. Integrate the mobile SDKs to enable speech recognition in the app, so that spoken queries are resolved against that knowledge base.
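The crawl-then-query workflow can be sketched in miniature. Everything below is a toy stand-in (the page data, the inverted index and the function names are all invented for illustration); the real MindMeld crawler builds a far richer contextual model than a keyword index:

```python
from collections import defaultdict

# Toy stand-in for the crawl step: a handful of "pages" to index.
# Content and URLs are invented, not real Crackle data.
PAGES = {
    "/movies/heat": "heat 1995 crime thriller al pacino robert de niro",
    "/movies/alien": "alien 1979 sci-fi horror sigourney weaver",
    "/faq/repair": "screen repair warranty replacement instructions",
}

def crawl(pages):
    """Step 1: build a simple inverted index (our toy 'knowledge base')."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.split():
            index[term].add(url)
    return index

def answer(index, utterance):
    """Step 2 stand-in: match a recognized utterance against the index,
    ranking pages by how many of the spoken terms they contain."""
    hits = defaultdict(int)
    for term in utterance.lower().split():
        for url in index.get(term, ()):
            hits[url] += 1
    return sorted(hits, key=hits.get, reverse=True)

index = crawl(PAGES)
print(answer(index, "crime thriller with Al Pacino"))  # top hit: /movies/heat
```

The point of the sketch is only the shape of the pipeline: an offline crawl produces a queryable contextual store, and recognized speech is then resolved against it.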
The Knowledge Graph
Most machine-learning systems need to start from a knowledge base that provides contextual clues about a subject area. This is the case for everything from sentiment analysis to speech recognition, so MindMeld has invested in building a sophisticated, automatic way to create that contextual knowledge map. Such a product could have independent value, and in fact Tuttle notes that some business users already use the API to run searches and organize data directly against the knowledge base.
The problem we set out to solve is to understand what users are saying, and to solve that you have to understand the knowledge base. We couldn’t find one already, so we built one ourselves. We do make direct access to the knowledge graph available through our API, and some customers are accessing the knowledge graph specifically through it. But that doesn’t hit the big vision that we are trying to reach.
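As a rough illustration of what a knowledge graph holds, here is a minimal sketch: entities as nodes, typed relations as edges, and a lookup that returns everything connected to an entity. The schema and data are invented for this example; the real MindMeld graph is generated automatically from crawled content:

```python
# Toy knowledge graph: (source entity, relation) -> target entity.
# Entities and relations are invented for illustration.
GRAPH = {
    ("Heat", "directed_by"): "Michael Mann",
    ("Heat", "genre"): "crime",
    ("Al Pacino", "acted_in"): "Heat",
}

def related(entity):
    """Return every (relation -> target) pair touching an entity.

    This is the kind of contextual clue a speech recognizer can lean on:
    if the user has been talking about "Heat", then "Pacino" is a far
    likelier interpretation of an ambiguous sound than "casino".
    """
    return {rel: tgt for (src, rel), tgt in GRAPH.items() if src == entity}

print(related("Heat"))  # {'directed_by': 'Michael Mann', 'genre': 'crime'}
```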
Tuttle is singularly focused on building an API platform that initially makes speech recognition highly accurate, with a medium-term vision of helping speech recognition pre-empt user need based on passive listening.
“Voice will become more embedded in apps and [the Internet of Things] over the next three years,” he says, pointing to three key influences:
1. Advances in technology are enabling speech recognition to be as accurate as typed input.
2. More and more connected devices will have microphones but not necessarily keyboards.
3. Platform vendors are building speech recognition into the next iteration of all operating systems and browsers, making the infrastructure available to all.
Tuttle sees apps being built across business and for the connected car and connected home, based on two main functions: to provide answers and to perform actions.
To Provide Answers
Responses are either displayed on search screens or carried out immediately as tasks. For example, speech recognition is being enabled in apps and in the business value chain to let end users quickly search for results. In the home, this may translate into seeing search results displayed while the end user’s hands are busy: looking up an ingredient on an iPad while cooking, for example.
To Perform Actions
The connected home and car markets are rapidly adopting the MindMeld API, says Tuttle. He is convinced that as this uptake grows, people will not want an experience in which their devices talk back to them:
People don’t want a conversational exchange; they don’t want their TV or fridge to talk back to them. They just want the answer immediately, and the fastest way to get the answer is to show them: "Find the answer even before I have finished asking for it and show it to me."
Predictive Apps With Passive Listening
The MindMeld API is already achieving high speech-recognition accuracy for apps built for a specific purpose, each with an associated knowledge graph that matches the use case.
Expect Labs is now working with a number of business clients on a set of predictive speech-recognition apps that Tuttle expects will reach optimal accuracy in the next 12 to 18 months. These take speech-recognition apps further. Instead of tapping a button to turn on speech recognition or using wake words like “OK Google” or “Hey Siri,” a new generation of apps would be "passively listening" all the time.
Tuttle maps out how these apps will look:
We are seeing some interesting applications coming up for meeting chat lines. Our app only chimes in when it has some interesting data to offer. That’s not happening yet; it is coming in a year or so. It represents a new use case that doesn’t exist today. It has to be built for a specific application and be something the users want. Up until today, the technology hasn’t been smart enough to know what to ignore, but we are seeing some great responses. In the next 18 months we are expecting to see a few examples that users will love.
This is a request we get all the time from companies, especially from call centers and conference call systems. Anyone who can crack the code on that will solve a pretty important problem. We have been building with some partners and they have seen some dramatic improvement recently; they are almost ready for prime time.
In a call center, for example, customer support agents run searches constantly as the customer talks. Even if listening apps save 5% or even 1% of search time, it is a big benefit for the business in efficiencies and customer experience. It is like Google AutoFill — even before the customer finishes asking, product pages or technical help can be displayed to make the interaction more streamlined, more efficient.
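The "know what to ignore" idea can be sketched as a confidence threshold over an incoming word stream: the listener stays silent until one topic is clearly supported, then surfaces a suggestion. The topic vocabularies and threshold below are invented for illustration and are not MindMeld's actual method:

```python
# Toy topic vocabularies for a hypothetical call-center assistant.
TOPICS = {
    "screen-repair": {"screen", "cracked", "repair", "replace"},
    "billing": {"invoice", "charge", "refund", "billing"},
}

def listen(words, threshold=2):
    """Score each topic as words stream in; only 'chime in' once the
    best topic has at least `threshold` supporting words. Returns
    (topic, score), or (None, 0) if nothing was confident enough."""
    scores = {topic: 0 for topic in TOPICS}
    for word in words:
        for topic, vocab in TOPICS.items():
            if word in vocab:
                scores[topic] += 1
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best, scores[best]
    return None, 0  # stay silent: not enough evidence to interrupt

print(listen("hi my screen is cracked can you repair it".split()))
print(listen("hello how are you today".split()))  # stays silent
```

In practice the hard part is exactly what Tuttle describes: tuning the system so it interrupts only when the suggestion is genuinely useful, which is why most of the conversation is ignored.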
The MindMeld API has a freemium model, so developers can register for an API key and begin experimenting immediately. A paid plan kicks in based on a number of business factors, including the size of the knowledge graph, the number of applications being built and the number of API calls that can be made.