In an upcoming talk at APIcon UK, I'll introduce you to Machine Learning through the use of prediction APIs. As machine learning and predictive analytics services become more widely embraced in the business world, predictive APIs are starting to open up. But are predictive APIs right for your organization? And, if so, which ones?
In a previous post on ProgrammableWeb, we saw how Machine Learning and Predictive Analytics services were growing, and we saw a sample of APIs in that space. More and more of these Prediction APIs are opening up. Among the biggest names, Apigee launched Apigee Insights, its Big Data predictive analytics platform, Microsoft launched Azure ML, and Wolfram Research launched their Programming Cloud with some some built-in Machine Learning methods.
When evaluating this class of API, it is useful to have a common set of questions—the answers to which will help determine whether prediction APIs are a good fit for your needs and to steer you toward the best product for your organization.
1. Can I sign up instantly, or do I need to contact sales or get on a private beta waiting list?
The answer to this question will determine how fast you can get up and running with an API. For example, although very similar in their descriptions, BigML and Wise.io are very different in terms of account setup: Anyone can create a free account with BigML within minutes; with Wise.io, on the other hand, you have to fill in a form and contact sales to get started. The latter scenario might suggest that the service would need to be configured to your particular needs before you can use it.
2. Is there a free plan? How much is the lowest-priced option?
Does the API provider display pricing on its website? If you have to contact sales in order to access the API, it’s pretty safe to assume that you will have to pay for its use. Companies that offer free accounts (such as BigML) allow you to bootstrap machine learning by taking as much time as you want to build a first proof of concept—with no monetary commitment. If initial testing is promising, you can continue with the product or move on to something more powerful if need be.
3. Are there any parameters to tune?
Any time you spend worrying about tuning parameters is time you are not spending figuring out how to leverage data to deliver value with predictions--which is what matters most. It’s important to determine how much up-front work an API will require, and then decide whether it’s worth it or not. Ersatz Labs offers an interesting compromise: According to the company, it "intelligently looks at your data and generates a set of parameters that it thinks are most likely to work well," but it can also let you set parameters manually (if you're willing to spend time to see if you can improve on the automatic choice of parameters).
4. Do I have to run anything on my own infrastructure, or does it all happen in the cloud?
Here again, you need to decide where you want to spend your time and energy—with the data or with your infrastructure. There are definitely good options out there no matter what you decide. For example, PreditionIO is a machine learning server that needs an infrastructure to run on, whereas DirectedEdge runs in the cloud.
5. Do I need to provide my own training data?
For this question, we first need to briefly review how prediction APIs work. Prediction APIs allow you to get answers to questions such as: "Is this email spam?" (spam detection) or "Is this customer likely to cancel his subscription?" (churn analysis) or "What's the sentiment of this tweet?" (sentiment analysis). To come up with the answers, predictive APIs need to have a "model" of the relationship between the object they are making a prediction on (such as an email, a customer or a tweet) and the aspect of the object that is to be predicted (such as “spamminess,” churn or sentiment).
There are two main types of prediction APIs: generic and specialized.
- Generic prediction APIs allow users to specify their own data, which is used to build a model that will then be used to make predictions. This is the type of API you would use for churn analysis, where you would submit historical data about your own customers. Google Prediction API is a generic prediction API.
- Specialized prediction APIs allow users to get answers to a specific question. With this type of prediction API, you don't need to provide your own data, and a model already exists to make predictions. Datumbox is a specialized prediction API that deals with text. You can only ask certain specific questions about the input text, with specific answers chosen by Datumbox. For example, you can detect the topic of a document among a list of 12 topics chosen by Datumbox (but not your own list of topics).
6. What type of machine learning problems does the service cover?
The last question in our list aims to determine the capabilities of a generic prediction API and the type of questions/problems that it can be used for. With _classification_ tasks, a predicted answer has to be one out of a fixed set of possible answers (classes). Questions that have numerical answers, such as "What should the price of this house be?" (think Zillow's zestimate) correspond to _regression_ tasks. When what you're asking for is a set of things to show to a user, you're dealing with a _recommendation_ task.
When evaluating prediction APIs, it’s important to think about what kinds of tasks you will need them to perform—now and in the longer term. For example, BigML and Google Prediction API can perform regression and classification tasks. MonkeyLearn and etcML are used for text classification. PredictionIO can be used for recommendation.
WIth the answers to these six questions at the ready, you will be well on your way toward selecting the prediction APIs that make the most sense for what your company is trying to accomplish.