A few years ago the best computer vision algorithms were the preserve of a few neural net researchers. Now every man and his dog can gain access to the best object detection algorithms in the world through public APIs. Wayne Walls, CTO at Filestack, the file upload service, put four of the best through their paces over at the Filestack blog.
The four APIs in question are Google Vision, Amazon Rekognition, Microsoft Cognitive Services, and Clarifai. Google's service is based on its neural net library TensorFlow and does OCR, face, emotion and object detection. It can even flag inappropriate content. Rekognition does object, face and emotion detection but can't detect inappropriate content. Microsoft Cognitive Services is a set of 22 APIs; its computer vision API can do color, face, emotion and celebrity detection. Clarifai is the odd one out, being a startup founded by leading neural net researcher Matt Zeiler. It does very extensive image tagging, covering everything from weddings to travel and food.
Wayne starts his comparison with a high-level feature overview. All the APIs are pretty similar: all do image tagging. The only major difference is that, unlike Microsoft and Clarifai, Google and Amazon can't do video tagging. Clarifai distinguishes itself by having the only feedback API.
In terms of price, Amazon and Clarifai are the cheapest at around $1 per 10k images. Amazon also has the best file size limit: 5 MB compared with 4 MB for the others. Clarifai has the most generous rate limit, accepting 30 requests a second as opposed to 10 a second for Google. However, its performance cancels out much of this advantage: the average Clarifai request took 4.69 seconds, while all the others managed under two seconds. Microsoft was the fastest overall, averaging only 1.1 seconds per request.
Setting accuracy aside, Wayne summarises the findings by concluding that Google has the most reliable performance overall, although it doesn't let you send URLs instead of files. Amazon, on the other hand, lets you send very big files if you store them in S3, but its performance was not great. Clarifai was the slow coach, but that's partly mitigated by its generous rate limiting.
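Why can a generous rate limit offset slow responses? Only if the client keeps enough requests in flight at once. The back-of-the-envelope sketch below applies Little's law (requests in flight = throughput × latency). It is not from the Filestack write-up: it assumes the quoted average latencies hold under load and that requests can be sent concurrently. Because Google's latency is only given as "under two seconds", 2.0 s is used here as an assumed upper bound. The function names are hypothetical helpers for illustration.

```python
import math


def required_concurrency(rate_limit_per_s: float, avg_latency_s: float) -> int:
    """Little's law: requests that must be in flight to hit the rate limit."""
    return math.ceil(rate_limit_per_s * avg_latency_s)


def max_throughput(concurrency: int, rate_limit_per_s: float, avg_latency_s: float) -> float:
    """Achievable requests/second, capped by the rate limit and by concurrency / latency."""
    return min(rate_limit_per_s, concurrency / avg_latency_s)


# (rate limit in requests/s, average latency in s). Google's latency is an assumed upper bound.
SERVICES = {
    "Clarifai": (30.0, 4.69),
    "Google Vision": (10.0, 2.0),
}

if __name__ == "__main__":
    for name, (limit, latency) in SERVICES.items():
        needed = required_concurrency(limit, latency)
        print(f"{name}: needs {needed} concurrent requests to reach {limit:g} req/s")
        for conns in (4, 150):
            rate = max_throughput(conns, limit, latency)
            print(f"  with {conns} connections: {rate:.2f} req/s")
```

With only a handful of connections, Clarifai's 4.69 s latency keeps it well below Google's throughput. With around 141 or more requests in flight, its 30 requests a second ceiling comes into play and overtakes Google's 10.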
In terms of object recognition accuracy, Google was the big winner. Wayne tested them all with a set of images that included a dog, peppers and a logo of Uncle Sam. Amazon and Clarifai were fairly reliable. Microsoft came last, failing to return anything at all for Uncle Sam. Clarifai had the most diverse responses because it offers the most tags, but it wasn't always as accurate as desired.
Wayne concludes that the big winner is Google, and indeed Filestack chose Google Vision for their API. This lead is only likely to get bigger with time as Google's TensorFlow becomes the library of choice in neural net research.