Google is on a mission to bring its key algorithms to everyone. It recently gave us a glimpse of the algorithms behind its intelligent image recognition via Google Photos. Just when developers were expecting to learn more, it announced TensorFlow, a deep learning platform capable of supporting complex models that can drive intelligent software. The latest announcement around Machine Intelligence software, with many more likely to follow, is the Google Cloud Vision API.
The Google Cloud Vision API wraps complex machine learning models (something that TensorFlow handles well) behind a REST API that helps you understand the content of an image. For a given image, it returns multiple pieces of information that help identify objects, detect faces, perform OCR and much more.
The Google Cloud Vision API supports the following features, any of which you can apply to the image you want to process:
- Optical Character Recognition that helps identify text in the image, including support for multiple languages.
- Explicit Content Detection that helps filter out objectionable content.
- Label/Entity Detection that helps identify the dominant entity/object in the image. This is very useful for classifying a large number of images, and is most likely the engine behind the recent Google Photos feature that identifies and classifies your images.
- Landmark Detection that not only identifies the landmark but also provides other details such as its latitude/longitude.
- Logo Detection that identifies any well-known logos in an image.
- Facial Detection that points out where the mouth, eyes, nose, etc. are, along with attributes that identify facial expressions. Google is very specific in stating that it does not support facial recognition and that the information is not saved on its servers.
All of the above is available via a REST API, and the image to be analysed must be embedded in the request. There are plans to integrate with Google Cloud Storage in the future.
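To make the idea of an image embedded in the request concrete, here is a minimal Python sketch that builds such a request body. It is an illustration under stated assumptions, not the official client: the helper `build_annotate_request` and the short feature keys are hypothetical, and the JSON field names and feature type strings follow the publicly documented v1 REST API, which may differ from what the Limited Preview exposes.

```python
import base64
import json

# Feature type strings as used by the publicly documented v1 REST API.
# Assumption: the Limited Preview may name these differently.
FEATURE_TYPES = {
    "ocr": "TEXT_DETECTION",
    "explicit": "SAFE_SEARCH_DETECTION",
    "label": "LABEL_DETECTION",
    "landmark": "LANDMARK_DETECTION",
    "logo": "LOGO_DETECTION",
    "face": "FACE_DETECTION",
}


def build_annotate_request(image_bytes, features, max_results=5):
    """Return a request body for one image, embedded as base64 text.

    `features` is a list of the short keys defined in FEATURE_TYPES
    (hypothetical names chosen for this sketch).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": FEATURE_TYPES[name], "maxResults": max_results}
                    for name in features
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read in binary mode.
    body = build_annotate_request(b"not-a-real-image", ["label", "face"])
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be serialised to JSON and sent as the body of an HTTP POST to the service's annotate endpoint along with your credentials. The sketch stops short of sending anything, since the endpoint and authentication details depend on the access you are granted.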
Check out a video that shows how a Raspberry Pi-powered robot is able to detect facial expressions and respond accordingly. The robot uses a Python client to access the Vision API to process the images that it captures.
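As a rough illustration of how a client might turn face attributes into a reaction, the sketch below picks the strongest emotion from a single face annotation. This is not the robot's actual code: the function `dominant_emotion` is hypothetical, the sample annotation is hand-written, and the field names and likelihood strings are assumed to match the publicly documented v1 response format.

```python
# Likelihood strings as documented for the v1 API, weakest to strongest.
# Assumption: the Limited Preview response uses the same values.
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]

# Emotion name -> field name in a face annotation (v1 field names).
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def dominant_emotion(face_annotation, threshold="LIKELY"):
    """Return the strongest emotion at or above `threshold`, else None.

    On a tie, the emotion listed first in EMOTION_FIELDS wins.
    """
    best_name = None
    best_rank = LIKELIHOOD_ORDER.index(threshold) - 1
    for name, field in EMOTION_FIELDS.items():
        value = face_annotation.get(field, "UNKNOWN")
        rank = LIKELIHOOD_ORDER.index(value)
        if rank > best_rank:
            best_name, best_rank = name, rank
    return best_name


if __name__ == "__main__":
    # Hand-written sample, not real API output.
    sample = {
        "joyLikelihood": "VERY_LIKELY",
        "sorrowLikelihood": "VERY_UNLIKELY",
        "angerLikelihood": "VERY_UNLIKELY",
        "surpriseLikelihood": "UNLIKELY",
    }
    print(dominant_emotion(sample))
```

Using a threshold keeps the client from reacting to weak signals: if no emotion is rated at least `LIKELY`, the function returns `None` and the caller can simply do nothing.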
OCR APIs, Face Detection APIs and Feature Detection APIs are not new. However, Google is a heavyweight that is now taking its mathematical, battle-tested models and carefully releasing them to the public. The range of applications that could benefit from a powerful and accurate Image Detection API is wide, and it should be interesting to see how developers turn this into reality. One of the early adopters of this API has been Aerosense, which analyses thousands of images captured by its drone via the API to make sense of them.
Google Cloud Vision is free during the Limited Preview phase, and pricing for the service has not been announced yet. For more information, visit the Cloud Vision page, and if you have a few use cases that could make use of this powerful API, request access to the Limited Preview.