How to Build a Monitoring Application With the Google Cloud Vision API

Have you ever wondered about the accuracy of Google Images and thought about how you could incorporate some of that technology into your own applications? Google, backed by years of data, Machine Learning experience, and its infrastructure, has not only been highlighting how many of its own applications rely on Machine Learning, but has also been opening up its platform for developers to use.

In this article we’ll cover the Google Cloud Vision API, which enables you to give vision capabilities to your applications, backed by Google’s Machine Vision infrastructure. We will provide a high-level overview of the API and its features, then show you how to get started with basic examples that exercise those features. There are multiple references in the article that will help you as you go deeper into this API.

Applications that were once a figment of the imagination can be realized today, and while the technology is nascent and the results often do not match up to what you would expect, the fundamentals are in place for you to add Machine Vision to your applications.

Cloud Vision API Features

The Google Cloud Vision API has come out of Alpha and is now available in Beta, with a basic pricing model in place. The API provides powerful image analysis that can help you perform a number of machine vision tasks. We covered the initial announcement of the API late last year, and the features have remained consistent since then. They include:

  • Optical Character Recognition, which identifies text in the image, including support for multiple languages.
  • Explicit Content Detection, which helps filter out objectionable content.
  • Label/Entity Detection, which identifies the dominant entity or object in the image. This is very useful for classifying a large number of images, and is most likely the engine behind the recent Google Photos feature that identifies and classifies your images.
  • Landmark Detection, which not only identifies the landmark but also provides other details, such as its latitude/longitude.
  • Logo Detection, which identifies well-known logos in an image.
  • Facial Detection, which points out where the mouth, eyes, nose, etc. are, along with attributes that identify facial expressions. Google is very specific in stating that it does not support facial recognition and that this information is not saved on its servers.

Here are a couple of images and the kind of information that you can get from the Google Cloud Vision API:
[Images: sample annotations returned by the Google Cloud Vision API for two photos]

Understanding the API

The Google Cloud Vision API Documentation page gives developers all the information they need to work with the API, including Getting Started tutorials, the API Reference, pricing information, and more.

The Vision API is a simple-to-use REST API that accepts a JSON payload via POST. The JSON payload consists of the list of images that you want analyzed and the image features that you want the API to detect and return information about.

The JSON Request format for the API is shown below:
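A minimal request body following the `images:annotate` v1 format looks like this (the `content` value is a placeholder for the actual base64 string):

```json
{
  "requests": [
    {
      "image": {
        "content": "BASE64_ENCODED_IMAGE_BYTES"
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 5
        }
      ]
    }
  ]
}
```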


For every image that you plan to send across to the Cloud Vision API for analysis, you need to create the requests object as shown above. It consists of the image, a base64-encoded representation of the image bytes, and an array of FEATURE_TYPE requests.
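To make this concrete, here is a short Python sketch (the function name is my own, not part of the API) that base64-encodes an image file and assembles the requests object for one image:

```python
import base64
import json

def build_annotate_request(image_path, feature_types, max_results=5):
    """Build the JSON body for a Cloud Vision images:annotate call."""
    with open(image_path, "rb") as f:
        # The API expects the raw image bytes as a base64-encoded string.
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": t, "maxResults": max_results}
                    for t in feature_types
                ],
            }
        ]
    }

# Example: ask for label and logo detection on a single image.
# body = build_annotate_request("photo.jpg", ["LABEL_DETECTION", "LOGO_DETECTION"])
# print(json.dumps(body, indent=2))
```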

As mentioned, these FEATURE_TYPE requests correspond to the capabilities of the Cloud Vision API. For example, there is a FEATURE_TYPE for Logo Detection, one for Label Detection, and so on.

The table of FEATURE_TYPE values and what each one does is reproduced from the official documentation below:

| Feature Type | Description |
| --- | --- |
| LABEL_DETECTION | Execute Image Content Analysis on the entire image and return labels |
| TEXT_DETECTION | Perform Optical Character Recognition (OCR) on text within the image |
| FACE_DETECTION | Detect faces within the image |
| LANDMARK_DETECTION | Detect geographic landmarks within the image |
| LOGO_DETECTION | Detect company logos within the image |
| SAFE_SEARCH_DETECTION | Determine image safe search properties on the image |
| IMAGE_PROPERTIES | Compute a set of properties about the image (such as the image's dominant colors) |

The response from the API is in JSON format and depends on the feature types that you have requested. Check out the details here.
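For example, an abridged response to a LABEL_DETECTION request has this general shape, with one `labelAnnotations` entry per detected label (the `mid`, `description`, and `score` values here are illustrative):

```json
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0bt9lr",
          "description": "dog",
          "score": 0.89
        }
      ]
    }
  ]
}
```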

API security is via an API Key or a Service Account Key. We have detailed the steps for obtaining the Service Account Key in the actual sample code that we cover a bit later in the article.

Getting Started

In this section, we will look at a sample project in which we monitor a physical location and want to know whether a person was present in the space at any point in time. Think of it as a monitoring solution for spaces where you usually do not expect certain kinds of objects at all, or not during certain times of the day. This can make the system more efficient: out of hours of captured data, it isolates only those images that meet the criteria, leaving a much smaller subset to look at and analyze.
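The filtering step can be sketched as follows: run FACE_DETECTION or LABEL_DETECTION on each captured frame, then keep only the frames whose response suggests a person is present. The function name and the set of "person-like" labels below are my own illustration, not something prescribed by the API:

```python
def frame_contains_person(response, min_score=0.6):
    """Decide whether a single Vision API response suggests a person.

    `response` is one entry from the "responses" array of an
    images:annotate result. A frame is flagged if the API detected
    a face, or if a person-like label scores above `min_score`.
    """
    if response.get("faceAnnotations"):
        return True
    person_labels = {"person", "people", "man", "woman", "human"}
    for label in response.get("labelAnnotations", []):
        if (label.get("description", "").lower() in person_labels
                and label.get("score", 0) >= min_score):
            return True
    return False

# Keep only the interesting frames out of hours of captures.
# flagged = [f for f, r in zip(frames, responses) if frame_contains_person(r)]
```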
