Google's Cloud Video Intelligence API Can Search Individual Video Frames

Google's new Cloud Video Intelligence API makes it possible to scan and search videos for nearly anything thanks to machine learning. Google says the tool relies on a simple REST API that can peek at every single frame of videos hosted in Google Cloud Storage. 
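As a rough sketch of what a REST request of this kind looks like, the snippet below builds the JSON body for an annotation call. The request shape mirrors the public Video Intelligence v1 REST API (`inputUri` pointing at a Cloud Storage object, plus a list of features), but the bucket path and feature choice here are illustrative assumptions, not details from the article:

```python
import json

def build_annotate_request(gcs_uri, features):
    """Build the JSON body for a video annotation request.

    gcs_uri  -- a Google Cloud Storage URI, e.g. "gs://my-bucket/clip.mp4"
    features -- list of analysis features to run on the video
    """
    return json.dumps({"inputUri": gcs_uri, "features": features})

# Hypothetical example: request label detection on a stored video.
# The resulting body would be POSTed to the API's videos:annotate endpoint.
body = build_annotate_request("gs://example-bucket/clip.mp4", ["LABEL_DETECTION"])
```

Because the video must already live in Google Cloud Storage, the request carries only a URI rather than the video bytes themselves.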

Google announced the Cloud Video Intelligence API at its Google Cloud Next conference. Fei-Fei Li, Google's Chief Scientist of Artificial Intelligence and Machine Learning, spoke about a number of new initiatives during his keynote address, but the video API is by far the most impressive. 

The API can annotate any video stored on Google's servers with video-level, shot-level, and frame-level content. There are three main features: label detection, temporal annotations, and shot change detection. The first identifies everyday objects, places, and things, such as the Empire State Building or a tank. The second searches across an entire video for items related to the main entities and automatically tags where in the video each one appears. The third determines when scenes within a video are cut or changed. 
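To show how those levels of annotation might come back to a caller, the sketch below walks a response shaped like the Video Intelligence API's annotation results. The field names follow the public v1 response schema (`annotationResults`, `segmentLabelAnnotations`, `shotAnnotations`), but the payload itself is invented for illustration:

```python
# Hypothetical response: one video-level label plus two detected shots.
response = {
    "annotationResults": [{
        "segmentLabelAnnotations": [
            {"entity": {"description": "skyscraper"}}
        ],
        "shotAnnotations": [
            {"startTimeOffset": "0s",   "endTimeOffset": "4.2s"},
            {"startTimeOffset": "4.2s", "endTimeOffset": "9.8s"},
        ],
    }]
}

result = response["annotationResults"][0]
# Label detection: what the video contains.
labels = [a["entity"]["description"] for a in result["segmentLabelAnnotations"]]
# Shot change detection: how many distinct shots were found.
shot_count = len(result["shotAnnotations"])
```

The time offsets on each shot are what make temporal search possible: a caller can jump straight to the segment where an entity appears.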

Who is this API for? Google suggests companies that host massive amounts of media will find new ways to take advantage of their content. For example, media organizations sitting on years of archived content (think TV news channels) can suddenly use the Cloud Video Intelligence API to seek out key subjects and items within each video and tag them. Doing so may bring new life and monetization opportunities to those old videos. 

Publishing platforms can take advantage of similar benefits. Google believes tagging videos with metadata can drive content recommendation engines, expanding the scale of content available for people to discover. Similarly, end users will find lots to like as the Video Intelligence API lets regular Janes and Joes find their favorite people or places in their own online videos. 

Google says the API is powered by the same underlying code that's already baked into the video search tools in YouTube. It takes a new approach, but still relies on deep machine learning techniques. 

For now, the API is being offered in beta form to a limited number of Google partners. Cantemo was given early access to the API, and Mikael Wahlberg, VP of product development, said the API lets its users "automatically identify objects, places, and people from within the media content itself, whether video, image, or audio. It will track at exactly what time within the content a specific item appears. This level of information is vastly deeper than anything possible today and will save our customers vast amounts of time and money."

Google's Fei-Fei Li also outlined several new capabilities for the company's Vision API. Chiefly, it can recognize millions of entities from Google's Knowledge Graph and pair them with metadata from Google Image Search. This means it can detect and group similar images. The Vision API is also better at optical character recognition, which allows it to pull text from images and catalog it so it is searchable. 
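The Vision API's text extraction works image by image rather than on stored video, so a request carries the image bytes inline. The sketch below builds a request body in the shape of the Vision v1 `images:annotate` call with a `TEXT_DETECTION` feature; the image content here is a placeholder stand-in, not a real photo:

```python
import base64
import json

def build_vision_request(image_bytes, feature="TEXT_DETECTION"):
    """Build a Vision-style images:annotate request body.

    image_bytes -- raw bytes of the image to analyze
    feature     -- analysis type; TEXT_DETECTION runs OCR on the image
    """
    return json.dumps({
        "requests": [{
            # Inline images are base64-encoded in the request body.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": feature}],
        }]
    })

# Hypothetical example with placeholder bytes standing in for an image file.
body = build_vision_request(b"fake-image-bytes")
```

In practice the returned text annotations are what get cataloged to make an image library searchable.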
