The amount of data captured for analysis is increasing all the time. Often this data is fed into multiple systems that analyze, process, persist or otherwise operate on it, and these systems need to ensure that sensitive data is identified and redacted. At its recent annual cloud conference, Google Cloud Next 2017, Google announced the Data Loss Prevention (DLP) API, which does exactly that. It was one of a series of announcements around the company's Identity and Security services.
As described in the announcement blog post, the DLP API is a classification engine that supports more than 50 kinds of sensitive information and provides methods to both identify and redact that information. One of the more interesting aspects of the API is that it supports both text and image content, which opens up a wide range of potential use cases.
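To make the identify-then-redact idea concrete, here is a minimal, purely local sketch in Python. It does not call the DLP API and is not how the API works internally; it assumes two hypothetical toy detectors (simple regular expressions for email addresses and US-style phone numbers) standing in for the API's far richer classifiers, and replaces each match with the name of its information type.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy detectors standing in for the DLP API's classifiers.
# The type names mimic DLP info type names; the patterns are illustrative only.
DETECTORS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def identify(text: str) -> List[Tuple[str, str]]:
    """Return (info_type, matched_text) pairs for every detector match."""
    findings = []
    for info_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            findings.append((info_type, match.group()))
    return findings


def redact(text: str) -> str:
    """Replace every detector match with an [INFO_TYPE] placeholder."""
    for info_type, pattern in DETECTORS.items():
        text = pattern.sub(f"[{info_type}]", text)
    return text


sample = "Contact jane.doe@example.com or 555-123-4567."
print(identify(sample))
# → [('EMAIL_ADDRESS', 'jane.doe@example.com'), ('PHONE_NUMBER', '555-123-4567')]
print(redact(sample))
# → Contact [EMAIL_ADDRESS] or [PHONE_NUMBER].
```

Replacing a match with its type name, rather than deleting it, keeps the redacted text readable for downstream systems while still removing the sensitive value.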
The diagram below illustrates one potential use of the API: content redacted by the DLP API is then fed as input into multiple downstream applications.
The complete API documentation is available online, as is the current list of entities (types of sensitive information) that the API detects; the list is likely to keep growing as the API matures. The API also contains methods to manage jobs that scan content stored in Google Cloud Storage. A Node.js-based command-line tool is also available for testing the API.
Google has also published an interactive demo that illustrates what the DLP API is capable of. At the time of writing, the API supported roughly 54 types of potentially sensitive information, and the demo lets you choose which of these types to detect by selecting or deselecting them.
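Selecting types in the demo corresponds to listing info types in an API request. The sketch below only builds a request body; it sends nothing. It assumes the field names of the later, generally available v2 REST API's `content:inspect` method (`item.value` and `inspectConfig.infoTypes`); the beta version discussed in this article may have used a different request shape, so check the API reference. The helper `build_inspect_request` is hypothetical.

```python
import json
from typing import Iterable

# Assumption: field names follow the DLP v2 REST API's content:inspect
# request body. The beta API described in this article may differ.


def build_inspect_request(text: str, selected_types: Iterable[str]) -> dict:
    """Hypothetical helper: build an inspect request for the chosen info
    types, much like ticking checkboxes in the demo."""
    return {
        "item": {"value": text},
        "inspectConfig": {
            "infoTypes": [{"name": t} for t in selected_types],
            "includeQuote": True,  # ask for the matched text in each finding
        },
    }


body = build_inspect_request(
    "Call me at 555-123-4567", ["PHONE_NUMBER", "EMAIL_ADDRESS"]
)
print(json.dumps(body, indent=2))
# Such a body would be POSTed, with OAuth credentials, to an endpoint like
# https://dlp.googleapis.com/v2/projects/PROJECT_ID/content:inspect
```

Narrowing the list of info types limits findings to the categories a given application actually cares about.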
For more details on the DLP API, refer to the official documentation and the API reference. Keep in mind that the API is currently in beta and may change as it moves toward general availability.