Last November, Google changed that. RAISR, Google's proof-of-concept, neural-network-backed image sampler, is a harbinger of big things to come. It represents a fundamental shift in the visual paradigm of the web, and though its true impact will not be felt for some time, it marks the point where people in the front-end world should start to get very interested in machine learning.
The quality/size conundrum
Before we get ahead of ourselves, let us first take a look at what Google RAISR is, the problem it solves and how it works.
As you might have guessed, the name RAISR is an acronym. It stands for Rapid and Accurate Image Super Resolution. It is an attempt to solve a problem that has bedeviled online imaging experts for years: how to serve high-quality images without slowing page loads to a crawl.
To understand why this is such a problem, some context is helpful. At a high level, an image file is just a long list of pixel positions and color values. Take a photo, draw a grid over it, assign a value to each cell representing the predominant color found within, and you have a pretty good approximation of how a digital image is made. The photo's resolution is simply the number of cells, or pixels, we can fit into a fixed-size frame. More pixels within a given frame means a higher-fidelity representation of the image, but it also means a much bigger file.
Online, that is a problem. Image data accounts for nearly 70% of the data transmitted by an average website. With more and more pages being served over bandwidth-starved mobile connections, every megabyte you can eliminate makes your pages load noticeably faster.
One answer is to shrink the image through a process called downsampling. There are a number of ways to do this. The simplest and most illustrative is to scan across the image in two-by-two squares of four pixels, find the average of their colors, and output a single pixel of that color.
Of course, that same process works in reverse. When you have a tiny image you want to display in a big space, you can upsample it. Here, the simple-and-illustrative method is to create four new pixels of the same color for every pixel in the original.
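The averaging approach described above can be sketched in a few lines of Python. This is only an illustration: it assumes a grayscale image stored as a list of rows with even width and height, and the function name `downsample_2x` is made up for this example.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels.

    `image` is assumed to be a list of rows of grayscale values
    with an even number of rows and columns.
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


# A 4x4 image becomes a 2x2 image: 16 values in, 4 values out.
small = downsample_2x([
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
])
print(small)  # [[25.0, 45.0], [0.0, 100.0]]
```

Real image libraries use more careful filters than a plain average, but the trade is the same: four values go in and only one comes out.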
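The reverse operation is just as short. As a rough sketch under the same assumptions (a grayscale image stored as a list of rows, with `upsample_2x` being a name invented for this example), each pixel is simply copied into a two-by-two block:

```python
def upsample_2x(image):
    """Double width and height by repeating every pixel in a 2x2 block."""
    enlarged = []
    for row in image:
        # Repeat each value twice horizontally...
        wide_row = [value for value in row for _ in range(2)]
        # ...and emit the widened row twice vertically.
        enlarged.append(wide_row)
        enlarged.append(list(wide_row))
    return enlarged


big = upsample_2x([
    [1, 2],
    [3, 4],
])
print(big)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

No new detail appears in the output: the image has four times as many pixels, but every block of four carries the same single value it started with.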
You can see the problem here, of course. When you downsample an image, you are tossing away a lot of useful visual information. When you upsample, you are creating a lot of visual information from nothing based on a fairly crude guess. Neither image will look as good after it undergoes this process.
A class of algorithms called "super-resolution algorithms" aims to address this by being a bit smarter about how information is created when an image is enlarged. Basically, the theory is that many photos are self-similar in ways that are useful for improving the accuracy of your guess.
Think of a photo of a car in perspective, for example: the back tire looks like a smaller version of the front tire, so you can compare the two to figure out what an even bigger version of the tire might look like.
Enter Google RAISR
RAISR is a step beyond this. What Google noticed was that images upscaled with super-resolution techniques tended to differ from the originals in predictable ways. That is to say, when you compared an upscaled file to a large, native-resolution version of the same image, there were discernible patterns in the ways the two images would differ.
To exploit this, Google took millions of high-resolution images, downscaled them, and then upscaled them using super resolution techniques. It then trained its artificial intelligence to make adjustments to the super-resolution algorithm until the upscaled image was as close to identical to the original image as possible. Over millions of iterations, this process created a machine learning model that is very good at enlarging photos without an apparent loss of quality.
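To make the shape of that training loop concrete, here is a toy sketch. It is not Google's method: it works on one-dimensional rows of grayscale values between 0 and 1 rather than full images, uses a naive "repeat each pixel" upscaler, and learns a single three-value filter by plain gradient descent. Every name and every training row in it is invented for illustration.

```python
def downsample(row):
    """Average each pair of neighbouring values (halves the length)."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def upsample(row):
    """Cheap upscaling: repeat every value twice."""
    return [value for value in row for _ in range(2)]


def neighbourhood(row, i):
    """Left neighbour, pixel, right neighbour (clamped at the edges)."""
    last = len(row) - 1
    return (row[max(i - 1, 0)], row[i], row[min(i + 1, last)])


def apply_filter(row, taps):
    """Replace each pixel with a weighted sum of its neighbourhood."""
    return [
        sum(t * v for t, v in zip(taps, neighbourhood(row, i)))
        for i in range(len(row))
    ]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_filter(originals, steps=2000, learning_rate=0.1):
    """Nudge the filter until filtered upscales resemble the originals."""
    # Build (cheap upscale, original) training pairs.
    pairs = [(upsample(downsample(o)), o) for o in originals]
    total = sum(len(o) for o in originals)
    taps = [0.0, 1.0, 0.0]  # start with a filter that changes nothing
    for _ in range(steps):
        gradient = [0.0, 0.0, 0.0]
        for cheap, original in pairs:
            predicted = apply_filter(cheap, taps)
            for i, (guess, target) in enumerate(zip(predicted, original)):
                for k, value in enumerate(neighbourhood(cheap, i)):
                    gradient[k] += 2 * (guess - target) * value / total
        taps = [t - learning_rate * g for t, g in zip(taps, gradient)]
    return taps


# Made-up training rows (even length, values in the range 0..1).
TRAINING_ROWS = [
    [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    [1.0, 0.9, 0.7, 0.4, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.3, 0.3],
]

taps = train_filter(TRAINING_ROWS)
for row in TRAINING_ROWS:
    cheap = upsample(downsample(row))
    before = mean_squared_error(cheap, row)
    after = mean_squared_error(apply_filter(cheap, taps), row)
    print(f"error before: {before:.5f}  after: {after:.5f}")
```

The learned filter only has to correct the systematic mistakes of the cheap upscaler, which is why a small amount of training data can already reduce the average error. A production system would learn many filters and choose among them based on local image structure; this sketch learns just one.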
Armed with this well-trained model, Google can intentionally send much lower resolution images across the wire and then use RAISR on the client side to upsample them into a version that looks nearly as good as the original, full-size image. Google claims that through this process it can cut the size of a high resolution image by 75% and cut the amount of bandwidth used by a device by 30%.
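One way to see where a figure like 75% could come from is simple pixel arithmetic. The sketch below assumes, purely for illustration, that the image is sent at half its width and half its height and that file size scales roughly with pixel count. Neither assumption is confirmed by Google's description, and real compressed file sizes do not track pixel counts exactly.

```python
def pixel_savings(scale_factor):
    """Fraction of pixels saved when width and height both shrink by scale_factor."""
    return 1 - 1 / (scale_factor ** 2)


# Halving both dimensions leaves one quarter of the pixels.
full_pixels = 1600 * 1200          # 1,920,000 pixels
reduced_pixels = 800 * 600         # 480,000 pixels
print(reduced_pixels / full_pixels)  # 0.25
print(pixel_savings(2))              # 0.75
```

The saving grows quickly with the scale factor because it applies in both dimensions at once, which is what makes client-side upsampling attractive if the reconstructed image looks good enough.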
It's hard to overstate how big a deal that is. In the imaging science world, if you can improve compression rates by 5% that is considered a great leap forward.
What are the future implications for developers?
RAISR is a pretty revolutionary shift in image compression. I predict it and machine learning algorithms like it are going to become the standard going forward. Yet it will take some time to get there. RAISR itself is not yet widely available, and even Google is only testing it on a small cross-section of consumers on a small number of Android devices. But there are important lessons for developers today:
1. Image compression is going to continue to improve significantly through machine learning. In many ways, images are encoded and transmitted largely according to their color values and visual patterns. This makes intuitive sense. But it might not be the best way for a machine to encode data. In fact, one interesting area of research is in discovering ways to intentionally degrade an image before it undergoes compression such that the errors introduced actually improve the output of the compression algorithm. This is hard to do at scale with traditional techniques, but machine learning is ideal for solving these kinds of problems.
2. We are witnessing the benefits of thinking in terms of near-infinite compute power. For every image compressed and reconstructed using RAISR, millions of images first had to be downscaled, upscaled and compared to train the model. This would have taken lifetimes to accomplish with the level of computing power available just a few short years ago. Yet computation has become so cheap and available that we can actually approach problems by saying things like "let's start by analyzing every digital image ever created." Being able to think in these terms, as Google actively does, can unlock incredible new opportunities for advancement.
3. Look for machine learning wins in areas outside of the hype. What often captures the most attention in the news are machine learning applications that involve human-centered activities like speech recognition, object recognition, and language translation. Those are fascinating problems. But they are also very hard. We are probably years away from real breakthroughs. In the near term, wins will come from solving things like image compression — well-understood problems in domains where pretty good solutions already exist, but where AI is even better. There are innumerable problems in these areas that could benefit from a machine learning approach.
Google, through its RAISR technology, is demonstrating how machine learning can advance or completely revolutionize entire areas of research, areas that may not have seen dramatic advancement in quite a while.
There are countless engineering problems out there that can be approached anew, or under different assumptions, by leveraging the unique and powerful capabilities of machine learning. We need to keep our eyes open to these opportunities, as they will mint the next generation of breakout successes in technology.