US Court of Appeals Irreparably Damages API Economy

Last fall, after a lower court ruled that HiQ was within its rights to circumvent LinkedIn's API program by "scraping" LinkedIn's web pages, a US Court of Appeals upheld the ruling and paved the way for irreparable harm to the API economy. What this means for the future of the API economy remains to be seen. But, in upholding the lower court's decision, the US Court of Appeals not only made it exceedingly difficult for companies to justify (through monetization) the provision of public APIs, it may have stifled organizational interest in making important data available to the public. 

LinkedIn (a subsidiary of Microsoft) was denied its subsequent petition for the Circuit Court to rehear the case and has since asked the Supreme Court to intervene.

Web scraping is a practice where software (often in the form of a bot) running on one system uses web browser technology to access a web site as though it were a human. Then, once the software’s browser has opened that page, it siphons any data it finds back to its own systems. The most sophisticated versions of web scrapers use machine learning to maintain an understanding of a web page’s structure, thereby easily identifying the various fields of data that might be found on a page. As a very rudimentary example, if a product page on a retailer’s web site has a thousand characters on it, a web scraper designed to scrape that page might know that the product name starts at the 350th character and the price can be found at the 475th character.
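To make the mechanics concrete, here is a minimal, hypothetical sketch in Python of the field-extraction step described above. The page markup, class names, and function are all invented for illustration; a real scraper would first fetch the page over HTTP (or drive a headless browser) rather than parse a hard-coded string.

```python
import re

# Hypothetical product page markup; the class names are assumptions
# made up for this example, not any real retailer's HTML.
html = """
<html><body>
  <h1 class="product-name">Acme Widget</h1>
  <span class="price">$19.99</span>
</body></html>
"""

def scrape_product(page: str) -> dict:
    """Pull the product name and price out of raw markup.

    A real scraper would use a proper HTML parser and handle layout
    changes; this regex version just illustrates the idea of knowing
    where each field of data lives on the page.
    """
    name = re.search(r'class="product-name">([^<]+)<', page).group(1)
    price = re.search(r'class="price">([^<]+)<', page).group(1)
    return {"name": name, "price": price}

print(scrape_product(html))  # {'name': 'Acme Widget', 'price': '$19.99'}
```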

The ruling didn’t just declare open season for web scraping. It also disallowed any technical countermeasures designed to prevent web scrapers from accomplishing their tasks. For example, while it’s physically impossible for a human to visit 100 web pages in a minute, that sort of scale is child’s play for a good scraper. To prevent scraping, web site operators have been known to set thresholds that limit bots to a minimal quantity of page views akin to what a human might normally consume. Or, the web site operator might block certain inbound IP addresses once they’ve been discovered to be the source of a scraper.
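A sketch of the threshold-based countermeasure described above might look like the following in Python. The specific limit (30 page views per minute) and the class design are assumptions chosen for illustration, not any particular site's policy.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter keyed by client IP address.

    If a single IP requests more pages per minute than a human
    plausibly could, further requests are refused -- exactly the kind
    of anti-scraping countermeasure the ruling calls into question.
    """

    def __init__(self, max_requests: int = 30, window: float = 60.0):
        self.max_requests = max_requests
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        # Drop hits that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # Inhuman request rate; likely a scraper.
        q.append(now)
        return True
```

In practice a site would also combine this with IP blocklists, as the article notes, since a patient scraper can simply stay under the threshold.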

Although the ruling seemed to limit the precedent to "information [already] available to the general public," there's a lot of ambiguity around what data is technically public and what is not. Ergo, what is hacking and what is not? 

The potential damage to the API economy and the harm to the efficacy of APIs in general should not be underestimated. Apart from their potential to be a source of revenue, APIs also represent a technical solution to a very thorny problem. Just as shopping-related traffic tends to spike on many e-commerce sites at Christmas time, a bot working at scale can easily load down a web site as though hundreds or even thousands of humans all showed up at once. Unlike at Christmas, however, when web site operators can anticipate the traffic spike and adjust their system capacities accordingly, any number of scrapers can show up at any time to scrape a web site. 

Maybe your site can take the load of one or two scrapers. But what if ten scrapers showed up simultaneously? Chances are your site would crash under the load, causing an expensive fire drill for you; worse, your regular human users would very likely be denied access while that fire drill takes place. The net result is the same as a deliberate Denial of Service attack, where the attackers overwhelm your systems to the point that they're no longer available to normal users.

APIs, on the other hand, offered a better route to the same data that scrapers were after. It's like the difference between the grocery store's front door (designed to accommodate average human traffic) and the same store's loading dock, which is designed for trucks to load and unload in bulk (something the front entrance simply isn't capable of handling). As with the grocery store's two entry points, however, good governance means that the bulk entry point (the API) isn't for everybody the way the front door is. 

You need some idea of who is showing up and what they're taking or leaving behind. Good governance typically means deploying an API management system that's capable of issuing easily recognized identification credentials, authenticating users when they arrive, keeping track of what they take or deliver, and automatically scaling for the type of traffic that's typical of APIs. One of the best parts of providing this second entry point is that you can limit or even revoke access based on the user's identity.
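As a rough illustration, the governance steps described above (issuing credentials, authenticating callers, metering what they consume, and revoking access) might be sketched like this in Python. Everything here is a simplified assumption for the sake of the example; real API management platforms do far more.

```python
import secrets

class ApiGateway:
    """Toy model of the governance an API management layer provides."""

    def __init__(self):
        self.keys = {}        # api_key -> client name
        self.usage = {}       # api_key -> number of requests served
        self.revoked = set()  # keys whose access has been cut off

    def issue_key(self, client: str) -> str:
        """Issue an easily recognized credential to a known client."""
        key = secrets.token_hex(16)
        self.keys[key] = client
        self.usage[key] = 0
        return key

    def handle_request(self, key: str) -> bool:
        """Authenticate the caller; turn away unknown or revoked keys."""
        if key not in self.keys or key in self.revoked:
            return False
        self.usage[key] += 1  # Meter what each identified client takes.
        return True

    def revoke(self, key: str):
        """Limit or end access based on the user's identity."""
        self.revoked.add(key)
```

The key contrast with scraping is that every request arriving at this "loading dock" is tied to an identity, so the operator always knows who is coming, how often, and what they're taking.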

Seems reasonable, right?

Well, not anymore. Whereas organizations like LinkedIn would normally reroute the truck drivers (think truckloads of data) to the loading dock (the API), now, the US Court of Appeals has told LinkedIn and all other web site operators that they must let the truck drivers come through the front door, even if that door isn’t designed for that sort of bulk access. Furthermore, the trucks can show up any time they like, 24/7/365, and need not identify themselves, thereby circumventing all that good governance (Who is coming? How often? What are they taking? etc.). 

When a scraper shows up at your web site, you have no idea what data it’s really interested in. Or, what will be done with that data once the scraper has exfiltrated it back to Siberia.  

Of course, if you're a software developer who's accustomed to using APIs the way truck drivers like to use loading docks, you're probably hailing this decision because now you don't have to identify yourself, state your intentions, or, most importantly, pay for your transactions. You can now legally circumvent the API, go through the front door, and take what you want in bulk for free, regardless of the impact it has on the web site operator or the other customers.

One reason the API economy is called “The API Economy” is that API providers like to monetize their APIs. Running a great API isn’t cheap. When you do a good job monetizing your APIs, you not only get to recoup the cost of operating them (mainly for the benefit of the “truck drivers” who want easy access), in some situations, you might even earn a profit. 

But now, with this ruling, that may no longer be the case. This will inevitably force API providers back to the drawing board, the same way it must be forcing LinkedIn to rethink its entire business. The vast majority of LinkedIn's unique selling proposition depends on the data it collects and how it organizes that data for access by others. LinkedIn invests millions of dollars to run its web site and recoups that cost by monetizing access to what it has built. Now, however, with last fall's decision by the US Court of Appeals, virtually anyone looking to compete with LinkedIn is free to come and take advantage of that investment at no cost. 

It’s hard to know what LinkedIn will do next. Via LinkedIn, I reached out to Microsoft president Brad Smith (to whom I’m connected) for comment. Smith and I go back to the old days (2003) when he and I were guests on the Charlie Rose Show to talk about spam. At the time, I was a journalist for CNET and the founder of an industry initiative called JamSpam. Smith was the General Counsel for Microsoft and for many of the years since, he has been its Chief Legal Officer. However, I have not heard back. I will report back if I do.

Meanwhile, it is very difficult to know how this will play out for the API economy. Fortunately, the efficacy of APIs is not limited to the one use case where they are offered publicly and monetized accordingly. Apart from public offerings, APIs are the primary enablers of legacy modernization, digital transformation, and game-changing customer experiences. Organizations that have successfully offered and/or monetized public APIs are vastly outnumbered by those that successfully use APIs for internal purposes or for partnering with other organizations.


Comments (4)

Kenneth-Reilly

Having tried to use LinkedIn's APIs in the past, I found that it was a frustrating experience at best and that I couldn't get permissions for even the most rudimentary tasks, including accessing my own data. So, I have no sympathy for LinkedIn whatsoever and I'm convinced that this is exactly what they deserve for raking in millions off a platform that is nearly useless if you're anyone other than a spamming recruiter. LinkedIn is a disappointing service and offers very little to anyone aside from recruiters and the often desperate job-seeking candidates they take advantage of. Well-deserved, LinkedIn.

david_berlind

Kenneth, I agree that there are parts of the LinkedIn user experience that are horrendous. I for one cannot stand how, when you get an email notification that someone has sent you a message via LinkedIn, the email does not contain the content of the message. You must log in to LinkedIn in order to view the message. However, let's not throw the baby out with the bath water, as they say. There will be plenty of other API providers beyond LinkedIn that will be impacted by this decision, and so too will their users and customers. The decision is still a bad decision.

ga_

I think you miss an important point here about access. When these companies lock their data into an API, they hold the keys to the gates. This discourages competition, limits access to information, and creates an elitist group of companies that hold and hoard data. When I put my title and company on LinkedIn, does LinkedIn own that data simply because they are displaying it and hosting it? Do I own it, and if so, how am I rewarded for it? If I decide my profile is public, shouldn't anyone have access? Pages that attract scraping are pages that lock developers out of API access. If we want to make the tech world less of a monopoly and a more equitable place, we need to find ways to make these companies yield their data, especially data that is in all other ways public. If we are worried about who should have access, that key should not be held by the company itself; it has a conflict of interest in deciding. The scraper is not the problem here; scraping is just evidence of a bigger problem: the huge tech monopolies. Blocking scrapers only gives these companies more power. Maybe allowing scraping is not the solution, but blocking it is definitely worse. 

david_berlind

Hi ga_,

Great job presenting the other side of this argument. Fortunately, the United States has antitrust laws designed to dismantle a monopoly when a single player is stifling competition. Had this case been prosecuted as a monopoly case -- a single dominant player stifling competition -- that would have been different. However, now, with this ruling, it doesn't matter whether you're a big dominant player or a smaller business; either way, the new precedent forces you to allow scraping in a way that undermines your investment. If anything, the new precedent actually stifles competition because it discourages startups from entering the market, knowing that anything they build can be scraped.