Gargl Offers Open Source Scraping Solution for Unofficial API Creation

Patricio Robles
Mar. 07 2014, 03:00PM EST

While the number of APIs grows by leaps and bounds every year, only a fraction of websites offer official APIs. Not willing to wait, those who want and need data are increasingly taking matters into their own hands through the creation of unofficial APIs. A new tool, Gargl, gives individuals an open-source option for doing just that.

Using Gargl, it is possible to build a scraper and unofficial API without writing a single line of code, and to run it from a machine of the user's choice. Gargl projects consist of three components:

  • Templates, which define the API for a website using JSON.
  • A Recorder, which allows users to record their interactions with a website to produce templates.
  • A Generator, which creates modules for a specified programming language that can consume a template's API.

Currently, Gargl offers a Recorder as a Chrome extension available through the Chrome Web Store. Gargl's Generator, which is a Java application run from the command line, can produce modules for Java, JavaScript and PowerShell.
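
To make the template-and-generator idea concrete, here is a minimal sketch in Python of how a JSON template might describe one request and how a generated module might turn it into a callable URL. Everything in it is an assumption for illustration: the field names (moduleName, functions, queryParams), the @name@ placeholder syntax, the example.com URL and the build_request helper are hypothetical and are not taken from Gargl's actual template schema or generated code.

```python
import json
from urllib.parse import urlencode

# Hypothetical template: the field names and @placeholder@ syntax are
# illustrative assumptions, not Gargl's actual schema.
TEMPLATE_JSON = """
{
  "moduleName": "ExampleSearch",
  "functions": [
    {
      "name": "search",
      "method": "GET",
      "url": "https://example.com/search",
      "queryParams": {"q": "@query@", "page": "@page@"}
    }
  ]
}
"""


def build_request(template, function_name, **args):
    """Return (method, url) for the named template function.

    Values of the form @name@ are replaced with the caller's argument
    called `name`; any other value is passed through as a literal.
    """
    for fn in template["functions"]:
        if fn["name"] != function_name:
            continue
        params = {}
        for key, value in fn["queryParams"].items():
            if len(value) > 2 and value.startswith("@") and value.endswith("@"):
                params[key] = str(args[value.strip("@")])
            else:
                params[key] = value
        return fn["method"], fn["url"] + "?" + urlencode(params)
    raise KeyError(function_name)


template = json.loads(TEMPLATE_JSON)
method, url = build_request(template, "search", query="open source", page=2)
print(method, url)
```

In this sketch the template plays the role of the recorded site description, and build_request stands in for the kind of function a generator could emit for a target language. Sending the request and parsing the response are left out.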

All of the source code for Gargl's components is available on GitHub. A tutorial and video walk-through showing how Gargl can be used to build an unofficial API for Yahoo in 3 minutes are also available.

Filling a need, raising legal questions

Necessity is the mother of invention, and Gargl was born of necessity. Its creator, Joe Levy, spends his free time building Windows 8 apps. To increase the utility of these apps, Levy prefers to build them for existing services.

Many of those existing services, such as Google Voice, OkCupid, and PlentyOfFish, didn't offer official APIs of their own, and Levy found himself reverse engineering their code to get at the data he needed. This was time-consuming, "painstaking" work, and it inspired Levy to come up with a more efficient solution. Gargl was born.

Gargl, like many scraping solutions, raises interesting and sometimes still unresolved legal questions. Levy highlighted these prominently in a warning that accompanied Gargl's release.

Levy's decision to provide such a warning was based on his first-hand experience. While a number of his Windows 8 apps haven't attracted the ire of the companies whose data he's leveraging, Levy was forced to pull one of the apps he built using an unofficial API. Although he acknowledges that his activity could invite lawsuits, and he has formed a company in an effort to help protect himself against them, Levy has found that "most sites don't want to go through the time, trouble, or fees of a lawsuit, so will issue you a Cease and Desist letter first, and if you comply [they] will not press any further action."

This also played a role in motivating Levy to build Gargl. "It sucks to go through all the effort of figuring out a website's unofficial API, building code to use that API, only to be shut down by the site owner. Gargl eases this pain by allowing you to spend much less time on the figuring out and integrating into unofficial APIs part, so if you do get a takedown request, you haven’t wasted nearly as much time and effort as you would have doing the process manually," Levy explains.

The golden era of scrapers

Scraping isn't a new phenomenon, of course. But building a scraper and retrieving the data it scrapes has never been easier.

Commercial tools like Import.io are growing in popularity, and companies like Priceonomics are tapping into the demand companies have for data by building custom scrapers and unofficial APIs for a fee. But free, open-source tools like Gargl have the potential to be the biggest game changers of all.

Levy admits that some of the commercial tools are currently more attractive in certain areas. Kimono, a commercial tool that made headlines recently by building an unofficial API for the Olympic Games in Sochi, "is more user friendly than Gargl," but Levy suggests that Gargl has a number of advantages. There's no shared collection of IP addresses for websites to block, for instance, and no one company that could be taken out by a lawsuit. Because the modules that the Gargl Generator produces are incorporated into the user's own code, they can be run as frequently as desired, allowing for the creation of unofficial APIs that are truly real-time.

If developers embrace Gargl and a strong ecosystem grows around it, it's not inconceivable that Gargl's polish and ease of use could someday match that of its commercial peers. Gargl has already attracted attention on Hacker News, a YouTube video showing how Gargl is used has more than 10,000 views, and other developers have started contributing to the Gargl project on GitHub. "[Gargl's] goals are too big for me to handle alone, and only through the community at large can it become truly great," Levy says.

Ironically, if Levy has his way, more companies will recognize the wisdom of offering official APIs and the need for Gargl will decrease. "Because of the fact that it is nearly impossible to truly stop a savvy developer from reverse-engineering and using your unofficial API for their own purposes, I think all sites should release public APIs and embrace the fact that others want to integrate into their services to create additional value," Levy told me. "Maybe once these sites see the amazing concepts developers have come up with in integrating into their services, they will recognize the value in creating official APIs for developers to use."
