John Musser
Mar. 21 2006, 11:45PM EST

What's a scrAPI? A scrAPI, which at this point is more of an idea than a thing, was recently described by Thor Muller on his blog as a type of community-built API that provides a programming layer on top of web sites that don't otherwise have an API. This intermediate layer, which exists independently of the destination web site, does the dirty work of screen-scraping the raw HTML from the source and returns just the relevant data in a cleaner XML format. The result is a collaboratively built and maintained body of code for data access from any source.
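To make the idea concrete, here is a minimal sketch of that intermediate layer in Python: scrape raw HTML from a source page and hand back only the relevant data as clean XML. The page markup, element names, and `listing` class are hypothetical illustrations, not any real site's structure.

```python
# Sketch of the scrAPI idea: a layer that screen-scrapes raw HTML
# and returns just the relevant data as cleaner XML.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ListingScraper(HTMLParser):
    """Collects the text of every <li class="listing"> element."""

    def __init__(self):
        super().__init__()
        self.in_listing = False
        self.listings = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "listing") in attrs:
            self.in_listing = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_listing = False

    def handle_data(self, data):
        if self.in_listing and data.strip():
            self.listings.append(data.strip())


def scrape_to_xml(raw_html):
    """The 'dirty work': parse raw HTML, return only the data, as XML."""
    parser = ListingScraper()
    parser.feed(raw_html)
    root = ET.Element("listings")
    for text in parser.listings:
        ET.SubElement(root, "listing").text = text
    return ET.tostring(root, encoding="unicode")


# In a real scrAPI the HTML would come from an HTTP fetch of the
# destination site; a canned page stands in for it here.
page = '<ul><li class="listing">Apt A</li><li class="listing">Apt B</li></ul>'
print(scrape_to_xml(page))
# → <listings><listing>Apt A</listing><listing>Apt B</listing></listings>
```

The community-maintained part would live in scrapers like `ListingScraper`: when the source site's markup changes, only that one shared piece needs updating, and every consumer of the XML keeps working.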

It's an interesting idea, though one with many complications. Not the least of these is that many companies object to scraping, whether for reasons of load, stability, or copyright. A good example is Craigslist vs. Oodle.

In this follow-up post Thor notes that the term was originally coined by Paul Bausch back in 2002, in reference to scraping Amazon data. Interestingly, it was just this sort of scraping that was a key driver leading Amazon to subsequently build a real API: people are going to do it anyway, so let's formalize and leverage it.


Great point about the potential complications regarding scraping. I have some new essays in the works that address the business and legal issues around these. There are a tremendous number of data sources that are effectively in the public domain, and a solid framework of best practices would help minimize the potential downsides around load and stability. With community involvement, scrAPIs could be instrumental in supporting equitable terms of use.

While there are plenty of data providers that will want to guard their siloed data, there are many more who don't care or want to make it available in broader form. If we treat the data and its providers with respect, then we can help free it while preserving goodwill.

I would suggest that we respect the wishes of data providers that don't want to open up their systems. No reason to fight those battles yet.

There is an exception: when the data is public domain but relentlessly siloed by government agencies. In these cases I think we have the right, perhaps even the duty, to free it for the benefit of all.

We just may see instances of civil disobedience via scrAPIs before long.

[...] scrAPIs: replacing individually-maintained screen scrapers with community-maintained ones and building APIs around them. Another interesting concept to keep an eye on, including the possibility of increased legal issues [...]

[...] I was wondering how easy it would be to build a generic approach to opening up APIs on web sites that didn't formally publish them, and then last night I saw this post about scrAPIs. Great stuff. I would like to be able to cut and paste data sources and mix them together myself. I find myself doing this manually too often (e.g., the other night I was cutting and pasting Rotten Tomatoes reviews against a movie database). So many mashups today are based on geo-location data; it's like my one-year-old, who has six or seven words, and most everything is at some point "hot". Latitude and longitude are just the easiest and first data source to be mined; things are going to get a lot more interesting as the data sources become increasingly diverse. I look forward to Muller's coming posts on the business and legal issues regarding scrAPIng. Posted by John. Filed in think, building blocks, APIs [...]

[...] The very interesting Dapper service officially launched yesterday. It is designed to allow anyone, including non-coders, to create an API for any web site (akin to earlier discussions here about scrAPIs). You can use their GUI or an SDK. Sample services include Magg, a movie aggregator, and Blotter, which graphs blogs over time. This service is now listed here at ProgrammableWeb. [...]