Crawling web pages isn't something most of us are set up to do. That's why 80legs turned it into a service, spidering two billion web pages per day. It launched with only Java support. Now the company has added an API Kit for Python programmers, responding to its users' most popular request (our 80legs profile).
The web crawling service starts with a basic account that is free for limited use. There are also subscription options, in addition to paying per million pages crawled and per CPU-hour. The fees, of course, are small compared with the cost of building your own farm of web crawling servers.
It's interesting--and potentially convenient--that the two languages the service supports are the same two supported by App Engine, Google's cloud-based application hosting service (our Google App Engine API profile). However, CEO Shion Deysarkar said in an email that not many 80legs developers are using App Engine. "We did Java first because we write all of our stuff in Java. Python came second because it was the most requested language," said Deysarkar.
Last fall we wrote about the 80legs contest, which launched in preparation for the company's 80apps store. Developers can sell apps there and keep 100% of the revenue, paying 80legs only for page crawling and CPU-hours.