Priceonomics has launched a new offering that enables developers to crawl and analyze web pages on a large scale. The foundation of the Priceonomics Analysis Engine is a service called Fetch.
In a blog post, the company explained:
Fetch performs HTTP requests through our crawling backend. Our API comes with the built-in ability to route requests through several countries around the world, understand and obey (or disobey, at your discretion) robots.txt, set custom user agents, and normalize encoding to UTF-8 for text content. Fetch is also designed for maximum reliability and built in rate limiting, so you may find it solves a lot of the problems you’ve had on large-scale web crawling projects in the past.
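To make two of those features concrete, here is a minimal sketch of what obeying robots.txt and normalizing text to UTF-8 can look like on the client side. It uses only Python's standard library and is not Priceonomics' implementation; the helper names `is_allowed` and `to_utf8`, the `example-bot` user agent, and the example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def to_utf8(body: bytes, declared_charset: str = "utf-8") -> bytes:
    """Re-encode a response body as UTF-8, given the charset the server declared."""
    try:
        text = body.decode(declared_charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect charset: fall back to UTF-8 and replace bad bytes.
        text = body.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# Hypothetical robots.txt content used only for this example.
ROBOTS = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(is_allowed(ROBOTS, "example-bot", "https://example.com/private/data"))
    print(is_allowed(ROBOTS, "example-bot", "https://example.com/public/page"))
    print(to_utf8("café".encode("latin-1"), "latin-1"))
```

A crawler that chooses to disobey robots.txt, as the quoted post allows, would simply skip the `is_allowed` check.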
Once a web page is crawled, the Priceonomics Analysis Engine analyzes the data it contains using applications that, for instance, can extract email addresses and phone numbers or retrieve information about where and how much the page has been shared on social media.
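As a rough illustration of the kind of extraction such an application performs, the sketch below pulls email addresses and phone numbers out of page text with regular expressions. It is not the Analysis Engine's code: the patterns are deliberately simplified (the phone pattern only covers common 10-digit North American formats), and the function name `extract_contacts` is hypothetical.

```python
import re

# Simplified patterns, assumed for illustration; real-world extraction needs
# far more care (internationalized addresses, other phone formats, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")


def extract_contacts(text: str) -> dict:
    """Return the email addresses and phone numbers found in a block of text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


if __name__ == "__main__":
    sample = "Write to sales@example.com or call 415-555-0100."
    print(extract_contacts(sample))
```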
Priceonomics calls its platform a "minimum viable product" and says more advanced functionality, such as support for non-GET HTTP requests, will be exposed in the future. Additionally, Priceonomics will add new applications for more advanced data analysis.
Currently, Priceonomics is offering free access to its Analysis Engine. Developers can either use a shared API key that may produce slow results, or sign up for a private API key that is limited to 1,500 requests per day. A commercial plan for high-volume users is coming soon.
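Developers on a private key may want to track that daily cap on their own side rather than discover it through rejected requests. The sketch below is one simple way to do so. The `DailyQuota` class is hypothetical, and it assumes the counter resets at the start of each calendar day, which the announcement does not specify.

```python
from datetime import date
from typing import Optional


class DailyQuota:
    """Client-side counter that refuses requests once a daily limit is reached."""

    def __init__(self, limit: int = 1500):
        self.limit = limit
        self._day: Optional[date] = None
        self._used = 0

    def try_acquire(self, today: Optional[date] = None) -> bool:
        """Record one request and return True, or return False if the quota is spent."""
        today = today or date.today()
        if today != self._day:
            # Assumption: the quota resets when the calendar day changes.
            self._day = today
            self._used = 0
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


if __name__ == "__main__":
    quota = DailyQuota()
    if quota.try_acquire():
        print("OK to send a request")
```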
The scraping economy
Data is the gold of the digital age and scraping is increasingly akin to gold mining. According to Priceonomics, "Tech companies and hedge funds pay us between $2K to $10K per month to crawl web pages, structure the information, and then deliver it to them in analyzed form. This is a pretty significant amount of money because acquiring data is a burning problem for some companies."
To reach a broader market, Priceonomics decided to offer its technology via an API.
Because data is so valuable and scraping it can be such a challenging task, a growing number of companies are hoping to build big businesses by offering self-serve tools that essentially allow anyone to turn web pages into APIs. Priceonomics is the latest to enter the self-serve space, where it will compete against existing players such as Import.io and KimonoLabs.
Right now, the market looks large enough to support multiple companies. But as more companies confront the fact that their data is being scraped and incorporated into unofficial APIs, offerings like Priceonomics' Analysis Engine may eventually have the ironic effect of encouraging them to build official APIs that they can control and monetize.