Yahoo's YQL Makes the Internet Your Database

John Musser
Apr. 30 2009, 01:50AM EDT

Yahoo has just released a major update to YQL, the Yahoo Query Language platform they first launched late last year as part of their Yahoo Open Strategy. YQL is a SQL-like programming interface to all Yahoo data that can also support non-Yahoo data (think of queries that look like: select id from where text='car'). This week's release adds a set of new features called Yahoo Execute, which begin putting in place more pieces of a powerful cloud-based development platform.

As background, it helps to understand a bit about YQL. Their blog post announcing this new release gives a good, concise summary:

The Yahoo! Query Language lets you query, filter, and join data across any web data source or service on the web. Using our YQL web service, apps run faster with fewer lines of code and a smaller network footprint. YQL uses a SQL-like language because it is a familiar and intuitive method for developers to access data. YQL treats the entire web as a source of table data, enabling developers to select * from Internet.
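
To make the "YQL web service" idea a little more concrete, here is a minimal Python sketch of how an app might package a YQL statement into a single HTTP GET request. The endpoint URL, the `q` and `format` parameter names, the `build_yql_url` helper and the `example.photos` table are all assumptions for illustration; none of them come from Yahoo's announcement. The code only builds the URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the article itself does not give a URL.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(statement: str, fmt: str = "json") -> str:
    """Return a GET URL that carries a YQL statement as the `q` parameter."""
    # urlencode percent-escapes spaces, quotes and '=' inside the statement.
    query = urlencode({"q": statement, "format": fmt})
    return f"{YQL_ENDPOINT}?{query}"


# `example.photos` is a hypothetical table name used only for illustration.
statement = "select id from example.photos where text='car'"
url = build_yql_url(statement)
print(url)
```

The point of the sketch is that the whole query travels as one string, so the filtering happens on the service side and the app makes a single request.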

Another way to look at it: if you consider Yahoo Pipes as the GUI-centric way to mash up data, YQL exists much closer to the programmer and code level.

As part of YQL, Yahoo also offers something called Open Data Tables (ODT), an extensible architecture for creating structured definitions of all sorts of online data, not just Yahoo data. This lets anyone make data YQL-accessible. There's a growing community repository of these definitions over on GitHub with about 70 new 'table definitions' available there now for sources ranging from Yahoo to Zillow.

Now, with this week's update, Yahoo builds on top of ODT via the "Execute" piece of the puzzle. Execute definitions let developers add arbitrary code that will run during the processing of any YQL statement. The code itself is server-side JavaScript with E4X (native XML support). Here's a bit more on how this works:

With Execute, developers now have full control of how the data is fetched into YQL and how it’s presented back to the user. With Open Data Tables, developers can build tables that manipulate, change, and sign the URLs to access almost any protected content, allowing YQL access and combining data across a variety of different authenticated services such as Netflix or Twitter. Developers can call multiple services and data sources within Execute to join and mashup data however they desire, letting Yahoo! do the work rather than their applications. Data can be tweaked and manipulated into an optimal format for applications to consume.

This means developers can start writing server-side code on Yahoo's infrastructure. Large-scale joins and data manipulation can occur on Yahoo's cloud, but the results can also be integrated back into a developer's app elsewhere. Or, with the Pipes YQL module, they can be integrated into Yahoo Pipes. For now the service provides read-only access to data and APIs, but it makes sense that they would add write capabilities before long.
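
For a sense of the work that moves off the developer's side, here is a rough Python sketch of the kind of join an application would otherwise do itself after calling two separate services. The `join_on` helper and the sample rows are made up for illustration. This is not how YQL or Execute implements joins, only the client-side chore they are meant to replace.

```python
from collections import defaultdict


def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key, merging matching rows."""
    # Index the right-hand rows by key so each left row is matched in one lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Made-up rows standing in for the results of two separate web service calls.
listings = [
    {"zip": "94089", "address": "1 Example St"},
    {"zip": "10001", "address": "2 Sample Ave"},
]
weather = [{"zip": "94089", "forecast": "sunny"}]

rows = join_on(listings, weather, "zip")
print(rows)
```

With the join done on the service side, the app would receive only the merged rows instead of fetching both result sets and combining them locally.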

As with other Yahoo platform services, most recently the BOSS search API, YQL may have both free and paid levels of access, with a reasonably high bar before hitting the paid level.

As Yahoo continues to add pieces to their Open Strategy, it will be interesting to see how developers begin using these increasingly sophisticated tools.






YQL will only be a valuable tool if Yahoo decides to release the web service to the public, so that anyone can run a YQL web service. It does not make sense to put all of your data on Yahoo's servers. I have plenty of disk space; I do not need Yahoo! It would make life easier for anyone who wants to extract data from any place, not only Yahoo servers. Maybe I'll take my time to develop such a thing.