Real-time Data Delivery: HTTP Streaming Versus PubSubHubbub

Phil Leggetter
Jan. 06 2011, 05:30PM EST

There are a number of ways of delivering data in real-time, but until recently it looked like PubSubHubbub, with the backing of Google, was going to be the preferred method. However, the past few weeks have seen a couple of interesting developments which could indicate that the developer community may actually prefer HTTP Streaming.

The emergence of the real-time web has seen an increase in the visibility of technologies that facilitate the delivery of data in real-time. Twitter was most probably the catalyst for this, due to the many high-profile cases where Twitter has been able to deliver the news before any traditional news medium; the Hudson River plane crash is probably the best example of this. Some of the real-time technologies include PubSubHubbub, RSSCloud, Comet, XMPP, MQTT, Adobe LiveCycle, Google Wave Protocol, WebHooks, WebSockets and HTTP Streaming, to name but a few.

We've also seen an increase over the past year in the number of real-time services that use these technologies: services such as Beacon, DataSift, Google Buzz, Kwwika (disclosure: author is a founder), notify.me, PubNub, Pusher, Superfeedr and of course Twitter. You can also find a number of other real-time APIs in our directory.

HTTP Streaming has generally been associated with Ajax in the past. In fact, the Wikipedia entry for HTTP Streaming (under the Push Technology page and listed as HTTP server push) talks only about "sending data from a web server to a web browser." This is out of date; HTTP Streaming is now much more than this. HTTP Streaming takes advantage of the fact that the Internet infrastructure has been built with HTTP in mind (as does PubSubHubbub). HTTP is fully supported across that infrastructure, so as well as using the protocol to distribute static content such as HTML, images, CSS and JavaScript, why not use it to distribute real-time data too? The part of the Wikipedia definition for HTTP Streaming that is correct is:

Generally the web server does not terminate a connection after response data has been served to a client. The web server leaves the connection open such that if an event is received, it can immediately be sent to one or multiple clients.

A client in this context doesn't have to be a web browser. It can be another web server, a desktop app, a mobile phone app, an embedded program running on a piece of hardware or a web application; basically any web-enabled device capable of making a persistent HTTP connection.
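To make the idea of a non-browser client concrete, here is a minimal sketch of a streaming consumer in Python. It assumes the service sends one JSON object per line over a single long-lived response, with blank lines as keep-alives; that framing, the function names and the URL mentioned afterwards are illustrative assumptions rather than any particular service's API.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line.

    Assumes newline-delimited JSON; blank lines are treated as keep-alives.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # keep-alive line, nothing to deliver
        yield json.loads(line.decode("utf-8"))


def consume(url: str) -> None:
    """Make one HTTP request and print events for as long as the server keeps it open."""
    with urllib.request.urlopen(url) as response:
        # The response object is iterable line by line; each step blocks
        # until the server writes more data to the open connection.
        for event in parse_events(response):
            print(event)
```

Calling `consume("http://stream.example.com/events")` (a hypothetical endpoint) would block and print each event as it arrives. Nothing here needs a browser or a listening port, only the ability to make an outbound request and keep reading.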

This might be why services such as Superfeedr, which consistently champions PubSubHubbub, have introduced support for HTTP Streaming and why new services like DataSift have provided support from almost day one.

So, why are services starting to offer HTTP Streaming? The first thing you may think is that a persistent HTTP connection might be a faster way of receiving data than PubSubHubbub and its intermittent HTTP Push requests. Surprisingly, this isn't supposed to be the case since, as I recently found out, "HTTP 1.1 reuses TCP connections by default", so successive push requests can travel over a connection that is already open.
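The connection reuse point can be checked locally. The sketch below is only an illustration using Python's standard library, not how any real hub is implemented: it starts a throwaway HTTP/1.1 server on the loopback interface, sends two POST requests through one `http.client.HTTPConnection`, and records the client's source port for each request. All class and function names are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection alive
    seen_ports = []  # source port of the client socket for each request

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the pushed body
        RecordingHandler.seen_ports.append(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def push_twice() -> list:
    """Send two POSTs over one client connection and return the ports the server saw."""
    RecordingHandler.seen_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for body in (b"first", b"second"):
            conn.request("POST", "/callback", body=body)
            conn.getresponse().read()  # finish the response before reusing the socket
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return list(RecordingHandler.seen_ports)
```

If both recorded ports are equal, both requests arrived over the same TCP connection, which is the behaviour the quote describes.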

One thing that PubSubHubbub does require is that the push notifications are made to a web server. This means that PubSubHubbub is highly unlikely to be used for real-time client data delivery, because client applications don't tend to run their own web server. HTTP Streaming is therefore a more accessible real-time data delivery mechanism, since any technology that can make a web request and hold a persistent HTTP connection can receive real-time push notifications. By offering an HTTP Streaming API, a service can be consumed by anything from a hardware embedded system to a mobile application, as long as it is connected to the Internet.
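To show what "made to a web server" implies for a subscriber, here is a rough sketch of a PubSubHubbub callback endpoint using Python's standard library. It reflects my reading of the 0.3 specification, in which the hub verifies a subscription with a GET carrying `hub.challenge` and later delivers content in the body of a POST. The handler names and port are assumptions, and a real subscriber would also check `hub.mode` and `hub.topic` against a subscription it actually requested.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple
from urllib.parse import parse_qs, urlparse


def verification_response(query: str) -> Tuple[int, bytes]:
    """Answer a hub verification request by echoing hub.challenge.

    Simplified: no check that the subscription was really requested.
    """
    params = parse_qs(query)
    challenge = params.get("hub.challenge")
    if not challenge:
        return 404, b""
    return 200, challenge[0].encode("utf-8")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Subscription verification: the hub expects the challenge echoed back.
        status, body = verification_response(urlparse(self.path).query)
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Content delivery: the pushed document is the request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        print(f"received {len(payload)} bytes of pushed content")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for hub requests; the port must be reachable from the hub."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

The important part is `serve`: the subscriber has to listen on an address the hub can reach, which is exactly what most phone, desktop and embedded clients cannot offer.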

The other thing that PubSubHubbub does is define the message format. This can be seen as both a positive and a negative, but since JSON continues to win over XML as the preferred data format, it looks like PubSubHubbub will have to evolve away from XML to keep up, as this question on Quora suggests.

This is an exciting trend which will most probably continue and will lead to us seeing truly real-time applications on any web-enabled device. It certainly doesn't signal the end of the road for PubSubHubbub, which has its roots firmly in RSS (and XML), along with so much of the Internet. However, HTTP Streaming could become the de facto standard for client push applications.

Photo via Blake Patterson

Phil Leggetter Developer Evangelist at Pusher, Real-Time Web Software & Technology Evangelist, team leader, product developer, micropreneur, managing director of a real-time web and social media software company, blogger and twitter user (@leggetter).

Comments (14)

@Julien - I understand the reuse of TCP connections in HTTP 1.1 that is of benefit to PubSubHubbub. The guy working on Google App Engine put me straight on that one (http://www.onebigfluke.com/2010/09/common-misconception-explained-by-phi...).

What I'm asking in my first question is: how do you get the arbitrary content from the PubSubHubbub push notification?

For example, the Kwwika-Superfeedr demo I created (http://blog.programmableweb.com/2010/08/26/real-time-news-reader-shows-o...), and which received XML content, got the body of the push request using:

https://gist.github.com/769688

So, will this exact same code work but the document contents that I get will be the arbitrary content? e.g. JSON, text (maybe even a page fragment).

julien

Even though the PubSubHubbub protocol still mentions feeds as the data format, it works awesomely well for any type of arbitrary content, and many proposals have been made in the past :) Superfeedr, for example, is able to push any kind of content via PubSubHubbub.

Bruce

How do you deal with the cost of open connections on the server? If the connections are persistent and long-lived, you could have many thousands of clients simultaneously connected to a single server - I was under the impression that web servers didn't cope well with that kind of load.

@Julien - I saw your post about arbitrary content support back in December 2010 (http://blog.superfeedr.com/arbitrary-content-pubsubhubbub/). Are you involved in trying to get support for this added to the PubSubHubbub specification? (http://pubsubhubbub.googlecode.com/svn/trunk/pubsubhubbub-core-0.3.html#...)

To clarify: In simple terms does this mean that the body of the HTTP Push that you send to the PubSubHubbub subscriber is the arbitrary content rather than XML?

@Bruce - Many servers that support HTTP Streaming have been designed and built from the ground up to deal with thousands of persistent connections. Comet servers are a good example of this. Ultimately resources are finite, so at some point you will need to scale horizontally to meet demand, in exactly the same way you do with a normal web server, but a server specifically designed and built for HTTP Streaming will hit the point where scaling is required much later than a standard web server (Apache, IIS etc.).

Check out the transports section of this Comet server guide on Comet Daily and look at the "Streaming" transport types:

http://cometdaily.com/maturity.html

julien

@Phil : it will be posted to you in the BODY of an HTTP POST... exactly like with XML. The only difference is that the full content will be posted because there is no good way to define a "diff" on arbitrary content... Feeds are better for this, as you can extract only the new entries...

julien

Phil, Comet is different. We're talking about re-using the TCP connections, not the HTTP elements. Apache, for example, has keep-alives (without supporting anything near Comet :))

Also, yes, I'm pretty sure the PubSubHubbub spec will very soon include support for any arbitrary content.

It truly is an exciting trend. However, I hope it will one day go towards a peer-to-peer WWW in which both clients and servers have URIs. As you say "One thing that PubSubHubbub does require is that the push notifications have to be made to a web server." -- this is only because notifications are made to a HTTP URI. If you could obtain one on the client side - you could use PSHB for the *whole* WWW. And it would make sense -- HTTP works at Web scale already, we just need to use it in both directions (client->server and server->client).

There has already been lots of work on this idea, and it can basically be divided into two camps:

a) browser extensions -- projects like BrowserSocket (http://browsersocket.org/) and BrowserAccess (https://github.com/tomek22/BrowserAccess) extend the browser with HTTP and WS servers and offer an API for attaching handlers.

b) reverse HTTP proxies -- projects like ReverseHTTP (http://reversehttp.net/), Yaler (http://yaler.org/) and node-revhttpws (https://github.com/izuzak/node-revhttpws, disclaimer: this one is mine) offer server-side proxies to which clients (browsers) connect using COMET/WebSockets in order to obtain a URI and process requests received on that URI.

@Julien - thanks for the clarification.

@Ivan - thanks for your thoughts and all the additional information. Nokia have actually done some work on creating a mobile web server (http://betalabs.nokia.com/apps/mobile-web-server) and any laptop or desktop can already easily run a web server. Again, you would require an HTTP URI (which can be an IP address) so maybe this is one solution. Maybe we just need better real-time allocation of IPs and true real-time propagation of DNS entries. So, maybe web servers could also be a solution after all?

[...] Although not a real-time protocol these are examples of using an HTTP Streaming API to receive instant notifications. HTTP Streaming seems to be becoming the API technology of choice when the speed of notifications really matters and I think we are going to see a lot more APIs offer this. I wrote an article recently on Programmable Web that covers this topic a bit further and discusses HTTP Streaming versus PubSubHubbub. [...]