What Is a PubSubHubbub Push-Styled API and How Does It Work?

When it comes to deploying the push (aka "streaming") architectural style of APIs, there is no single most accepted approach. In fact, there are several different approaches whose technical details vary from one another. One of those approaches—Webhooks—was discussed in Part 1 of this series. In this second part of this series, I'll cover PubSubHubbub, a closely related cousin to Webhooks.

What Is PubSubHubbub?

PubSubHubbub is an API technology used to publish information on the Internet. The information can take any form: HTML, text, pictures, audio, or any other kind of content you can imagine. The idea behind PubSubHubbub is to push content to clients rather than force them to poll for it, which is how most API implementations are designed to work. A setup includes these three elements:

  • Publisher: Creates the information and sends it to a hub.
  • Hub: Distributes the information to subscribers.
  • Subscriber: Accepts the data feeds from the hub.

The original purpose of PubSubHubbub was to find a better way to distribute information than the polling techniques provided by Atom and RSS. Instead of waiting for clients to poll a server for updates, the hub automatically pushes the updates to the subscribers, keeping everyone up to date. Using such a push approach achieves the following goals:

  • Reduces the resources the client/subscriber must use to keep updated.
  • Ensures that the publisher can update clients/subscribers in a timely manner so that updates aren't old on arrival.
  • Disconnects the publisher from the subscriber so that message traffic occurs asynchronously without slowing down either party.

PubSubHubbub relies on the use of topics. A subscriber can request a list of topics from the publisher and then decide which topics to subscribe to through the hub. The topic description includes the hub URL. In order to receive updates, the subscriber must run a Web-accessible server so that the hub can push updates to it using a client-side endpoint called a Webhook (described in Part 1 of this series).

A Webhook is a callback mechanism of the sort found in many applications that allows for asynchronous communication. When a PubSubHubbub-enabled app (the subscriber) places a subscription request, it also provides the connection details for the Webhook, and the hub then uses that connection to feed updates to the subscriber. It's all based on a single event--an update by the publisher. Each such update triggers a push. You see PubSubHubbub used in a number of ways:

  • Blogs, such as those powered by WordPress and Blogger
  • News sites, such as those supported by CNN and Fox News
  • Social media, such as MySpace and Medium.com

PubSubHubbub is popular for a number of reasons. The most common are:

  • It's free
  • It's open source
  • It allows you to continue using polling as an alternative
  • It doesn't break your current setup

How Does PubSubHubbub Work?

When working with PubSubHubbub, each party in the process has a specific role to play at a specific time in the sequence. The sequence can vary somewhat, depending on how the configuration is set up. However, the following diagram shows the basic sequence of events that you see for most setups.

[Figure: PubSubHubbub: Creating a Subscription]

Creating a Subscription

Before anything can happen, a publisher must make the world aware that a particular piece of updateable content, called a topic, exists. In order to do this, the publisher communicates with a hub by sending it a form-encoded POST with two fields:

hub.mode=publish
hub.url=<URL_OF_FEED>
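
As a rough sketch, the publisher's notification could be built in Python along the following lines. The hub and feed URLs are placeholders, and the helper name build_publish_request is hypothetical. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URLs (assumptions for illustration only).
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://publisher.example.com/feed.xml"


def build_publish_request(hub_url, feed_url):
    """Build (but do not send) the form-encoded POST that announces a topic."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_publish_request(HUB_URL, FEED_URL)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing the resulting object to urllib.request.urlopen() would actually deliver it to a real hub.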

Now that the hub is aware of the topic, it can accept requests from subscribers. This process occurs in three steps. First, the subscriber polls the publisher for a topic as usual using a protocol such as RSS or Atom. Second, the publisher responds with a specially formatted message that includes two HTTP link headers, which look like this:

Link: <HUB_URL>; rel="hub"
Link: <URL_OF_FEED>; rel="self"

The hub URL is the location of the hub, such as https://pubsubhubbub.superfeedr.com/ or https://pubsubhubbub.appspot.com/. In fact, you can find a number of free PubSubHubbub hubs to serve your needs. It's also possible to provide this information as a single link header or to use HTML links like these:

<link rel="hub" href="HUB_URL">
<link rel="self" href="URL_OF_FEED">
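
On the subscriber side, pulling the hub and feed URLs out of those link headers might look something like the following sketch. The function name discover_links and the sample URLs are assumptions, and the regular expression is deliberately simple: it expects rel to follow the URL directly and doesn't try to cover every variation the HTTP Link header syntax allows.

```python
import re

# Matches one link value such as: <https://hub.example.com/>; rel="hub"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def discover_links(link_header_values):
    """Return a dict mapping rel names (such as 'hub' and 'self') to URLs."""
    links = {}
    for value in link_header_values:
        for url, rel in LINK_PATTERN.findall(value):
            # Keep the first URL seen for each rel value.
            links.setdefault(rel.strip(), url)
    return links


# Sample header values with placeholder URLs (assumptions for illustration).
headers = [
    '<https://hub.example.com/>; rel="hub"',
    '<https://publisher.example.com/feed.xml>; rel="self"',
]
found = discover_links(headers)
print(found["hub"])
print(found["self"])
```

The same function also handles the single-header form, where both links arrive in one value separated by a comma.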

Now that the subscriber has the required information, it can take the third step and subscribe to a topic using a form-encoded POST with three fields similar to this:

hub.mode=subscribe
hub.topic=<URL_OF_FEED>
hub.callback=<URL_OF_WEBHOOK>

It's a good idea to use a different callback URL for each topic so that you can do things like monitor performance. In addition, using a separate callback for each feed URL reduces the complexity of writing code to handle new content. When the topic exists on the hub and the POST is in the correct format, the hub sends a 202 response to the subscriber.
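
Here is one way the subscription request might be put together, shown only as a sketch. The URLs are placeholders, the helper names are hypothetical, and deriving each callback from a hash of the feed URL is just one possible way to get a separate callback per topic. The feed URL is sent as hub.topic, which is the same field name the hub echoes back when it verifies the subscription.

```python
import hashlib
from urllib.parse import urlencode

# Placeholder values (assumptions for illustration only).
FEED_URL = "https://publisher.example.com/feed.xml"
CALLBACK_BASE = "https://subscriber.example.com/webhooks"


def callback_for_topic(feed_url, base=CALLBACK_BASE):
    """Derive a distinct callback URL per topic from a hash of the feed URL."""
    digest = hashlib.sha256(feed_url.encode("utf-8")).hexdigest()[:16]
    return f"{base}/{digest}"


def build_subscribe_body(feed_url, callback_url):
    """Return the form-encoded body of the subscription POST."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": feed_url,
        "hub.callback": callback_url,
    })


def subscription_accepted(status_code):
    """The hub signals an accepted request with a 202 response."""
    return status_code == 202


body = build_subscribe_body(FEED_URL, callback_for_topic(FEED_URL))
print(body)
```

Because every topic gets its own callback path, the code behind each Webhook knows which feed an incoming update belongs to without inspecting the payload.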

Of course, the system needs some means of ensuring that someone isn't playing a prank and subscribing to unwanted topics for unwary subscribers. With this in mind, the hub follows up a subscription request with a GET request to the Webhook that the subscriber provides. The GET request includes these elements:

hub.mode=subscribe
hub.topic=<URL_OF_FEED>
hub.challenge=<HUB_GENERATED_STRING>
hub.lease_seconds=<TIME_BEFORE_SUBSCRIPTION_EXPIRES>

In order to confirm the subscription, the subscriber must answer the GET request with a 200 response whose body contains the hub.challenge string. The hub.lease_seconds value tells the subscriber how long the subscription remains active before it expires and must be renewed. If the subscription isn't wanted, then the subscriber instead sends a 404 response with an empty message.
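
The decision the Webhook has to make can be sketched as a small function that takes the query string of the hub's GET request and returns a status code and body. The function name handle_verification, the WANTED_TOPICS set, and the sample values are all assumptions. A real Webhook would wire this logic into whatever Web framework it runs on.

```python
from urllib.parse import parse_qs

# Topics this subscriber actually asked for (placeholder URL, an assumption).
WANTED_TOPICS = {"https://publisher.example.com/feed.xml"}


def handle_verification(query_string, wanted_topics=WANTED_TOPICS):
    """Return (status_code, body) for the hub's verification GET."""
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]

    if mode == "subscribe" and topic in wanted_topics and challenge:
        # Echo the challenge back to confirm the subscription.
        return 200, challenge
    # Unknown or unwanted subscription: refuse with an empty body.
    return 404, ""


sample_query = (
    "hub.mode=subscribe"
    "&hub.topic=https%3A%2F%2Fpublisher.example.com%2Ffeed.xml"
    "&hub.challenge=abc123"
    "&hub.lease_seconds=86400"
)
print(handle_verification(sample_query))
```

Checking the topic against a list of subscriptions the subscriber really requested is what defeats the prank scenario: a request for a topic nobody asked for simply gets the 404.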

Updating a Subscription

After a subscriber successfully subscribes to a topic, the hub automatically pushes content to the Webhook each time the publisher indicates that new content is available. The following diagram shows the sequence of events that occur when new content becomes available.

[Figure: PubSubHubbub: Updating a Subscription]

In order to make the hub aware of new content, the publisher sends the hub a message like the one used to create the topic on the hub originally: a form-encoded POST containing the hub.mode=publish and hub.url=<URL_OF_FEED> fields described earlier. However, in this case, the publisher might want to use an array of URLs (if the hub supports it) to reduce the amount of network traffic. An array of topic links might look like this:

hub.mode=publish
&hub.url[]=<URL_OF_FEED1>
&hub.url[]=<URL_OF_FEED2>

Notice how the links are formatted in this case. You must add a set of square brackets after hub.url and separate the hub.url entries using an ampersand. There is no limit to the size of the array unless the hub defines one. Consequently, making all the required updates using a single call is possible.
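
A body in this array format could be generated with a few lines of Python, as in the sketch below. The helper name build_multi_publish_body and the feed URLs are assumptions, and whether a given hub accepts the bracketed form is something to check with that hub.

```python
from urllib.parse import urlencode


def build_multi_publish_body(feed_urls):
    """Return one form-encoded body that announces updates to several feeds."""
    fields = [("hub.mode", "publish")]
    fields.extend(("hub.url[]", url) for url in feed_urls)
    # safe="[]" keeps the square brackets literal instead of percent-encoding them.
    return urlencode(fields, safe="[]")


# Placeholder feed URLs (assumptions for illustration only).
feeds = [
    "https://publisher.example.com/feed1.xml",
    "https://publisher.example.com/feed2.xml",
]
print(build_multi_publish_body(feeds))
```

Using a list of pairs rather than a dictionary is what lets the hub.url[] name repeat once for each feed.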

John Mueller is a freelance author and technical editor. He has writing in his blood, having produced 102 books and more than 600 articles to date. The topics range from networking to artificial intelligence and from database management to heads-down programming. Some of his current books include a book about machine learning, a couple of Python books, and a book about MATLAB. He has also written a Java e-learning kit, a book on HTML5 development with JavaScript, and another on CSS3. His technical editing skills have helped more than 63 authors refine the content of their manuscripts. John has also provided technical editing services to a number of magazines. Be sure to read John’s blog at http://blog.johnmuellerbooks.com/.
 
