ServiceWorker Caching Solution Holds Great Promise

Marcello La Rocca, Data Visualization Engineer, SwiftIQ
May. 19 2014, 12:24PM EDT

The ServiceWorker specification allows scripts to cache resources and handle all resource requests for an application even when a network isn't available. To put it in a different way, the ServiceWorker browser feature enables developers to build applications that work offline, overcoming the limits of HTML5 Application Cache (AppCache) by giving developers a set of primitives to handle every HTTP call on a Website, as well as controlling installation and updates for the cache rules.

In this technical overview we examine ServiceWorker’s strengths and its limitations, as well as the best scenarios to use it in. At the time of publication, the first draft of ServiceWorker had just been released by the W3C. Chrome supports part of the specification as an experimental feature (which can be activated from chrome://flags/), and Firefox support is on the way. Browsers that don’t support the ServiceWorker spec yet will just experience a “graceful failure” when a script tries to set ServiceWorker for a page; this will not affect the loading of the page itself.
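
The graceful failure described above can also be made explicit with feature detection. What follows is a minimal sketch, not code from the spec: it assumes the register() entry point lives on navigator.serviceWorker, as in the draft. The helper name registerIfSupported and the path /worker.js are hypothetical, and the navigator object is passed in as a parameter so the snippet can be exercised outside a browser as well.

```javascript
// Hypothetical helper: registers a ServiceWorker only when the API is present.
// `nav` is a navigator-like object, passed in so the sketch also runs outside a browser.
function registerIfSupported(nav, scriptUrl, options) {
  const container = nav && nav.serviceWorker;
  if (!container || typeof container.register !== 'function') {
    // No support: resolve with null and let the page load as usual.
    return Promise.resolve(null);
  }
  return new Promise(function (resolve) {
    // Wrapping the call means a synchronous throw also ends up in catch().
    resolve(container.register(scriptUrl, options));
  }).catch(function () {
    // Registration failed: treat it like "unsupported" rather than breaking the page.
    return null;
  });
}

// Example call; '/worker.js' is a placeholder path.
if (typeof navigator !== 'undefined') {
  registerIfSupported(navigator, '/worker.js');
}
```

Either way the caller gets a promise back, so the rest of the page script does not need to know whether the browser supports the feature.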

Caching is persistent (meaning that it will be stored on disk), but it is not clear yet how big each application cache can be (size will probably be browser-dependent), nor which replacement policies will be used, if any.

ProgrammableWeb will continue to follow ServiceWorker as the spec matures and is more widely adopted.

Do We Really Need Yet Another Cache?
You may be asking yourself, do we really need another cache, or another level of caching? The short answer is "yes." After all, on the server side we have application caches, reverse proxy caches, object caches and CDNs; on the Web side we have caches on browsers and intermediate Web nodes; and on the client side we have localStorage and (lately) AppCache.

It likely goes without saying that caching has a tremendous impact on the scalability of Web sites/applications. Indeed, you should try and aggressively cache at any level, as far in the future as possible, because every request served by the cache means easing the load on your servers and databases.

AppCache was designed to complement HTTP cache, not replace it. AppCache promised to relieve servers and networks even further from the burden of (mainly) static content; unfortunately, AppCache has not lived up to its promise. ServiceWorker tries to make up for AppCache limitations, providing low-level primitives to handle requests in a fine-grained way. This directly translates into lower latency, the need for less bandwidth, and less work for servers and CDNs (which, in turn, means less money out of your pocket, since you will likely be paying for these services according to usage).

In addition to the advantages that ServiceWorker provides while browsing a site online, it can also be used to easily build the offline version of your Website, with a simple, straightforward model that gives developers fine-grained and explicit control over what resources to cache, when to cache them and how to cache them. This last feature in particular promises to help in reducing the gap with mobile applications, which intensively exploit offline work and background synchronization to provide better user experience.

ServiceWorker Defined
A ServiceWorker is a piece of JavaScript that can listen for network events (usually, but not always, a request for a resource) and manage content caches, deciding what content will be displayed to the user when a URL is requested. This gives developers the ability to run JavaScript before a page loads or even exists, and this in turn translates into the possibility of having a piece of script, the ServiceWorker, acting like an in-browser HTTP proxy for navigation and resource requests.

ServiceWorkers run in separate threads, like Web Workers--in particular, a new context like that of Shared Workers will be created. (You can find more details on this here.)

One difference between Shared Workers and ServiceWorkers is that ServiceWorkers won’t have access to synchronous APIs. ServiceWorkers also don't have access to the DOM: there is no document or window object available, and they are prevented from using synchronous XHR.

The infrastructure and working mechanism of ServiceWorkers are very similar to those of Web Workers; the main differences are that events for ServiceWorkers will be triggered by the browser only (and not by the document), and that ServiceWorkers will have their own rules for binding and updating, some extra limitations (see below), and some syntactic differences for declaring the event handlers inside the workers.

Perhaps most important of all, if a ServiceWorker is installed for a page, it is the ServiceWorker that controls the page, not the other way around.

ServiceWorkers, however, are installed by Web pages: The only way to install a ServiceWorker is for a user to visit a page or an app. A ServiceWorker won’t kick in until the next time the user visits that page; once it does, resources for the page (like images in the body) are requested from the installed ServiceWorker before the normal browser cache is consulted.

One point to stress is that documents will maintain the ServiceWorker they start with for their whole lives. This means that if a document is opened without a ServiceWorker, it won't suddenly get a ServiceWorker later in life, even if one is installed for a matching bit of URL space during the document's lifetime. The browser will automatically send a request for updated versions of the ServiceWorker roughly once every 24 hours. The new version of the ServiceWorker, however, will only be served after all pages using the old version unload. (If you have only one page open, a simple refresh will be enough, but in both cases you can force this update using the event.replace() method inside oninstall.)
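
The lifetime rules above are easier to follow with a toy model. The sketch below is not the browser's implementation: the class WorkerLifecycle and its method names are invented for illustration, and versions are plain strings. It only encodes the two rules from the text: a document keeps the worker it started with, and a new version takes over only once no open document uses the old one.

```javascript
// Toy model of the update rule described above; not the browser's actual implementation.
class WorkerLifecycle {
  constructor() {
    this.active = null;          // version handed to newly opened documents
    this.waiting = null;         // newer version waiting for old documents to unload
    this.documents = new Map();  // document id -> version it started with (or null)
  }

  // Number of open documents still controlled by the given version.
  countControlledBy(version) {
    let count = 0;
    for (const v of this.documents.values()) {
      if (v === version) count += 1;
    }
    return count;
  }

  // A document keeps whatever worker was active when it was opened.
  openDocument(id) {
    this.documents.set(id, this.active);
    return this.active;
  }

  closeDocument(id) {
    this.documents.delete(id);
    this.promoteIfPossible();
  }

  // A new version takes over at once only if no open document uses the old one.
  install(version) {
    this.waiting = version;
    this.promoteIfPossible();
  }

  promoteIfPossible() {
    const oldInUse = this.active !== null && this.countControlledBy(this.active) > 0;
    if (this.waiting !== null && !oldInUse) {
      this.active = this.waiting;
      this.waiting = null;
    }
  }

  controllerOf(id) {
    return this.documents.get(id);
  }
}
```

In this model, a document opened before the first install keeps a null controller for its whole life, and a second version stays in the waiting slot until the last document using the first version is closed.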

How ServiceWorkers Work
To put it in simple terms, ServiceWorkers allow developers to register a handler for the fetch event, which intercepts navigation and resource requests; each handler defined for this event (there can be more than one) takes a parameter that provides methods and fields to access the request and to send a response to the page.

If no response is sent by the ServiceWorker, then the request takes the usual path (regular navigation is the fallback for requests not handled by ServiceWorkers) and the HTTP cache is checked. Otherwise, the content sent by the ServiceWorker is returned to the page.

ServiceWorkers also provide a cache system that can be used together with the fetch handlers to provide even more powerful solutions.
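
The routing rule just described (answer from the worker if a handler responds, otherwise fall back to the usual path) can be sketched as a small dispatcher. This is an illustration under assumptions, not the browser's algorithm: dispatchFetch, serveGreeting and the plain-object responses are hypothetical, and the event is reduced to a request field and a respondWith() method, loosely following the draft.

```javascript
// Toy dispatcher: offers the request to each handler in turn and falls back
// to the regular network path when none of them responds.
function dispatchFetch(handlers, request, networkFallback) {
  let responded = false;
  let response;
  const event = {
    request: request,
    // Only the first response counts; later calls are ignored.
    respondWith: function (value) {
      if (!responded) {
        responded = true;
        response = value;
      }
    }
  };
  for (const handler of handlers) {
    handler(event);
    if (responded) break;
  }
  return responded ? response : networkFallback(request);
}

// Hypothetical handler: answers one URL itself and ignores everything else.
function serveGreeting(event) {
  if (event.request.url === '/hello') {
    event.respondWith({ status: 200, body: 'Hello from the worker' });
  }
}
```

With this handler, a request for /hello is answered by the worker, while any other URL is passed to networkFallback, which stands in for regular navigation and the HTTP cache.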

For any request, you can decide whether it is better to try serving the file from the network first, rather than being stuck with the cached version. (This flexibility solves most of the problems seen in AppCache.) You will even be able to use Background Synchronization, thanks to another new feature still in development (explained here), and sync your files at every chance you get, while serving cached content when offline.
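
The per-request choice between network and cache can be sketched as two small strategies. This is a simplified illustration, not the ServiceWorker cache API: the function names networkFirst and cacheFirst are hypothetical, the cache is any Map-like object, and fetchFn is an assumed function that returns a promise for a response.

```javascript
// Network first: use the live response when possible, keep a copy,
// and fall back to the cached copy when the network fails.
function networkFirst(url, cache, fetchFn) {
  return fetchFn(url)
    .then(function (response) {
      cache.set(url, response);
      return response;
    })
    .catch(function (err) {
      if (cache.has(url)) {
        return cache.get(url);
      }
      throw err; // nothing cached either: report the original failure
    });
}

// Cache first: answer from the cache when possible and only go to the
// network (storing the result) on a miss.
function cacheFirst(url, cache, fetchFn) {
  if (cache.has(url)) {
    return Promise.resolve(cache.get(url));
  }
  return fetchFn(url).then(function (response) {
    cache.set(url, response);
    return response;
  });
}
```

A site might apply networkFirst to content that changes often and cacheFirst to static assets; the choice is made per request, which is exactly what a declarative manifest cannot express.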

But What About AppCache?
Forget about Application Cache: The ServiceWorker spec is an alternative to AppCache, not a complement. And, no, we don’t “have” AppCache, because it turned out to have far too many limitations to be really useful.

AppCache is probably easier to use, because developers just have to create a manifest file with the resources to cache, and the low-level stuff is all hidden by the browser. Unfortunately, its declarative nature is the main reason why AppCache lacks the flexibility that would allow it to work properly.

For instance, with AppCache files are always served from the application cache, even if you are online. The browser won’t check for updates, unless the manifest has changed in the meantime. Even if it has, if your server omits cache instructions for the resources served, the browser will assume the version stored in AppCache is still good, and serve it.

Even worse, if the manifest itself is far-future cached (with a header like Expires "Wed, 15 Apr 2020 20:00:00 GMT") the version in the Web cache will be considered valid and the manifest will never be updated. With the ServiceWorker spec, a ServiceWorker will be updated within 24 hours at most. It’s not clear yet if cached content will be refreshed as well during refetch, but you can choose to serve online content first, for all your resources or for a subset of them.

A more troubling example involves redirects: AppCache will consider off-origin redirects as failures, so if you need to use OAuth for authentication and redirect to Twitter, GitHub, Google, and so on, you will be shown a fallback page, if you have set one, or an error page otherwise. There is no workaround for this, except using meta-redirects or JavaScript (or a combination of the two, to have it working on pages with JavaScript disabled).

ServiceWorker not only supports redirects in the Response object (by setting its Location field), but also provides a shortcut (forwardTo()) that can be used in the onfetch listener. With ServiceWorker, redirects behave exactly as if the server had responded with a redirect.

Last, but not least, with AppCache you must list everything you want to cache in your AppCache manifest once and for all. This means that the first time a user navigates to your site, his or her browser will have to download all of that content before rendering the page. There is a trick around this, but the result will be a similar disaster. In both cases, the pages cached will be frozen until the manifest is changed, and every time such a change is made, all of the pages have to be downloaded again. (Think what this would mean with a huge site like Wikipedia.)

ServiceWorker Characteristics

--Installation
Only pages served over SSL can install ServiceWorkers. Plain HTTP is vulnerable to man-in-the-middle attacks, and once a ServiceWorker is installed for a page it completely controls which resources that page downloads; making ServiceWorkers HTTPS-only avoids handing that power to attackers. Registration is persistent over browser sessions (it will survive restarting the browser), and ServiceWorkers can only be unregistered explicitly using navigator.serviceWorker.unregister().

Further, ServiceWorkers can only be registered on the same domain as the page requiring them. The reason is the same as we stated earlier. If an attacker could run some JavaScript code (thanks to a cross-site scripting attack, for example), then he or she could also request that a new ServiceWorker be installed. Without this restriction, the ServiceWorker could come from any different origin--including the attacker’s website. The attacker’s ServiceWorker could then prevent updates to content that might remove it, and the original application wouldn't be able to help the users who have suffered these attacks.

With the same-origin restrictions, it's possible for an attacked application to help attacked users because their browsers will request ServiceWorker updates from the source origin no less frequently than once a day. Thus, even cross-site scripting attacks can be blocked within a day.

This restriction, of course, will have some repercussions. For example, it is not possible to serve ServiceWorkers from CDNs. It is possible, however, to use the importScripts() method to include other resources that are served by CDNs. Imported scripts will be downloaded and cached alongside the ServiceWorker. This, for instance, allows ServiceWorkers to import libraries from other origins.

If for any reason (exceptions, syntax errors, a failure setting caches) the code inside the oninstall handler fails or the promise passed to the event.waitUntil() method (which ensures that the oninstall handler returns only after everything is set as planned) is rejected, then the ServiceWorker will not be considered installed and it won’t control the page in successive requests.
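
The install rule above can be modeled in a few lines. The sketch below is a toy model and not the browser's code: runInstall is a hypothetical function, and the event object carries only a waitUntil() method. It reports the worker as installed only when the handler does not throw and every promise passed to waitUntil() resolves.

```javascript
// Toy model of the install step: resolves to true when the worker would be
// considered installed, false otherwise.
function runInstall(oninstall) {
  const pending = [];
  const event = {
    // Collect every promise the handler asks the browser to wait for.
    waitUntil: function (promise) {
      pending.push(Promise.resolve(promise));
    }
  };
  let threw = false;
  try {
    oninstall(event);
  } catch (err) {
    threw = true; // exceptions in the handler fail the install
  }
  return Promise.all(pending).then(
    function () { return !threw; },
    function () { return false; } // a rejected promise also fails the install
  );
}
```

A handler that populates its caches inside waitUntil() therefore gets an all-or-nothing install: if any resource fails to cache, the worker never takes control of the page.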

--External resources
ServiceWorkers, along with all the scripts imported using importScripts() the first time a ServiceWorker is run, will be persistently cached using a different policy with respect to normal resources. This ensures that the ServiceWorker and all the libraries and scripts it depends on will be available even offline.

Scripts will be automatically updated every 24 hours, and when the browser re-fetches the main script, it ignores HTTP heuristic caching and goes all the way to the network, requesting the ServiceWorker script directly from the server and bypassing Web caches.

Scripts required by importScripts() are fetched and validated in the same way, at the same time.

It is possible to request resources (such as images, videos and scripts) from different origins, and it is even possible to issue XMLHttpRequests for many kinds of off-domain resources. However, they must provide a valid CORS (Cross-Origin Resource Sharing) header, for HTTP Access Control, and the body of the responses can’t be accessed in the ServiceWorker. It can be passed along in the ServiceWorker response and can also be cached, but can’t be modified. Also, responses to requests for off-origin resources can’t be created from scratch. (A different type is used for this purpose.)

--Sharing
ServiceWorkers are shared by all the pages on the same origin as long as they are invoked with the same name, but different pages on the same domain can use different ServiceWorkers.

When registering a ServiceWorker, the scope of the ServiceWorker can be passed as the second parameter to the register() method; the scope rule will be applied for all navigation (that is, top level, browsing a page or loading an iframe) requests, and the longest prefix in the rules for the domain will match the navigation request.

For resource requests (like loading a script inside a page), the ServiceWorker that handles the request is the same one that controls the page issuing the request. In case of multiple registrations, the last registration wins. (It acts like a replacement.)
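
The matching rules for navigation requests can be sketched as a small registry. This is an illustration only: ScopeRegistry is a hypothetical class, scopes are treated as plain path prefixes (the draft's exact scope syntax may differ), and "last registration wins" is modeled as a later registration for the same scope replacing the earlier one.

```javascript
// Toy registry: maps scopes to worker scripts and resolves a navigation
// path to the worker with the longest matching scope prefix.
class ScopeRegistry {
  constructor() {
    this.byScope = new Map(); // scope prefix -> worker script URL
  }

  // Registering the same scope again replaces the previous worker.
  register(scriptUrl, scope) {
    this.byScope.set(scope, scriptUrl);
  }

  // Returns the script URL for the longest matching scope, or null if none match.
  match(path) {
    let best = null;
    for (const scope of this.byScope.keys()) {
      if (path.startsWith(scope) && (best === null || scope.length > best.length)) {
        best = scope;
      }
    }
    return best === null ? null : this.byScope.get(best);
  }
}
```

With one worker registered for / and another for /blog/, a navigation to /blog/post is handled by the second, because its scope is the longer prefix; a path that matches no scope gets no worker at all.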

The Cache, however, will not be shared by different ServiceWorkers; each ServiceWorker has its own exclusive caches object.

--Global state
Global state should be avoided in a ServiceWorker. Think of a ServiceWorker as a server application, only running closer to you. ServiceWorkers are shared resources that can be started and stopped by the browser at any time, and the browser can even stop a running ServiceWorker and re-issue the same request to another instance. So, you can’t assume your document is talking to a single instance of the ServiceWorker.

As with servers, you should instead serialize state in a database (or an equivalent solution). IndexedDB is an excellent tool for this purpose.
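
The advice to keep state out of the worker can be sketched with a store that is passed in rather than held in a global variable. The sketch below is illustrative only: createMemoryStore is an in-memory stand-in for a persistent store such as IndexedDB, exposing just an assumed asynchronous get/put pair, and createCounterHandler is a hypothetical handler factory.

```javascript
// In-memory stand-in for a persistent store (IndexedDB in a real worker).
// Only the asynchronous get/put shape matters for this sketch.
function createMemoryStore() {
  const data = new Map();
  return {
    get: function (key) { return Promise.resolve(data.get(key)); },
    put: function (key, value) { data.set(key, value); return Promise.resolve(); }
  };
}

// Each handler instance keeps no state of its own: it reads the counter
// from the store, increments it, and writes it back.
function createCounterHandler(store) {
  return function handle() {
    return store.get('hits').then(function (hits) {
      const next = (hits || 0) + 1;
      return store.put('hits', next).then(function () { return next; });
    });
  };
}
```

Because the count lives in the store, two handler instances created from the same store continue the same sequence, which is what happens when the browser stops one ServiceWorker instance and starts another. Note that this read-then-write is not atomic; a real IndexedDB version would presumably wrap it in a transaction.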

Conclusion
The ServiceWorker spec shows a lot of promise, especially in terms of closing the gap with mobile applications. As a developer, I’m looking forward to writing some working code and testing. I hope you share the same excitement--you should. We are used to seeing new libraries and features almost weekly that improve computational performance or 3D rendering on JavaScript engines and browsers. Most of the time, however, these are improvements that will greatly benefit users but impact only a small subset of people in the developer world. ServiceWorker, in contrast, promises to change the way we all develop Web applications. The impact will be significant.

Marcello La Rocca: As a developer I'm focusing on JavaScript, Python and Java (Android), but I have a weakness for algorithms. Lately, apparently I also became a tech blogger! My personal blog: mlarocca.github.io
