There seem to be a couple of common concerns that people have about introducing hypermedia to their APIs. There’s a thread on the api-craft mailing list called “Selling the benefits of hypermedia” where these were voiced. I’d like to address them here:

REST should be simple, and hypermedia only complicates stuff… …it’s much more complicated or complex to write clients to parse and consume links than to read an “id” and add it to a pre-known URI

This is actually not true in the case of hal+json: no parsing is needed, since the links are structured around JSON’s processing model, i.e.:


Following the above link is actually less complex than having to extract an id and add it to a pre-known URI template. It also reduces coupling because the client no longer has to maintain the pre-known URI template.
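To make the comparison concrete, here is a minimal JavaScript sketch. The order representation, the `customer_id` property, the `/customers/{id}` template and the later move to `/people/{id}` are all hypothetical examples, not part of any real API:

```javascript
// Coupled client: extracts an id and plugs it into a URI template that the
// client itself has to maintain. If the server moves customers to
// /people/{id}, this code silently builds the wrong URL.
function customerUrlFromId(order) {
  return "/customers/" + encodeURIComponent(String(order.customer_id));
}

// hal+json client: reads the href straight out of the "customer" link.
// No template and no string building; the server owns its URI structure.
function customerUrlFromLink(order) {
  return order._links.customer.href;
}

// Hypothetical representations of the same order, before and after the
// server changes where customers live.
const plainOrder = { id: 1, customer_id: 42 };
const halOrderBefore = { _links: { customer: { href: "/customers/42" } } };
const halOrderAfter = { _links: { customer: { href: "/people/42" } } };

console.log(customerUrlFromId(plainOrder));       // /customers/42 (stale after the move)
console.log(customerUrlFromLink(halOrderBefore)); // /customers/42
console.log(customerUrlFromLink(halOrderAfter));  // /people/42, no client change needed
```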

That the parallel frequently drawn between the success of the web and hypermedia doesn’t apply to machine API clients, since they lack the intelligence that allows people to “navigate” the web and make use of the application controls embedded in representations

I agree with this, to an extent. Machines have no real intuition so some types of affordances (like forms) don’t make a great deal of sense. Having said that, I do think the basic affordance of links (and relations) still presents some significant benefits to a machine API:

Firstly, as touched on above, it reduces coupling between client and server because the client does not need to maintain knowledge of the server’s URI structure. This lets the server change its URI structure over time to suit its internal implementation without breaking clients.

Secondly, it makes the API browsable by developers. Browsable because a linking convention establishes a standard way of conveying the next available transitions, and therefore makes the creation of tools for ‘surfing’ across an API possible. See

Thirdly, it makes the API discoverable by developers. With a format like hal+json, custom link relations (i.e. the link identifiers) should be URLs, and those URLs should expose the documentation for the given link. This means that the links in the API messages also provide a direct reference to their documentation (since their identifiers refer to their documentation), thus making the documentation part of the API. Combined with browsability, you end up with an API that is fully discoverable by a developer, who can just hit an entry point to the API, read the documentation for the currently available links in-line, and start exploring outwards from there.

Lastly, links can be used to version portions of an API in a granular way. New/breaking changes to a particular part of an API can be introduced by adding additional link(s) alongside the old. The new/breaking change can diverge from the old path for as long as necessary, and then rejoin the flow further down the line. This allows your API to be delivered in a more iterative style, with many small discrete changes rather than large changes with an entire API version jump. Smaller deliveries are easier to measure and manage, and result in less overall risk/cost.
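Here is a minimal JavaScript sketch of how a client might take advantage of this: it asks for relations in order of preference and follows whichever one the server currently advertises. The `pickLink` helper and the `checkout`/`checkout-v2` relation names are hypothetical (in hal+json, custom relations would normally be URLs pointing at their documentation):

```javascript
// Pick the first relation, in order of preference, that the server currently
// offers. Returns { rel, link } or null if none of them is present.
function pickLink(resource, preferredRels) {
  const links = resource._links || {};
  for (const rel of preferredRels) {
    if (links[rel]) return { rel, link: links[rel] };
  }
  return null;
}

// While a new checkout flow is being rolled out, the server advertises the
// old and the new relation side by side.
const basket = {
  _links: {
    self: { href: "/basket/1" },
    checkout: { href: "/basket/1/checkout" },
    "checkout-v2": { href: "/v2/basket/1/checkout" }
  }
};

// Upgraded clients opt into the new path; older clients that only ask for
// "checkout" carry on unaffected until the server retires that relation.
const chosen = pickLink(basket, ["checkout-v2", "checkout"]);
console.log(chosen.rel, chosen.link.href); // checkout-v2 /v2/basket/1/checkout
```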

In a nutshell: directing your clients around your web API via link relations, rather than having them rely on URL conventions, is a good idea for exactly the same reason that Rails’ routing helpers are a good idea.

For the benefit of those not familiar with what routing helpers are: they’re helper functions for URL paths that Rails provides for use within your web app’s code as an alternative to hard coding those URLs into your links, redirects, tests, etc. Here are a couple of example usages:

link_to widgets_path

redirect_to dashboard_path

The idea is that by not hard-coding the actual URL, and relying on the indirection of a helper, you make it painless to change your application’s URL structures further down the line.

Given that the pain of hard-coding URLs within the application’s codebase is acknowledged by Rails developers, it seems pretty odd that we are still happy describing our web APIs to the outside world in terms of hard-code-able URL patterns.

By using links, you can describe your web API in terms of link relations and therefore encourage clients not to be hard-coded against any URL structures. This allows you to change them freely, e.g. moving entire sections of your app onto a separate domain with clients remaining none the wiser. This helps you to scale your app, transparently test out changes to your API, and prevents your API succumbing to URL atrophy.

Hypermedia doesn’t have to be difficult or complicated, and to prove that, I came up with a simple media type that adds linking on top of JSON. It’s called application/hal+json, and here’s how you would represent a widgets and dashboard link with it:

{
  "_links": {
    "widgets": { "href": "..." },
    "dashboard": { "href": "..." }
  },
  ... JSON as usual ...
}

Instead of hard-coding a URL, clients can now simply traverse over the link:

response = HTTP.get(this._links.widgets.href)
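The `HTTP.get(...)` line is pseudocode. One detail a real client has to handle is that an href may be relative, so it should be resolved against the URL the response came from before the request is made. A minimal JavaScript sketch, assuming a hypothetical `resolveLink` helper, a single link object per relation, and placeholder example.com hosts; the request itself (e.g. with `fetch`) is left out:

```javascript
// Resolve a link relation's href against the URL the representation was
// fetched from, using the standard WHATWG URL class. Relative hrefs stay on
// the API's origin; absolute hrefs can point anywhere, so the server is free
// to move a section of the app onto another domain.
function resolveLink(resource, rel, baseUrl) {
  const link = resource._links && resource._links[rel];
  if (!link || typeof link.href !== "string") {
    throw new Error(`No single "${rel}" link in this resource`);
  }
  return new URL(link.href, baseUrl).toString();
}

// Hypothetical root resource; the example.com hosts are placeholders.
const root = {
  _links: {
    widgets: { href: "/widgets" },
    dashboard: { href: "https://dashboard.example.com/" }
  }
};

console.log(resolveLink(root, "widgets", "https://api.example.com/"));
// https://api.example.com/widgets
console.log(resolveLink(root, "dashboard", "https://api.example.com/"));
// https://dashboard.example.com/
```

Because the client never builds URLs itself, moving the dashboard to its own domain only changes the href the server sends.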

This is all hypermedia is really about; the “control” is in the message itself. Exactly the same principle as HTML, just applied to machine-consumable APIs. If you want to see an example API using hal+json, check out the following link:

To be clear: there are many other benefits to hypermedia APIs that I have not covered here, but I think the above is probably the hardest for someone like DHH to dismiss as “REST wankery”. I am planning on writing about the other benefits some time in the near future.

If you’re interested in exposing hal+json and you develop in Ruby, I highly recommend Roar by Nick Sutterer. For other languages, you can check out the HAL homepage for a list of available libraries.

I’ve said recently that HTML is clunky for APIs, but I didn’t really give it much thought. Since then I have given it some more thought and that led to me creating an HTML variant of HAL to test the waters and find out exactly how awkward it is.

tl;dr - it’s actually not that bad. I’m almost convinced. I wrote some jQuery helpers and a stylesheet to turn it into something human-browsable. It all works ok.

The method

  • Ambiguity in a machine interface is bad. I had to establish some clear and unambiguous constraints.
  • I didn’t want any presentational info targeted at humans to muddy the machine interface, which means that surfacing the selectors in a web browser had to be done via the CSS3 content property. I did have to resort to JS in a couple of places.
  • I wanted to produce valid HTML. Didn’t really manage this due to the use of rel="self" upsetting the validator. I can live with that.

How It Looks

You can see here how it renders in a browser.

Here’s the markup (minus style and script tags):


There were a couple of road-blocks I ran up against here whilst trying to stick to purely html and css:

  • can’t present input names with CSS alone (implemented JS shim)
  • can’t represent the rel/class names of links as hyperlinks to docs (implemented JS shim)
  • can’t show headers for request to page (TODO: fix by overriding all links and forms to use xhr + pushState)


I put together a rudimentary jQuery API for consuming this HAL-HTML stuff. It’s a bunch of helper methods that can be called on a resource element (there are top level methods on the jQuery object that default to the root resource [i.e. body tag]).

fill_in('input_name', { with: 'value' });

They should be relatively self-explanatory. I think the most interesting is the last function, ‘fill_in’, which is meant to be called on a control element. It returns the control element so you can chain it, and you end up with code that looks like this:

$.getControl('ht:make-reply').fill_in('content', { with: 'zomg awesome hypermedia client' }).submit();

The ‘spec’

Below are the basic rules that describe how to express HAL with HTML.


There are two types of resource in a hal-html document:

  • Root Resource, target of request (represented as body element)
  • Embedded Resource, contained/in-line resources to save requests (see the Embedded Resources section below for how these are represented)

Properties

Selector Pattern

{resource} > .properties > input[name="{property}"]


body > .properties > input[name="created_at"]
body > .embedded > .post > .properties > input[name="content"]

Links

Selector Pattern

{resource} > .links > a[rel~="{relation}"]


body > .links > a[rel~="author"]
body > .embedded > .post > .links > a[rel~="self"]

Embedded Resources

Selector Pattern

{resource} > .embedded > .{relation}


body > .embedded > .post

Form Controls

Selector Pattern

{resource} > .controls > form[name~="{control_name}"]


body > .controls > form[name~="make-reply"]
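The four selector patterns are regular enough to generate. A hypothetical set of JavaScript helpers that builds them (the `halHtml` name and its functions are illustrative, not part of the spec); an embedded resource is addressed by passing its selector as the `resource` of the next pattern:

```javascript
// Build hal-html selectors from the patterns in the spec above.
// `resource` is the selector of the resource element ("body" for the root).
const halHtml = {
  property: (resource, name) =>
    `${resource} > .properties > input[name="${name}"]`,
  link: (resource, rel) => `${resource} > .links > a[rel~="${rel}"]`,
  embedded: (resource, rel) => `${resource} > .embedded > .${rel}`,
  control: (resource, name) =>
    `${resource} > .controls > form[name~="${name}"]`
};

const post = halHtml.embedded("body", "post");
console.log(halHtml.property(post, "content"));
// body > .embedded > .post > .properties > input[name="content"]
console.log(halHtml.link(post, "self"));
// body > .embedded > .post > .links > a[rel~="self"]
```

The attribute-based patterns tolerate names such as `ht:make-reply` inside the quotes, but the embedded pattern uses the relation as a class selector, so a relation containing characters like `:` would need escaping first (browsers provide `CSS.escape` for this).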


This was all relatively simple and, now that I’ve done it and thought about it some more, I’ve come to the conclusion that applying HAL’s information model to HTML results in a pretty clean generic interface for machines that you can easily build tooling around.

I’m still not sure using HTML for an API is a brilliant business decision right now though; it’s a bit too “out there” still. ;)


Render one hypermedia type for your human consumers (HTML) and another hypermedia type for your machine consumers (HAL). Conneg (see HTTP’s Accept, Content-Type, Vary headers) should be relatively easy to leverage for this with a decent development stack.
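A rough JavaScript sketch of that server-side choice, assuming the handler receives the raw Accept header string. Web frameworks usually provide this already (Rails’ `respond_to`, Express’ `req.accepts`); the function names here are hypothetical and the parsing is simplified:

```javascript
// The two hypermedia types this sketch serves, in order of preference when
// the client has no opinion (choosing HAL as the default is an assumption).
const AVAILABLE = ["application/hal+json", "text/html"];

// Parse an Accept header into [{ type, q }]. Simplified: ignores media-type
// parameters other than q.
function parseAccept(header) {
  return header.split(",").map((part) => {
    const [type, ...params] = part.split(";").map((s) => s.trim());
    const q = params.find((p) => p.startsWith("q="));
    return { type: type.toLowerCase(), q: q ? parseFloat(q.slice(2)) : 1 };
  });
}

// Quality the client gives a type: the most specific matching range wins.
function qualityFor(type, ranges) {
  const exact = ranges.find((r) => r.type === type);
  if (exact) return exact.q;
  const group = ranges.find((r) => r.type === type.split("/")[0] + "/*");
  if (group) return group.q;
  const any = ranges.find((r) => r.type === "*/*");
  return any ? any.q : 0;
}

// Returns the type to render, or null if nothing is acceptable (send 406).
// Either way the response should carry "Vary: Accept" so that caches keep
// the HTML and HAL variants apart.
function negotiate(acceptHeader) {
  if (!acceptHeader) return AVAILABLE[0];
  const ranges = parseAccept(acceptHeader);
  let best = null;
  let bestQ = 0;
  for (const type of AVAILABLE) {
    const q = qualityFor(type, ranges);
    if (q > bestQ) {
      best = type;
      bestQ = q;
    }
  }
  return best;
}

console.log(negotiate("text/html,application/xhtml+xml,*/*;q=0.8")); // text/html
console.log(negotiate("application/hal+json")); // application/hal+json
```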


Clearly the hypermedia API for your application will have parallels with your HTML application, but the interface you are presenting to machines needs to account for their automated behaviour in an entirely different manner. Machines are not adaptable consumers, so the machine interface to your application needs to be carefully managed so that it exposes the minimum ‘attack surface’ against which developers can write coupled client code.

Mixing your human and machine affordances in one HTML interface results in a messy, broad attack surface and is therefore not a good strategy for sustaining the evolvability of your application.

How that might look in practice..

If you use HTML for *both* human and machine interactions, you will actually end up presenting two competing interfaces to the machine clients, e.g.:

<a href="/people/bob" rel="author">author</a>

Is it valid to select the link control in a coded client by the text ‘author’ wrapped in the anchor tag, or is it only valid to select via the @rel attribute? It’s not immediately apparent from the media type, and that’s one example of the kind of tax you place on consumers by using a muddied interface like HTML. What happens if one of your developers decides the page looks better if you change ‘author’ to ‘Author’?

Machine hypermedia requires a much lighter interface than human hypermedia does. Forms sound good in theory but have a much lower cost-benefit when the run-time consumers of the application are automatons. Also, it’s not unlikely that machine clients will be written against your API that ‘cheat’ on the full processing rules for forms, which means that when you change @method from POST to GET, many of those external clients bomb out. This is more likely to happen as the complexity of the affordance goes up.

a few other thoughts..

There are also a few subjective reasons HTML is considered ‘not very nice’:

HTML is not at all elegant for representing data. Querying the DOM, even with css selectors, feels clunky in comparison to traversing a JSON object. 

I’m not sure the hypermedia affordances of HTML are that rich when it comes to machines, either: for example, there’s no clean way of representing the embedded’ness/containment of another resource’s state with HTML.


We should upgrade WebHooks so that they can be created by submitting a form/template of the callback request to be generated. This will allow users to connect APIs together arbitrarily without having to wait for providers to do the work to make it happen.

The long version..

In case you don’t know what “WebHooks” are, here’s a quote from the official wiki:

The concept of a WebHook is simple. A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST.

A web application implementing WebHooks will POST a message to a URL when certain things happen.

WebHooks are great - they’re being used in a lot of places on the web, but they aren’t achieving their full potential. They haven’t had as big an impact on the web as they probably should.

One small improvement in how they are designed and implemented could lead to a huge gain in their usefulness. Before the solution, let’s look at the problem..

The Problem

Assume you are paying for two external services that both expose an API. One is a todo system, and the other is email delivery. Your todo service provides a WebHook that fires on completion of a task.

When you complete a task in your todo, you want it to make a WebHook request to the email service’s API and get it to message you with an email saying “Tremendous. You finished #{the_task}”.

The problem is that the request emitted by your todo service is nothing like the request that your email service is expecting.

The Solution

If you were able to supply a template for the todo service’s WebHook, and not just a URL callback, then you could configure it to produce a request which would feed the email API and produce your desired outcome. So, the solution is to make WebHooks more sophisticated and dynamic, i.e. they are created by supplying a template of the request rather than just a URL.

So you could configure your todo WebHook by submitting something like this:

There are some other issues to address here such as auth (afaict, OAuth would work ok), but I think this presents a significant improvement in how users are able to compose together external services using WebHooks.
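To make the template idea a little more concrete, here is a hypothetical JavaScript sketch of what the todo service might do with such a registration. The registration fields, the `#{...}` placeholder syntax (borrowed from the example message above) and the email endpoint are all made up for illustration:

```javascript
// A hypothetical WebHook registration: the subscriber supplies a template of
// the request to send, not just a callback URL.
const webhook = {
  event: "task.completed",
  request: {
    method: "POST",
    url: "https://email.example.com/messages",
    headers: { "Content-Type": "application/json" },
    body: {
      to: "me@example.com",
      subject: "Task done",
      text: "Tremendous. You finished #{the_task}"
    }
  }
};

// Replace #{name} placeholders in a string with values from the event data.
function fill(str, data) {
  return str.replace(/#\{(\w+)\}/g, (_, key) => (key in data ? String(data[key]) : ""));
}

// Apply fill() to every string inside the template, however deeply nested.
function render(template, data) {
  if (typeof template === "string") return fill(template, data);
  if (Array.isArray(template)) return template.map((item) => render(item, data));
  if (template !== null && typeof template === "object") {
    const out = {};
    for (const [key, value] of Object.entries(template)) out[key] = render(value, data);
    return out;
  }
  return template;
}

// What the todo service would do when the event fires: render the template
// and serialise the body. Sending it (e.g. with fetch) is left out.
function buildCallback(hook, eventData) {
  const request = render(hook.request, eventData);
  return { ...request, body: JSON.stringify(request.body) };
}

const outgoing = buildCallback(webhook, { the_task: "Write blog post" });
console.log(outgoing.method, outgoing.url); // POST https://email.example.com/messages
console.log(outgoing.body);
// {"to":"me@example.com","subject":"Task done","text":"Tremendous. You finished Write blog post"}
```

Substituting into the body before serialising it means a task title containing quotes is escaped by `JSON.stringify` instead of breaking the request.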

Any thoughts?

Since the discussion about links in JSON seems to be opening up, I thought I would briefly explain my take on linking with JSON and how that presents itself in the design of HAL.

We need a media type (the HTML of machines), not another XLink

We need a general purpose media type, which extends JSON, that can be used as a standard way to represent a resource and its relations to other resources on the Web.

It’s very important, particularly because we are talking about JSON, that this media type be simple in its design. There are only three essential capabilities for representing resources that this media type needs to deliver:

  1. Representing resource state
  2. Linking to other resources
  3. 'Containment' of embedded resources

HAL (application/hal+json) is a media type designed for this purpose, and here’s how it provides the above capabilities:

Representing resource state

A resource in HAL is just a plain old JSON object with whatever properties you want, so it’s the same way of using JSON you’re doing right now:

{
  "name": "A product",
  "weight": 400,
  "dimensions": {
    "width": 100,
    "height": 10,
    "depth": 100
  },
  "description": "A great product"
}

Linking to other resources

HAL provides its linking capability with a convention which says that a resource object has a reserved property called “_links”. This property, surprisingly enough, is an object that contains links. These links are keyed by their link relation, e.g. self, next, product, customer, etc:

{
  "_links": {
    "self": { "href": "/product/987" },
    "upsell": [
      { "href": "/product/452", "title": "Flower pot" },
      { "href": "/product/832", "title": "Hover donkey" }
    ]
  },
  "name": "A product",
  "weight": 400,
  .. *snip* ..
}

Where a relation may potentially have multiple links sharing the same key (e.g. the ‘upsell’ relation in the above example), the value should be an array of link objects:
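
  "upsell": [
    { "href": "/product/452", "title": "Flower pot" },
    { "href": "/product/832", "title": "Hover donkey" }
  ]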

Where a relation will only ever have one link (e.g. the ‘self’ relation), the value can just be a straight link object:

  "self": { "href": "/product/987" }
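Because a relation’s value may be either a single link object or an array, a client that doesn’t want to care which form a given relation uses can normalise it. A minimal JavaScript sketch (the `linksFor` helper is hypothetical, not part of any HAL library):

```javascript
// HAL lets a relation's value be either a single link object or an array of
// link objects. Normalising to an array keeps client code uniform.
function linksFor(resource, rel) {
  const value = resource._links ? resource._links[rel] : undefined;
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

const product = {
  _links: {
    self: { href: "/product/987" },
    upsell: [
      { href: "/product/452", title: "Flower pot" },
      { href: "/product/832", title: "Hover donkey" }
    ]
  },
  name: "A product"
};

console.log(linksFor(product, "self").map((l) => l.href));    // [ '/product/987' ]
console.log(linksFor(product, "upsell").map((l) => l.title)); // [ 'Flower pot', 'Hover donkey' ]
console.log(linksFor(product, "next"));                       // []
```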

'Containment' of embedded resources

In some situations it’s more efficient to embed related resources rather than link to them, as it prevents clients from having to make extra round trips. HAL provides this capability with a convention which says a resource has another reserved property “_embedded”. This property is similar to _links in that embedded resources are keyed by link relation; the difference is that, instead of link objects, the values are resource objects again, which produces an elegant, recursive model. The idea is that a recursive model will help a lot when dealing with partials on the server side, and provide consistency on the client side.

The key aspect of these embedded resources is that they reset the context of any resource state and links which they contain. i.e. a link inside an embedded resource implicitly relates to that embedded resource and not the parent.

{
  "_links": {
    .. *snip* ..
  },
  "_embedded": {
    "manufacturer": {
      "_links": {
        "self": { "href": "/manufacturers/328764" },
        "homepage": { "href": "" }
      },
      "name": "Manufacturer Inc."
    },
    "review": [
      {
        "_links": {
          "self": { "href": "/review/126" },
          "customer": { "href": "/customer/fred", "title": "Fred Wilson" }
        },
        "title": "Love it!",
        "content": "I love this product. I also bought a Hover Donkey with it.",
        "rating": 10
      },
      {
        "_links": {
          "self": { "href": "/review/178" },
          "customer": { "href": "/customer/tom", "title": "Tom Dee" }
        },
        "title": "Hate it!",
        "content": "My hover donkey choked on it. DO NOT BUY!1!!",
        "rating": 0
      }
    ]
  },
  "name": "A product",
  "weight": 400,
  .. *snip* ..
}

The same rules apply with regard to the cardinality of embedded resources as to links, e.g.:

  _embedded.review[1].rating
  _embedded.review[0]._links.customer.title
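To show the context reset in code, here is a minimal JavaScript sketch (the `collectLinks` helper is hypothetical) that walks a HAL document recursively and records each link against the `self` href of the resource it appears in, using a trimmed version of the product example above:

```javascript
// Walk a HAL resource and, recursively, its _embedded resources. Each link is
// recorded together with the self href of the resource it appears in, since
// an embedded resource resets the context for the links it contains.
function collectLinks(resource, out = []) {
  const links = resource._links || {};
  const context = links.self ? links.self.href : null;
  for (const [rel, value] of Object.entries(links)) {
    if (rel === "self") continue;
    // [].concat() handles both a single link object and an array of them.
    for (const link of [].concat(value)) out.push({ context, rel, href: link.href });
  }
  for (const value of Object.values(resource._embedded || {})) {
    for (const child of [].concat(value)) collectLinks(child, out);
  }
  return out;
}

// Trimmed version of the product example above.
const doc = {
  _links: { self: { href: "/product/987" } },
  _embedded: {
    review: [
      { _links: { self: { href: "/review/126" }, customer: { href: "/customer/fred" } }, rating: 10 },
      { _links: { self: { href: "/review/178" }, customer: { href: "/customer/tom" } }, rating: 0 }
    ]
  },
  name: "A product"
};

for (const { context, rel, href } of collectLinks(doc)) {
  console.log(`${context} --${rel}--> ${href}`);
}
// /review/126 --customer--> /customer/fred
// /review/178 --customer--> /customer/tom

console.log(doc._embedded.review[1].rating); // 0
```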


Finally, here’s a visual representation of HAL’s information model:

Here’s where you can learn more about HAL:


Update: Happy to have found an existing solution to this problem

I’ve been promoting ESI as a serious consideration for anyone building scalable web apps for a while now, and reading mnot’s recent post about ESI prompted me to think about ways we can make ESI more efficient.

Something I think would be highly beneficial is a new HTTP header for marking responses as edge-processable. This would create additional visibility which intermediaries can use to avoid processing responses that require no attention.

As it stands, an ESI intermediary has to inspect the body of all passing responses, since it has no way of telling whether or not they include any ESI elements that need processing*. Adding a standardised header would allow an ESI mechanism to skip all responses unless they are explicitly marked for ESI processing, i.e. something like so:

> GET /some_page

< 200 OK
< Content-Type: text/html
< .....
< Edge-Process: ESI

Such a header, and a registry, would allow for other edge processing mechanisms to be introduced without each needing its own specific method of making itself visible.
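A minimal JavaScript sketch of how an intermediary could use such a header to skip body inspection. `Edge-Process` is only the header proposed here, not a registered one, and treating its value as a comma-separated, case-insensitive list of mechanisms is an assumption made to fit the registry idea:

```javascript
// List the edge processing mechanisms a response asks for, based on the
// proposed (non-standard) Edge-Process header. `headers` is a plain object
// of header name -> value; HTTP header names are case-insensitive.
function edgeProcessors(headers) {
  const entry = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "edge-process"
  );
  if (!entry) return [];
  return entry[1]
    .split(",")
    .map((token) => token.trim().toUpperCase())
    .filter(Boolean);
}

// Only responses explicitly marked for ESI get their body scanned.
function needsEsi(headers) {
  return edgeProcessors(headers).includes("ESI");
}

console.log(needsEsi({ "Content-Type": "text/html", "Edge-Process": "ESI" })); // true
console.log(needsEsi({ "Content-Type": "text/html" })); // false: pass the body straight through
```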

* URI pattern rules are a strategy that will work to a point but has the downside of being brittle and a pita to manage.

I’ve just spent a decent chunk of my Sunday pulling together the new winter time table data for the latest update to the Jersey Bus iPhone app.. it was riveting stuff!

My good mate @robdudley has already asked for the data so he can use it in a web app he’s building, and I guess it could be useful for others in the future, so I’ve put it up as JSON in a gist on GitHub so that you won’t have to go through the same tedium of compiling it yourselves.

The object structure should be relatively self-explanatory. If you have any questions about it, leave a comment on this post or the gist, or ping @mikekelly85.

Window management and virtual desktops are, at least for me, a couple of the most important features an operating system provides in my day-to-day workflow. I do a lot of (re)arranging, resizing, and switching between windows. Being able to manage this via keyboard shortcuts is important because that’s where my hands are most of the time. I don’t want fancy gestures or GUI nonsense, I want something simple and efficient.

Unfortunately, OSX is lacking in this area out of the box - which I didn’t expect when I moved over from my Linux setup 3 months ago. The good news is that I’ve found a couple of settings and (paid) apps that fill in the gaps and cover my basic requirements.

Virtual desktops on OSX are called ‘spaces’. Basically, you have a grid of spaces (I use 3x3 at the moment) in which you place groups of windows. Some of my spaces contain just one full-screen window e.g. I have one space exclusively for general web browsing, and others have multiple windows on them - my dev space has a text editor (MacVim), terminal(s), etc.

For me, the main benefit of arranging windows into separate desktops like this is that I can group them according to activities. This should (but doesn’t on OSX) mean that when I do something like cmd+tab, I’m only picking from the set of windows available on the current desktop. Instead, when you cmd+tab on OSX, you get a list of every single application that’s running regardless of what space it’s on, and what’s more, if you pick an application that’s in a different space (easily done) it will zoom you off to another space.

The zooming-you-off-to-another-space problem can be addressed by a spaces setting in your system preferences, by unticking the following checkbox:


Changing to show only the current space’s windows can’t be achieved by OSX alone. There’s an app that can help though called Witch. Witch has lots of settings but the two key ones for this purpose are:


Under ‘Triggers’, rebind ‘All applications’ to cmd+tab. It’ll give you a warning, just hit ok and ignore it.


Under ‘Behavior’ untick ‘List windows from all Spaces’.

There are a couple of additional things that I got used to with my old Linux setup that I didn’t get with OSX, namely keyboard shortcuts for:

  • window resizing/positioning
  • sending windows between spaces

Both of these features are provided by an app called SizeUp.

SizeUp allows you to quickly position a window to fill exactly half the screen (splitscreen), a quarter of the screen (quadrant), full screen, or centered via the menu bar or configurable system-wide shortcuts (hotkeys). Similar to “tiled windows” functionality available on other operating systems.


The necessary options can be bound under the ‘Shortcuts’ tab when you open the SizeUp app.

So here’s a summary of my shortcuts (and where they’re set):

  • ctrl+arrow : move space (OSX)
  • ctrl+cmd+arrow : move space and bring active window (SizeUp)
  • cmd+tab : switch through windows available in the current space (Witch)
  • ctrl+alt+cmd+arrow : size window to half of the screen (SizeUp)
  • ctrl+alt+shift : size window to quarter of the screen (SizeUp)
  • ctrl+alt+cmd+m : size window to full screen (SizeUp)

I’m interested to know if anyone else uses a similar setup, and if there are better approaches..


I believe the current proposal for offline web applications is too complicated, fiddly, and brittle. There is a cleaner and more efficient approach which makes better use of existing mechanisms of the web to negotiate and manage “offline assets”. Here’s a brief summary:

The essence of this proposal is that a proper solution to the offline web app problem should not require drawing a distinction between “offline” and “online” assets. There is no need for ‘cache manifests’, or to create a separate ‘application cache’ from the standard browser cache.

This solution should leverage existing web caching infrastructure (i.e. HTTP headers such as Cache-Control, ETag, etc.) to control how browsers store and negotiate the assets required to run the application offline.

Key to this solution would be browser cache compliance with the HTTP Cache-Control Extensions for Stale Content. In a nutshell, essential assets (html, javascript, css, etc) of the application required for offline usage would be served with a ‘stale-if-error’ directive in the Cache-Control header. When the browser is taken offline and the origin server cannot be reached, all of these assets will be served out of the cache (due to the stale-if-error directive) and the application will continue to function.
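As a rough illustration of the caching rule this relies on, here is a JavaScript sketch of the `stale-if-error` check from RFC 5861. The directive’s value is the number of seconds a response may be served stale when an error occurs, so an offline-capable app would choose a generous value. The sketch deliberately ignores `Expires`, heuristic freshness and quoted directive values, and the function names are hypothetical:

```javascript
// Parse a Cache-Control header into { directive: true | number }.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(",")) {
    const [name, arg] = part.trim().split("=");
    if (name) directives[name.toLowerCase()] = arg === undefined ? true : Number(arg);
  }
  return directives;
}

// Can a cached response be used when the origin can't be reached?
// ageSeconds: how long the response has been in the cache. Freshness is
// taken from max-age only (treated as 0 when absent).
function canServeStaleOnError(cacheControl, ageSeconds) {
  const cc = parseCacheControl(cacheControl);
  const maxAge = typeof cc["max-age"] === "number" ? cc["max-age"] : 0;
  const staleness = Math.max(0, ageSeconds - maxAge);
  if (staleness === 0) return true; // still fresh: serve it as normal
  return typeof cc["stale-if-error"] === "number" && staleness <= cc["stale-if-error"];
}

// An app asset served with: Cache-Control: max-age=600, stale-if-error=604800
console.log(canServeStaleOnError("max-age=600, stale-if-error=604800", 3600)); // true
console.log(canServeStaleOnError("max-age=600", 3600)); // false
```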

That’s pretty much it.

There is one significant hurdle here, and that is the limited capacity and reliability of local browser caches. However, a relatively simple solution would be to create a new HTTP header with which a server can indicate the cache storage requirements of its application (by domain). Something like:

> GET / HTTP/1.1
> Host:
> ...
< 200 OK
< Cache-Storage: 160M
< ..... 

If this is the first time the browser has encountered this app then the user is prompted to grant access for the domain to reserve the disk space to ‘install the offline app’. The user can then either accept, or reject (and optionally remember their choice).
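The proposed `Cache-Storage` header doesn’t exist today, so the following JavaScript is only a sketch of how a browser might handle it. The unit suffixes (K, M, G as binary multiples), the remembered-choice store and the function names are all assumptions:

```javascript
// Parse a value of the proposed (non-standard) Cache-Storage header, e.g.
// "160M". Assumed units: K, M, G as binary multiples; no suffix means bytes.
const UNITS = { "": 1, K: 1024, M: 1024 ** 2, G: 1024 ** 3 };

function parseCacheStorage(value) {
  const match = /^\s*(\d+)\s*([KMG]?)\s*$/i.exec(value);
  if (!match) return null;
  return Number(match[1]) * UNITS[match[2].toUpperCase()];
}

// Decide what to do for a domain, given the user's remembered choices.
// `decisions` maps domain -> "granted" | "rejected" (hypothetical store).
// Not handled: an app that later asks for more space than was granted.
function storageAction(domain, headerValue, decisions) {
  const bytes = parseCacheStorage(headerValue);
  if (bytes === null) return { action: "ignore" };
  const previous = decisions[domain];
  if (previous === "granted") return { action: "reserve", bytes };
  if (previous === "rejected") return { action: "ignore" };
  return { action: "prompt", bytes }; // first encounter: ask the user
}

console.log(parseCacheStorage("160M")); // 167772160
console.log(storageAction("app.example.com", "160M", {}).action); // prompt
```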

Alternatively, the cache-storage value could be expressed as metadata in the <head> of the application’s main html document, however this would result in the pattern being unusable outside of html apps.

Either way, this would offer a familiar experience for users, who are asked to ‘install’ an offline web app in much the same way they would be by traditional desktop software.

When the reserved storage is full, any assets that were served with a stale-if-error directive must take priority in the reserved storage over those that weren’t.
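One way to read that rule, sketched in JavaScript: evict entries without `stale-if-error` first, least recently used first within each group. The entry shape and the LRU tie-break are assumptions; the proposal only specifies the priority:

```javascript
// Order in which entries should be evicted: entries without stale-if-error
// first, then least recently used first within each group.
function evictionOrder(a, b) {
  if (a.staleIfError !== b.staleIfError) return a.staleIfError ? 1 : -1;
  return a.lastUsed - b.lastUsed;
}

// Return the URLs to evict so that `needed` bytes fit into `capacity`.
// If even evicting everything isn't enough, every URL is returned.
function evict(entries, capacity, needed) {
  let used = entries.reduce((sum, entry) => sum + entry.size, 0);
  const evicted = [];
  for (const entry of [...entries].sort(evictionOrder)) {
    if (capacity - used >= needed) break;
    used -= entry.size;
    evicted.push(entry.url);
  }
  return evicted;
}

// Hypothetical cache contents (sizes in arbitrary units, lastUsed as a counter).
const cached = [
  { url: "/app.js", size: 40, staleIfError: true, lastUsed: 1 },
  { url: "/avatar.png", size: 30, staleIfError: false, lastUsed: 5 },
  { url: "/news.json", size: 20, staleIfError: false, lastUsed: 2 }
];

console.log(evict(cached, 100, 40)); // [ '/news.json', '/avatar.png' ]
```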

Application developers can manage the way updates to individual assets are negotiated using HTTP’s standard caching mechanisms, such as Cache-Control max-age and ETag validation.

So that is the general gist of the proposal.. if anyone is interested let me know - perhaps we could try and flesh it out a bit more and push it forward.