Using ETags and Last-Modified headers to improve performance with HTTP conditional requests

In the recent update of the HTTP specification, the details of conditional requests have been split out and given their own specification. Most developers I talk to are familiar with the 304 Not Modified response code, but whenever we start to dig deeper, everyone, myself included, is missing pieces of the puzzle. This article is one of a series of blog posts that attempt to dig into aspects of HTTP and provide practical guidance on their usage.

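A conditional HTTP request asks the server to perform an action only if a condition about the target representation holds, typically whether it has or has not changed since the client last saw it. If the condition fails, the server does something else instead. There are two common ways that this mechanism is used: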

We can prevent a client from having to download bytes that it already has.
We can prevent a client from making changes to a resource that has been changed by someone else since the client retrieved it.
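
To make those two uses concrete, here is a minimal Python sketch of how a server might evaluate each kind of conditional request. The function names, the plain dictionaries of headers, and the return values are all illustrative assumptions. The If-None-Match and If-Match request headers come from the conditional requests specification and are not otherwise covered in this post. The sketch only does an exact string comparison; it ignores weak validators, lists of values, and the `*` wildcard.

```python
def handle_get(request_headers, current_etag, body):
    """Skip the body when the client already holds the current representation."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, b""  # Not Modified: nothing to download
    return 200, body


def handle_put(request_headers, current_etag):
    """Refuse the update when the resource changed after the client fetched it."""
    expected = request_headers.get("If-Match")
    if expected is not None and expected != current_etag:
        return 412  # Precondition Failed: someone else changed it first
    return 204  # hypothetical success status chosen for this sketch
```

In both cases the client simply echoes back the validator it received earlier, and the server decides what that means.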

Validators tell you when you should care about change

In order to make a conditional HTTP request, a special type of HTTP header called a validator needs to be sent in the response from the server. The two validators defined by the HTTP specification are ETag and Last-Modified. Validators are values that let you determine whether two resource representations are the same. The definition of “same” depends on the flavour of validator and the opinion of the origin server.

What flavour is your validator?

Validator values are considered either strong or weak. Strong validators tell you if the response body is identical. Weak validators are used to allow an origin server to group multiple slightly different representations together as equivalent.

A strong validator can be used to identify two representations as equivalent even when their headers differ. For example, suppose you requested the same resource as text/plain and then as text/html. If the server supported returning both of those media types, then both representations could have exactly the same body bytes, and returning a 304 Not Modified would be feasible for the second request. However, the origin server does have the option of changing the validator value if it believes the changed header is semantically significant.

Weak validators are used to deliver new content to a client only when the representation has significantly changed, based on what the origin server deems significant. This capability is interesting because it lets a server manage its own load by controlling how often it signals that a resource representation has changed. This might be useful if clients are aggressively polling for changes to a representation. The server can effectively lie to clients for a short period of time and return 304 in order to reduce the number of bytes it has to deliver.
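
One way a server might do this is to advertise a new weak validator at most once per time window. The sketch below is only an illustration under that assumption: the class name, the revision numbers, and the 60 second window in the usage example are all hypothetical, and the W/ prefix it emits is the weak ETag marker described later in this post.

```python
class ThrottledValidator:
    """Advertise a changed weak ETag at most once per window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.published_revision = None  # revision clients currently see
        self.published_at = None        # time that revision was published

    def etag_for(self, current_revision, now):
        """Return the weak ETag to advertise at time `now` (in seconds)."""
        window_elapsed = (
            self.published_at is None
            or now - self.published_at >= self.window_seconds
        )
        if window_elapsed and current_revision != self.published_revision:
            self.published_revision = current_revision
            self.published_at = now
        return f'W/"{self.published_revision}"'
```

With `ThrottledValidator(60)`, a change that arrives ten seconds after the last published one is not signalled until the window has elapsed, so polling clients keep receiving 304 in the meantime.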

Weak validators can also be used to identify equivalence between two representations that are functionally equivalent. For example, a cache might contain a compressed copy of a representation, and when it receives a conditional request for an uncompressed copy, it can tell that the client already has an up-to-date version of the representation even though the body bytes do not match. This saves a cache from having to hold a copy of both the compressed and uncompressed versions.
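
The conditional requests specification defines two comparison functions for ETags, one strong and one weak. The Python sketch below shows my reading of them; the function names are my own, and the inputs are assumed to be well-formed ETag header values, with the W/ prefix (described later in this post) marking a weak one.

```python
def is_weak(etag):
    """A weak ETag header value starts with the W/ prefix."""
    return etag.startswith("W/")


def opaque_tag(etag):
    """Strip the W/ prefix, leaving just the quoted value."""
    return etag[2:] if is_weak(etag) else etag


def strong_match(a, b):
    """Strong comparison: neither value is weak and the quoted values are identical."""
    return not is_weak(a) and not is_weak(b) and opaque_tag(a) == opaque_tag(b)


def weak_match(a, b):
    """Weak comparison: the quoted values are identical, weak or not."""
    return opaque_tag(a) == opaque_tag(b)
```

So `W/"1"` and `"1"` match under weak comparison but not under strong comparison, which is why a weak validator cannot be used where byte-for-byte equality matters.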

Last-Modified validator

The Last-Modified header is simply a date and time in the HTTP date format. Its value is taken to be the time when the representation was last changed. Last-Modified is normally considered a weak validator because the HTTP date format only has a resolution of one second, and a representation can change more than once within a single second. However, there are scenarios where a Last-Modified value can be treated as strong, for example when the origin server knows the representation did not change twice during that second.
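
As a small illustration, Python's standard library can produce the HTTP date format directly. The helper name below is my own; `formatdate` with `usegmt=True` is the standard `email.utils` function.

```python
from email.utils import formatdate


def last_modified_header(modified_epoch_seconds):
    """Format a Unix timestamp as an HTTP date, e.g. for a Last-Modified header."""
    return formatdate(modified_epoch_seconds, usegmt=True)


# Two changes within the same second produce the same header value,
# which is why Last-Modified is normally treated as a weak validator.
first_change = last_modified_header(0.2)
second_change = last_modified_header(0.9)
print(first_change)                    # Thu, 01 Jan 1970 00:00:00 GMT
print(first_change == second_change)   # True
```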

Making an ETag

There are no defined rules for how to manufacture an ETag value. The guideline is to do whatever is easiest that can identify a “version” of a resource representation. Sometimes people create a hash of the response body. A timestamp is another way of generating an ETag value. If your internal entity maintains some kind of revision number, then that can be a very useful component of an ETag. Sometimes internal entities are naturally immutable and have an internal identifier, in which case those identifiers can be used in an ETag. It is important to remember that if there are multiple representations of the resource, then the internal identifier needs to be combined with some kind of representation identifier to ensure the ETag is unique for each representation.
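
Two of those approaches might look like the following in Python. This is only a sketch: the function names, the choice of SHA-256, the truncation to 16 hex characters, and the `revision-representation` layout are arbitrary choices of mine, not anything the specification requires.

```python
import hashlib


def etag_from_body(body):
    """Derive an ETag value from a hash of the response body bytes."""
    return hashlib.sha256(body).hexdigest()[:16]


def etag_from_revision(revision, representation):
    """Combine an internal revision number with a representation identifier."""
    return f"{revision}-{representation}"


print(etag_from_revision(7, "json"))  # 7-json
print(etag_from_revision(7, "xml"))   # 7-xml
```

The second helper shows why the representation identifier matters: revision 7 served as JSON and revision 7 served as XML get different ETag values.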

Making an ETag Header

An ETag header requires putting quotes around the ETag value. There are also constraints on which characters can appear in an ETag value. If you feel a burning desire to put unusual characters in your ETag value, you might want to check the spec first. The recent update of the HTTP specification removed the possibility of using escaped values in ETag values.

Below is a request from the Uber API with an ETag header.

To mark a value as a weak ETag, simply put W/ in front of the quoted ETag value.
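
Putting the quoting and the W/ prefix together, a helper for building the header value might look like this. It is a sketch with a hypothetical function name, and its character check is deliberately conservative: it accepts only visible ASCII characters other than the double quote, which is narrower than what the specification allows.

```python
def format_etag(value, weak=False):
    """Build an ETag header value: quoted, with an optional W/ prefix."""
    for ch in value:
        # Conservative assumption: visible ASCII only, and no double quote.
        if ch == '"' or not (0x21 <= ord(ch) <= 0x7E):
            raise ValueError(f"character {ch!r} is not allowed in this sketch")
    quoted = f'"{value}"'
    return f"W/{quoted}" if weak else quoted


print(format_etag("abc123"))             # "abc123"
print(format_etag("abc123", weak=True))  # W/"abc123"
```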

Putting your validators into use

When I started this post, I had intended it to be a single post. The more I read RFC 7232, the more I realized there is far more to say about making and handling conditional requests, so I’m going to leave those discussions for follow-up posts. In the meantime, if you are hungry for more HTTP insight, I’ve made an index of the HTTP-related posts that I have done to date.

Image credit: Wax seal https://flic.kr/p/7piVnt
Image credit: Flavors https://flic.kr/p/nm6sps
