Query String Squid

Caching resources with query strings

This afternoon Scott Hanselman posted a fairly innocuous question on twitter.  However, the question involved versioning of a RESTful API, which is a subject that is sure to bring out lots of opinions.  This post is less about the versioning question and more about the commonly held belief that caches do things differently with URLs that have query strings.

Have you ever seen a RESTful API require their version *in the querystring*?— Scott Hanselman (@shanselman) September 2, 2014

Query strings deserve more love

Putting a version number into a query string parameter is actually quite an interesting idea.  For one, it can be made optional, so that a consumer can choose to always consume the latest version, or request a specific version.  Secondly, I like it because I can selectively version just the resources that I want to.  The part I dislike about putting a version as the first segment of the URL is that it blindly versions every resource in the API whether there has been breaking changes or not.   Putting a version at the end of the URL is much better, but putting it in the path can be tricky for both client and server implementations.  Particularly if you are not using hypermedia.

Caching is king

The last reason I like the version number in the query string is that it makes the specific version a separately cacheable resource.

When I mentioned my opinion to Scott on twitter, @jeremymiller responded with,

@darrel_miller @shanselman Isn’t the best practice *not* to use query string parameters because some proxies ignore that?— jeremydmiller (@jeremydmiller) September 2, 2014

This is not the first time that I’ve heard this guidance.  In the past when I investigated this, I discovered that a number of the HTTP caches provide configuration options that allow you avoid caching resource with query string parameters from particular domains.  I recall reading about reasons like corporate proxy caches getting filled up with tiles from google maps, pushing out all the content that get high hit rates with stuff that have very low hit rates.

As far as I understood, it was a configuration option that someone explicitly had to set to filter out certain sites. This didn’t seem like sufficient justification for the pervasiveness of the belief that query strings break caching.  So I decided to go hunting again.

A search for the source

My search kept bringing me back to a blog post by Steve Souders on how to update files that have been released with extremely long expiry times.  In this article Steve demonstrates when using a version number in the query string, the Squid proxy cache does not cache the static file at all.  He goes on to say that Squid can be configured to cache these files, but it is not the default behaviour.  Based on this information, he rightly suggested that people don’t use versions in their query string if they want to cache static resources.

Query String Squid

That was in August 2008.  Earlier that year, in May 2008, Squid released version 2.7 which changed their default behaviour to no longer refuse to cache URLs that contained a query string.  The Squid documentation explains the changes, and claims that one of the reasons for the change was because sites like YouTube were intentionally putting stuff in the query string to prevent caches from reducing traffic to their site.  I guess someone had investors they needed to impress 

We are six years down the road and the guidance based on a default setting in a HTTP cache is still with us, discouraging developers from using a significant architectural component of a resource identifier.

I don’t doubt that there are numerous other web frameworks and output caching mechanisms that apply similar constraints to resources with query strings.  However, these components should be under the developer’s control.  There is no need to artificially constraint ourselves to describing cacheable resources using only the path component of a URL.

Since RFC 3986 was released in 2005 the role of the query string has been well defined:

The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource within the scope of the URI’s scheme and naming authority

Let us move forward and start treating query strings with the respect that they deserve.

Image Credit: Squid https://flic.kr/p/79fueL

Related Blog