Should your site be using etags or not?

Etags and their usage have caused a lot of debate over whether they should be employed at all, and the answer really depends on your server setup. A lot of the debate is down to Yahoo’s YSlow tool, which reports that they should be turned off in order to improve performance, but is this always the right thing to do? I used to think so, but a bit of research has revealed that it may not be the best course of action.

What are etags and what do etags do?

First off let’s try to understand what etags actually are. If you have etags turned on for your site then each asset sent from your server to a client is sent with an etag in its response headers. The etag is a key that identifies one particular version of the asset. For Apache this key is constructed from the file’s inode, size and last modified time.
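To make the construction concrete, here is a minimal sketch of how such a key could be built from a file’s metadata. The helper name make_etag and the exact output format are my own assumptions for illustration, only loosely modelled on Apache’s behaviour, so treat it as an approximation rather than Apache’s actual implementation.

```python
import os


def make_etag(path, use_inode=True):
    """Build an etag-like key from a file's metadata (hypothetical helper)."""
    st = os.stat(path)
    parts = []
    if use_inode:
        # The inode number identifies the file on this machine's filesystem only.
        parts.append(format(st.st_ino, "x"))
    parts.append(format(st.st_size, "x"))
    # Last modified time in microseconds, written as hex.
    parts.append(format(st.st_mtime_ns // 1000, "x"))
    # Etag values are sent as quoted strings.
    return '"' + "-".join(parts) + '"'
```

Calling make_etag on the same unchanged file always gives the same key, while editing the file changes its size or modified time and therefore the key.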

So what does the client actually do with an etag for a particular asset? If the client has the asset cached, the asset’s expiry (set by its Expires or Cache-Control header) is checked first to see if the server needs to be contacted at all. If the asset has not expired then the etag has no effect at all: the locally cached version is used. If the asset has expired, the client sends a request for the asset along with the etag it has stored, in an If-None-Match header. The server compares the asset’s current etag with the one sent by the client. If they match, the server returns a 304 Not Modified response with no body, which instructs the browser to use its cached version of the asset. If they do not match, the server returns the full asset.
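The flow above can be sketched in a few lines. This is a simplified model, not a real HTTP client or server: the function names, the cache dictionary layout and the numeric clock are all assumptions made for illustration, and real servers also handle details such as multiple etags in one If-None-Match header.

```python
def server_respond(request_headers, current_etag, body):
    """Answer a GET, honouring If-None-Match (simplified)."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag == current_etag:
        return 304, b""          # Not Modified: no body is sent
    return 200, body             # full asset


def client_get(cache, now, current_etag, body):
    """cache is None, or a dict with 'etag', 'expires' and 'body' keys."""
    # 1. A fresh cache entry is used without contacting the server at all.
    if cache is not None and now < cache["expires"]:
        return "fresh-cache", cache["body"]
    # 2. Otherwise ask the server, sending the stored etag if there is one.
    headers = {}
    if cache is not None:
        headers["If-None-Match"] = cache["etag"]
    status, payload = server_respond(headers, current_etag, body)
    # 3. A 304 means the cached copy is still good.
    if status == 304:
        return "revalidated", cache["body"]
    return "downloaded", payload
```

Note that the etag is only consulted in step 2, once the cached copy has expired.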

As you might have deduced, if any of the properties that make up an asset’s etag change, the web server will generate a different etag, which forces the user’s browser to download and recache the file. As noted above, though, this only happens once the client’s cached version of the asset has expired. So etags will not magically force clients to recache assets when you make changes to them.

When etags go bad

So far etags certainly sound like a useful tool for asset caching. Things aren’t so simple, however, because of an issue that arises when etags are used in load balanced environments.

As noted above, Apache (like some other web servers) takes a file’s inode into account when generating an etag. An inode number is specific to the filesystem the file lives on, so it effectively ties the etag to the actual server that holds the file. This means that in a load balanced environment each server will generate a different etag for the same file. A client that revalidates against a different server from the one it originally fetched from will be sent the whole asset again instead of a 304, which makes etags pretty useless in such circumstances. This is the issue that causes a lot of people to disable etags completely and opt for using cache control headers only for their assets.
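Here is a small sketch of that mismatch. The two servers, their inode numbers and the helper names are invented for illustration; the point is only that identical files with different inodes yield different keys.

```python
def etag_with_inode(inode, size, mtime):
    """Etag-like key that includes the inode (illustrative format)."""
    return '"%x-%x-%x"' % (inode, size, mtime)


def revalidate(client_etag, server_stat):
    """Status a server would send for a conditional request (simplified)."""
    server_etag = etag_with_inode(**server_stat)
    return 304 if client_etag == server_etag else 200


# The same file deployed to two hypothetical servers: only the inode differs.
server_a = {"inode": 1311, "size": 2048, "mtime": 1300000000}
server_b = {"inode": 9022, "size": 2048, "mtime": 1300000000}

# The client first fetched the asset from server A and stored its etag.
stored = etag_with_inode(**server_a)
```

With these values, revalidate(stored, server_a) gives 304, but revalidate(stored, server_b) gives 200, so the load balancer routing the request to server B costs a full download of an unchanged file.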

There is a suggested solution for this: remove the inode part from etags entirely, so that they are based only on a file’s size and last modified time.

In Apache you can do this with the following directive (Apache 2.4 and later already use it as the default):

FileETag MTime Size

Please consult your web server’s documentation if you are not using Apache, as the equivalent setting will vary from server to server. This line can be placed as is in your vhost to affect all etags, or inside a Directory or FilesMatch section to target specific assets.

Whether or not this will improve things depends on your infrastructure, however, because a file’s last modified time might differ between load balanced servers depending on how you deploy your projects. If they are deployed in such a way that the last modified time is the same across each server then you can go ahead and use this fix. If not, turn etags off, as they will not help at all.
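One way to get matching modified times, sketched below, is to stamp every deployed file with a single timestamp (for example the release time) as the last step of a deploy on each server. The function name normalise_mtimes and the idea of passing in a release timestamp are my own assumptions, not part of any particular deployment tool. Copying files with a tool that preserves timestamps would achieve the same result.

```python
import os


def normalise_mtimes(root, timestamp):
    """Set every file under root to the same access and modified time.

    Returns the number of files touched. `timestamp` is seconds since
    the epoch and should be identical on every server in the pool.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.utime(path, (timestamp, timestamp))
            count += 1
    return count
```

Running this with the same timestamp on each server means that etags based on size and last modified time will agree across the pool, provided the file contents are identical too.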

So should you use etags or not?

Well, as you can probably tell, there is no single definitive answer to this. It really does depend on your site’s server setup and possibly on your deployment process. To use etags I think that you must satisfy one of these conditions:

  • Be on a single server
  • Or be on multiple servers, with the inode removed from your etags, and deploy your project in such a way that every file’s last modified time is the same

If neither condition can be satisfied then do not use etags. You can of course try to achieve them, but in all likelihood the end result of having etags enabled will not be worth the effort. Cache control headers can always be used and always come into play before etags are even considered, so relying on cache control headers for your assets is definitely a sound plan. Here’s some further reading on setting up cache control headers for your assets.

References and further reading

Wikipedia on ETags

Setting up cache control headers for your assets