This has been known about for years, and was raised as a concern on various mailing lists long ago. The answer at the time was that browser vendors would build in cache controls in the same way they had for cookie controls.<p>The first sites to exploit this were, as always, porn sites. They used ETags in referral tracking to prevent webmaster fraud (the webmaster would have to include a script from the affiliate company, which would set an ETag).<p>You know what is more interesting? The Last-Modified header. The HTTP spec says that you are supposed to put a date in there, but it also advises clients to replay the exact string they received instead of parsing it (date parsing being such a pain in the ass). So clients just store the date string as-is and replay it in subsequent requests.<p>You can put whatever the hell you want in a Last-Modified field and <i>all</i> browsers will store it and replay it in later requests to the same resource. For example:<p>Initial request:<p><pre><code> GET /_modified_test HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cache-Control: max-age=0
Connection: keep-alive
Host: localhost:8888
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.830.0 Safari/535.1
</code></pre>
Initial response from my dev server (note the arbitrary value in the Last-Modified header):<p><pre><code> HTTP/1.0 200 OK
Server: Dev/1.0
Date: Sat, 30 Jul 2011 11:48:25 GMT
content-type: text/html; charset=utf8
Last-Modified: random_token_i_set
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 1634
</code></pre>
Subsequent browser request to the same resource (note the token replayed in If-Modified-Since):<p><pre><code> GET /_modified_test HTTP/1.1
Host: localhost:8888
Connection: keep-alive
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_6) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.830.0 Safari/535.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
If-Modified-Since: random_token_i_set
</code></pre>
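The server side of the exchange above could look roughly like the sketch below. It is not the code behind my dev server; the names choose_token, TrackingHandler and serve are made up for illustration, and it assumes only Python's standard http.server module. It mints a random token on the first visit, sends it as Last-Modified, and treats whatever comes back in If-Modified-Since as the visitor ID. It always answers 200 with Cache-Control: no-cache, as in the dump above, so the browser keeps revalidating. Call serve() to run it on localhost:8888.

```python
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer


def choose_token(if_modified_since):
    """Return (token, is_returning).

    Reuse the value the browser replayed in If-Modified-Since if there is
    one, otherwise mint a fresh random token (hypothetical helper).
    """
    if if_modified_since:
        return if_modified_since, True
    return secrets.token_hex(8), False


class TrackingHandler(BaseHTTPRequestHandler):
    """Illustrative handler: abuses Last-Modified as an identifier."""

    def do_GET(self):
        token, returning = choose_token(self.headers.get("If-Modified-Since"))
        label = "returning" if returning else "new"
        body = (label + " visitor: " + token).encode("utf-8")

        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # Not a date at all: the browser stores and replays it anyway.
        self.send_header("Last-Modified", token)
        # Force revalidation so the token is sent back on every visit.
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8888):
    """Run the sketch on localhost (port chosen to match the dump above)."""
    HTTPServer(("localhost", port), TrackingHandler).serve_forever()
```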
With new web apps now being single-page, using either hashchange or pushState, almost all backend requests go to the same resource, so you can track the user across every page on the site and, where the resource is embedded as a third party, across other sites.<p>Concerning, but a known problem. Even with these headers patched there is still a lot of information that can be used to fingerprint clients (i.e. having everything switched off is itself a fingerprint that makes you unique). I don't think Chrome, Safari, IE or Firefox will ever implement these advanced features; it will be up to somebody else to release a more privacy-aware browser or to maintain a plugin that is.<p>I wrote a plugin that does this, but a lot of information still leaks through (it is on my GitHub but I haven't released or announced it in any way). I am contemplating just forking WebKit and building a whole separate 'privacy aware' browser, but haven't found the time. In short, the browser makers know about this, and have known about it for years. There is just no real interest in providing tools that let users fully anonymize themselves.<p>Edit: if anybody is interested in the plugin, it is here: <a href="https://github.com/nikcub/Parley" rel="nofollow">https://github.com/nikcub/Parley</a><p>It blocks all third-party requests and provides other features. It works; it just needs a bit of a clean-up and a release.