My site distributes daily newsletters via PDF that are typically about 2MB in size. A link is provided to the PDF, and when clicked, it opens the Adobe Reader plugin within the browser (Firefox, IE) and opens the file therein.
Recently it was brought to my attention by a dialup user (they do still exist) that when he navigates back or forward in his browser's history, the PDF has to be redownloaded rather than reloaded from his browser's cache. I, with a broadband connection and some wonderful Firefox extensions (Firefox Throttle and Live HTTP Headers) confirmed this behavior.
However, a similarly sized file (2MB) of a different type (text) is cached by the browser, and I can go back and forth in the history without the file having to be redownloaded. I also had success with a smaller PDF (70KB).
What appears to happen is this:
1) Click the link to the 2MB PDF, and it loads in the browser.
2) Click the browser's back button to go back to the previous page in the browser history.
3) Click forward button to return to PDF.
4) A request with a "Range: <byte range>" header is sent which I think requests part of the PDF file (the byte ranges).
5) The server responds with a "206 Partial Content" response code and resends some/most/all of the PDF to the browser.
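To make steps 4 and 5 concrete, here is a minimal sketch of how a server might resolve a single "Range" header into the byte offsets it sends back with a 206 response. The function names `resolve_byte_range` and `content_range_value` are made up for illustration, and multi-range requests are deliberately not handled; a real server such as Apache or IIS does considerably more.

```python
from typing import Optional, Tuple


def resolve_byte_range(range_header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Resolve a single 'bytes=...' Range value to inclusive (start, end) offsets.

    Returns None when the value is not a satisfiable single byte range;
    a real server would then answer with a full 200 or a 416 instead of 206.
    """
    unit, _, spec = range_header.partition("=")
    if unit.strip().lower() != "bytes" or "," in spec:
        return None  # other units and multi-range requests are out of scope here
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the final N bytes of the file.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(file_size - length, 0), file_size - 1
        else:
            # "bytes=A-B" or the open-ended form "bytes=A-".
            start = int(first)
            end = int(last) if last else file_size - 1
    except ValueError:
        return None
    end = min(end, file_size - 1)  # clamp to the last byte of the file
    if start < 0 or start > end:
        return None
    return start, end


def content_range_value(start: int, end: int, file_size: int) -> str:
    """Build the Content-Range value that accompanies a 206 Partial Content response."""
    return f"bytes {start}-{end}/{file_size}"
```

For a 2,000,000-byte PDF, `resolve_byte_range("bytes=0-1023", 2_000_000)` gives `(0, 1023)`, and the matching response header would be `Content-Range: bytes 0-1023/2000000`.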
What should happen (and does with other file types and smaller sized PDFs) is that the browser should make a simple request for the PDF file, and assuming the PDF hasn't been modified since the latest request for it, the server should send a "304 Not Modified" response code which instructs the browser to use what it has in cache.
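The expected exchange above is an ordinary conditional GET. As a rough sketch, the server-side decision boils down to comparing the browser's "If-Modified-Since" date with the file's "Last-Modified" date. The function name `status_for_conditional_get` is hypothetical, and real servers also consider ETags and other validators.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for_conditional_get(if_modified_since: Optional[str], last_modified: str) -> int:
    """Return 304 if the cached copy is still current, otherwise 200.

    Both arguments are HTTP date strings, e.g. 'Mon, 05 Nov 2007 08:00:00 GMT'.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the whole file
    try:
        cached = parsedate_to_datetime(if_modified_since)
        current = parsedate_to_datetime(last_modified)
        # Not modified since the browser's copy was fetched: it may use its cache.
        return 304 if current <= cached else 200
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a full response
```

With identical dates the function returns 304; if the file's "Last-Modified" date is later than the date the browser sent, it returns 200 and the full file is resent.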
I've increased my browser cache sizes dramatically, to no avail. Clearing the caches before starting the process made no difference either, so the file should not be running into any browser's cache limit.
I'm using Firefox 2 and IE7. My co-worker has IE6 and Acrobat Reader 6 installed on another machine and it works fine there (PDF is cached and does not need to be redownloaded).
Help! Why aren't these larger PDF files being cached correctly, and what can we do to get them saved in the cache so our poor dialup users don't have to redownload them every time they navigate their browser history?
Just wanted to reply with my discovery and fix. I'm not sure whether it is Adobe Reader or the web browsers (I tested Firefox 2 and IE7), but they appear to request parts of large PDF files "on demand." That is, they send requests to the web server with a "Range: <byte range(s)>" header, which instructs the web server to send only segments of the file. I think they use the HTTP 1.1 feature that keeps the connection open (keep-alive) to request and send the data incrementally.
While this might seem preferable, dialup users who navigate through their browser history are better served by downloading the PDF in its entirety so the browser can cache the file. That way, when they click the link to open the PDF again (or go back/forward in their browser's history), they use what they have in their cache instead of redownloading the PDF, which is quite painful over dialup.
We fixed the problem by instructing our web server to disregard byte range requests. In a nutshell, you need to have the web server issue this header:
Accept-Ranges: none
In Apache, you can do this on a per-file-type basis using a FilesMatch directive. In IIS (we're using 5.0), you cannot specify a file type for this behavior; you have to set it in Internet Services Manager by adding a custom header under the "HTTP Headers" tab.
Personally, I prefer that folks use their browser cache rather than repeatedly request data (the PDF file in this case) from our server. To each his own, I guess. Hope this helps someone.
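For anyone who wants to see the effect of the header outside of Apache or IIS, here is a small sketch using Python's standard `http.server`. It is not the configuration we run in production; the handler name `FullFileHandler`, the `PDF_BYTES` placeholder, and the `/newsletter.pdf` path are all made up for illustration. The handler ignores any "Range" request header, always returns the complete file with a 200 status, and advertises "Accept-Ranges: none".

```python
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the real newsletter PDF.
PDF_BYTES = b"%PDF-1.4\n" + b"0" * 4096


class FullFileHandler(BaseHTTPRequestHandler):
    """Always serves the whole file and tells clients that ranges are unsupported."""

    def do_GET(self):
        # Any "Range" request header is deliberately ignored here.
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(PDF_BYTES)))
        self.send_header("Accept-Ranges", "none")
        self.end_headers()
        self.wfile.write(PDF_BYTES)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_with_range(port: int, byte_range: str = "bytes=0-99"):
    """Send a ranged GET to the local server; return (status, Accept-Ranges, body length)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/newsletter.pdf", headers={"Range": byte_range})
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, resp.getheader("Accept-Ranges"), len(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    HTTPServer(("127.0.0.1", 8000), FullFileHandler).serve_forever()
```

Even when a client asks for only the first 100 bytes, this server replies with the full body and a 200 status rather than 206, which is the behavior that let the browsers cache our PDFs.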