Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP
Sunday, 30 September 2007
By default, pages generated with PHP are not cached by browsers or proxies, as they are generated anew every time the page is requested from the server. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.
As Bruce Eckel points out in RSS: The Wrong Solution to a Broken Internet, this is a particular problem for RSS feeds — feed readers are often overly enthusiastic in their checking rate, and given the tendency of bloggers to provide full feeds this can lead to a lot of wasted bandwidth. By using the code from this article in your feed-generating code you can save yourself a whole lot of bandwidth.
Caching and HTTP headers
Whenever a page is requested by a browser, the server's response includes a Last-Modified header, which indicates the last modification time. For static pages, this is the last modification time of the file, but for dynamic pages it typically defaults to the time the page was requested. Whenever a page is requested that has been seen before, browsers or proxies generally take the Last-Modified time from the cached version and populate an If-Modified-Since request header with it. If the page has not changed since then, the server should respond with a 304 response code to indicate that the cached version is still valid, rather than sending the page content again.
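To make the header values concrete, here is a minimal sketch of how a timestamp maps to the date format used in these headers, and back again. The helper name httpDate and the sample timestamp are assumptions for illustration only; the point is that HTTP dates are written in GMT, and that strtotime can parse them back into a Unix timestamp.

```php
<?php
// Hypothetical helper (not from the article): format a Unix timestamp
// as an HTTP date, which is always expressed in GMT.
function httpDate($timestamp)
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

$stamp = 783459811;
$header_value = httpDate($stamp);

// The value a server would send in Last-Modified, and a client would
// echo back in If-Modified-Since.
echo $header_value, "\n"; // Sat, 29 Oct 1994 19:43:31 GMT

// Parsing the header value recovers the original timestamp.
var_dump(strtotime($header_value) === $stamp); // bool(true)
```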
Handling this correctly for PHP pages requires two things:
- Identifying the last modification time for the page, and
- Checking the request headers for the If-Modified-Since header.
Timestamps
There are two components to the last modification time: the date of the data used to generate the page, and the date of the script itself. Both are equally important, as we want the page to be updated when the data changes, and if the script has been changed the generated page may be different (for example, the layout could be different). My PHP code incorporates both by defaulting to the modification time of the script, and allowing the user to pass in the data modification time, which is used if it is more recent than the script. The last modification time is then used to generate a Last-Modified header, and returned to the caller. Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.
function setLastModified($last_modified=NULL)
{
    // Default to the modification time of the main script.
    $page_modified=getlastmod();
    if(empty($last_modified) || ($last_modified < $page_modified))
    {
        $last_modified=$page_modified;
    }
    // Also account for changes to this (included) file.
    $header_modified=filemtime(__FILE__);
    if($header_modified > $last_modified)
    {
        $last_modified=$header_modified;
    }
    // HTTP dates must be expressed in GMT.
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s',$last_modified) . ' GMT');
    return $last_modified;
}
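The data modification time passed to setLastModified has to come from somewhere. As one possible sketch, assuming the page is generated from one or more files on disk, a helper like the following could find the most recent modification time among them. The name newestMtime is hypothetical and not part of the original code; data held in a database would need a different approach, such as a stored timestamp column.

```php
<?php
// Hypothetical helper (not part of the original article): returns the most
// recent modification time among the given files, or null if none exist.
function newestMtime(array $paths)
{
    $newest = null;
    foreach ($paths as $path) {
        // Avoid stale results from PHP's stat cache.
        clearstatcache(true, $path);
        $mtime = is_file($path) ? filemtime($path) : false;
        if ($mtime !== false && ($newest === null || $mtime > $newest)) {
            $newest = $mtime;
        }
    }
    return $newest;
}
```

Because setLastModified treats an empty value as "no data time supplied", a null result from this helper simply falls back to the script modification time.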
Handling If-Modified-Since
If the If-Modified-Since request header is present, then it can be parsed to get a timestamp that can be compared against the modification time. If the modification time is no more recent than the time given in the header, a 304 response can be returned instead of generating the page.
In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal with a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is at least as recent as the last-modified time, we can just return the 304 response code and quit, as no further output is required. Here is the function that handles this:
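function exitIfNotModifiedSince($last_modified)
{
    if(array_key_exists("HTTP_IF_MODIFIED_SINCE",$_SERVER))
    {
        // Strip any legacy "; length=..." suffix before parsing the date.
        $if_modified_since=strtotime(preg_replace('/;.*$/','',$_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        // strtotime returns false for an unparseable date; ignore the header then.
        if(($if_modified_since !== false) && ($if_modified_since >= $last_modified))
        {
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}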
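Because the function above calls exit(), it is awkward to try out in isolation. The following is a sketch of the same comparison as a pure function that takes the server array as a parameter and returns a boolean instead of sending a response. The name isNotModifiedSince and the sample header values are assumptions for illustration; it is a variant of the logic described above rather than a replacement for it.

```php
<?php
// Hypothetical variant (not from the article): returns true when a 304
// response would be appropriate, without sending headers or exiting.
function isNotModifiedSince(array $server, $last_modified)
{
    if (!array_key_exists('HTTP_IF_MODIFIED_SINCE', $server)) {
        return false;
    }
    // Strip any legacy "; length=..." suffix added by old browsers.
    $raw = preg_replace('/;.*$/', '', $server['HTTP_IF_MODIFIED_SINCE']);
    $since = strtotime($raw);
    if ($since === false) {
        return false; // unparseable date: treat the page as modified
    }
    return $since >= $last_modified;
}

$last_modified = 783459811; // Sat, 29 Oct 1994 19:43:31 GMT

// No header: the full page should be sent.
var_dump(isNotModifiedSince(array(), $last_modified)); // bool(false)

// Cached copy is as recent as the page, with a legacy suffix on the header.
var_dump(isNotModifiedSince(
    array('HTTP_IF_MODIFIED_SINCE' => 'Sat, 29 Oct 1994 19:43:31 GMT; length=1234'),
    $last_modified
)); // bool(true)

// Cached copy is a day older than the page.
var_dump(isNotModifiedSince(
    array('HTTP_IF_MODIFIED_SINCE' => 'Fri, 28 Oct 1994 19:43:31 GMT'),
    $last_modified
)); // bool(false)
```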
Putting it all together
Using the two functions together is really simple:
exitIfNotModifiedSince(setLastModified()); // for pages with no data-dependency
exitIfNotModifiedSince(setLastModified($data_modification_time)); // for data-dependent pages
Of course, you can use the functions separately if that better suits your needs.
Posted by Anthony Williams
Design and Content Copyright © 2005-2024 Just Software Solutions Ltd. All rights reserved. | Privacy Policy
16 Comments
Great bit of code, thanks
Would you put this code just after the session handler and before the page's document declaration?
The tip is great. I was checking something for a project of mine, and finally got it. I wanted a method to dynamically serve some js files in minified format, using the jsmin-php project from google code. I wanted to enhance the system using if-modified-since headers. Will post the direct url from my site once my project is over.
Hi completed my project.. and is available at http://www.php-trivandrum.org/code-snippets/reduce-bandwidth-usage-in-php.html, though the caption is the same, I went in a slightly different point. My requirements were to compress the javascript files.. Will need to make this a wordpress plugin, with the future expiry as a plugin option.
great thanks.. i have tried with meta tag "revise affter" for preventing google bot crawl too much my classified site. but no success
then find out your solution... this help me alot .. combine with some cached file solution .. this function is the best to reduce load of server..
thank again from tintuc
sorry i got other question related to caching file
normaly i put your function before other function which include the cache file to the page.. but if the cached file may be change over time..
which is the best for my.. put this function just before or affter include cached files
regards
great, thankssssssss
I read this to better understand and apache mod Modified.
great tips, I sent him a link to this article as well as used your email link to send Expedia a scolding on the topic
thanks . just apply to my game site.. it work great
very useful info bro ... I am really thankful to you ... going to read it again and implement it
thanks again
Thanks for a marvelous posting! I quite enjoyed reading it, you may be a great author.I will ensure that I bookmark your blog and may come back down thee road. I want to encourage yourself to continue your great job, have a nice day!
Your article is very good, I will regularly visit this site to read.
Thank you for sharing, I would often to this site to read the information.
what does this mean when you write ($last_modified=NULL) in bracket?
I presume you're referring to
function setLastModified($last_modified=NULL)
This says that the function setLastModified takes a single parameter called $last_modified, which is set to NULL if the caller doesn't provide a value. You can thus say setLastModified() to set the date to the page modification date, or setLastModified($some_date) to set the date to the contents of $some_date.