Blog Archive
Testing Your Website in Multiple Browsers
Monday, 03 December 2007
When designing websites, it is very important to check the results in multiple web browsers — something that looks fine in Internet Explorer may look disastrous in Firefox, and vice-versa. This problem is due to the different way in which each web browser interprets the HTML, XHTML and CSS standards, combined with any bugs that may be present. If you're designing a website, you have no control over which browser people will use to view it, so you need to ensure that your website displays acceptably in as many different browsers as possible.
The only way to know for sure how a website looks in a particular browser is to try it out. If you don't check it, how do you know you won't hit a bug or other display quirk? However, given the plethora of web browsers and operating systems out there, testing in all of them is just not practical, so you need to choose a subset. The question is: which subset?
Popular browsers
Thankfully, most people use one of a few "popular" browsers, but that's still quite a few. In my experience, on Windows the most popular browsers are Firefox, Internet Explorer and Opera; on Linux most people use Firefox, Mozilla or Netscape; and on MacOS most people use Safari or Camino. Obviously, the relative proportions of users of each browser will vary depending on your website and target niche — a website focused on non-technical users is far more likely to find users with Internet Explorer on Windows than anything else, whereas a website focused on Linux kernel development will probably find the most popular browser is Firefox on Linux.
Which version?
It's all very well having identified a few popular browsers to use for testing, but an equally crucial aspect is which version of the browser to test. Users of Firefox, Opera, Mozilla, and recent versions of Netscape might be expected to upgrade frequently, whereas users of Internet Explorer might be far less likely to upgrade, especially if they are non-technical (in which case they'll stick with the version that came with their PC). Checking the logs of some of the websites I maintain shows that the vast majority of Firefox users (90+%) are using some variant of Firefox 2.0 (though there is a smattering all the way back to Firefox 0.5), whereas Internet Explorer users are divided between IE7 and IE6, with the ratio varying with the site.
Don't forget a text-only browser
A text-only browser such as Lynx is ideal for seeing how your site will look to a search engine spider. Not only that, but certain screen reader applications will present the same view to their users. Consequently, it's always worth checking with a text-only browser to ensure that your site is still usable without all the pretty visuals.
Multiple Browsers on the same machine
Having chosen your browsers and versions, the simplest way to test your sites is to install all the browsers on the same machine. That way, you can just open the windows side by side, and compare the results. Of course, you can't do this if the browsers run on different platforms, but one option there is to use virtual machines to test on multiple platforms with a single physical machine. Testing multiple versions of Internet Explorer can also be difficult, but TredoSoft have a nice little package called Multiple IEs which enables you to install multiple versions of Internet Explorer on the same PC. Thanks to Multiple IEs, on my Windows XP machine I've got IE3, IE4.01, IE5.01, IE5.5, IE6 and IE7, as well as Firefox, Opera, Safari and Lynx!
Snapshot services
If you don't fancy installing lots of browsers yourself, or you don't have access to the desired target platform, you can always use one of the online snapshot services such as browsershots (free) or browsercam (paid). These provide you with the ability to take a snapshot of your website, as seen in a long list of browsers on a long list of platforms. Browsercam also provides remote access to the testing machines, so you can interact with your sites and check dynamic aspects, such as Javascript — something that's becoming increasingly important as AJAX becomes more prevalent.
Posted by Anthony Williams
[/ testing /] permanent link
Tags: testing, browsers, website, webdesign
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Testing on Multiple Platforms with VMWare
Tuesday, 27 November 2007
Whilst testing on multiple platforms is important, it can be difficult to obtain access to machines running all the platforms that you wish to test on. This is where virtualization software such as VMWare comes in handy: you don't need to have a separate machine for each tested platform — you don't even need a separate partition. Instead, you set up a Virtual Machine running the target platform, which runs on top of your existing OS. This Virtual Machine is completely self-contained, running off a virtual hard disk contained in a file on your real disk, and with a virtual screen which can be shown in a window on your host desktop.
Virtual Networks
This can be incredibly useful: not only can you test on multiple platforms without repartitioning your hard disk, but you can have multiple virtual machines running simultaneously. If you're developing an application that needs to run on multiple platforms, this can be invaluable, as you can see what the application looks like on different operating systems simultaneously. It also allows you to test network communication — each virtual machine is entirely independent of the others, so you can run a server application on one and a client application on another without having to build a physical network.
Get Started with a Pre-built Virtual Machine
VMWare have a repository of pre-built virtual machines, which they call "appliances". This makes it very easy to get started, without all the hassle of installing the OS. Some appliances even come with pre-installed applications — if you want to try a Ruby on Rails app on Linux, then the UbuntuWebServer appliance might be a good place to start.
Warning: Virtual Machines use Real Resources
It's worth noting that the resource use (CPU, memory, disk space) is real, even if the machines are virtual — if you run a CPU-intensive application on your virtual machine, your system will slow down; if you give 3 virtual machines 1GB of memory each but you only have 2GB installed, you're going to see a lot of swapping. Virtual machines are not full-time replacements for physical ones unless you have a server with a lot of resources. That said, if you do have a server with a lot of resources, running separate systems and applications in separate virtual machines can make a lot of sense: the individual systems are completely isolated from one another, so if one application crashes or destroys its (virtual) disk, the others are unaffected. Some web hosting companies use this facility to provide each customer with root access to their own virtual machine, for example.
It's also worth noting that if you install a non-free operating system such as Microsoft Windows, you still need a valid license.
Alternatives
VMWare Server is currently a free download for Windows and Linux, but it's not the only product out there. VirtualBox is also free, and runs on Windows, Linux and MacOSX. One nice feature that VirtualBox has is "seamless Windows": when running Microsoft Windows as the guest operating system, you can suppress the desktop background, so that the application windows from the virtual machine appear on the host desktop.
Another alternative is QEMU which offers full-blown emulation as well as virtualization. This allows you to experiment with operating systems running on a different CPU, though the emulated hardware can be quite limited.
Posted by Anthony Williams
[/ testing /] permanent link
Tags: testing, vmware, virtualization, virtual machine
Database Tip: Use an Artificial Primary Key
Monday, 19 November 2007
If your data has a clear "master" field or combination of fields which can uniquely identify each row (such as customer name in a table of customers, or ISBN for a table of books), it is tempting to use that as the primary key of the table. However, my advice is: don't do that; use a separate, numeric, artificial primary key instead. Yes, it's an extra column on the table, requiring extra space, and you will have to generate it somehow, but that's not a big deal. Every database vendor provides some way of auto-generating unique key values (e.g. SEQUENCEs in Oracle, and AUTOINCREMENT fields in SQLite), so populating it is easy, and the complications it saves are more than worth the trade-off. You can still maintain the uniqueness of the master columns by applying a unique index to those columns.
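To make this concrete, here is a minimal sketch of the layout (shown with SQLite through Python's sqlite3 module; the books table and its column names are illustrative, not taken from any real schema): an auto-generated integer column is the primary key, and a unique index enforces the uniqueness of the natural ISBN column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        book_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key, auto-generated
        isbn    TEXT NOT NULL,
        title   TEXT NOT NULL
    );
    -- the natural "master" column stays unique via an index, not the primary key
    CREATE UNIQUE INDEX books_isbn ON books (isbn);
""")

conn.execute("INSERT INTO books (isbn, title) VALUES (?, ?)",
             ("0-321-22811-1", "Patterns for Parallel Programming"))
row = conn.execute("SELECT book_id, isbn FROM books").fetchone()
print(row)  # book_id was generated automatically by the engine
```

Attempting to insert a second row with the same ISBN now fails with a constraint violation, so the natural key is still protected even though it isn't the primary key.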
Save Space on Foreign Keys
Firstly, if you have any tables that are associated with the master table, and therefore have foreign key columns that refer to the primary key of your master, then having a separate primary key can actually save space overall, as the data for the master columns doesn't have to be duplicated across all the linked tables. This is especially important if there is more than one "master column", such as customer_first_name and customer_last_name, or if the data for these columns is large.
Changing the master data
Secondly, if the "master columns" are actually the primary key of your table, changing the data in them is potentially problematic, especially if they are used as a foreign key in other tables. Many online services use a customer's email address as their "master column": each customer has one email address, and one email address refers to one customer. That's fine until a customer changes their email address. Obviously, you don't want to lose all data associated with a given customer just because they changed their email address, so you need to update the row rather than delete the old one and insert a new one. If the email_address column is the primary key of the table, and therefore used as the foreign key in other tables, then you've got to update the data not just in the master table, but in each dependent table too.
This is not impossible, but it's certainly more complex and time-consuming. If you miss a table, the transaction may not complete due to foreign key constraint violations, or (worse) the transaction may complete, but some of the data may be orphaned. Also, in some database engines, the constraint violation will fire when you change either the master table or the dependent table, so you need to execute a special SQL statement to defer the constraint checking until COMMIT time. If you use an auto-generated primary key, then only the data in the master table needs changing.
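A short sketch of the difference (again SQLite through Python's sqlite3 module, with illustrative table and column names): because the customer's email address is ordinary data rather than the key, changing it is a single UPDATE on a single table, and the dependent orders rows never need touching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key
        email_address TEXT NOT NULL UNIQUE                -- master data, not the key
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        item        TEXT NOT NULL
    );
""")
cur = conn.execute("INSERT INTO customers (email_address) VALUES ('old@example.com')")
conn.execute("INSERT INTO orders (customer_id, item) VALUES (?, ?)",
             (cur.lastrowid, "widget"))

# The customer changes their email address: one row in one table is updated,
# and every foreign key in orders remains valid.
conn.execute("UPDATE customers SET email_address = 'new@example.com' "
             "WHERE customer_id = ?", (cur.lastrowid,))
```

Had email_address been the primary key (and thus the foreign key in orders), the same change would have required updating both tables inside one transaction.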
Changing the master columns
Finally, if the primary key is auto-generated, then not only is it easy to change the data in the master columns, but you can actually change what the master columns are. For example, if you initially decide that customer_first_name and customer_last_name make an ideal primary key, then you're stuck if you then get another customer with the same name. OK, so you make customer_first_name, customer_last_name and customer_address the primary key. Oops — now you've got to duplicate the address information across all the dependent tables. Now you encounter two people with the same name at the same address (e.g. father and son), so you need to add a new designator to the key (e.g. Henry Jones Junior, Richard Wilkins III). Again, you need to update all the dependent tables. If the primary key is auto-generated, there's no problem — just update the unique constraint on the master table to include the appropriate columns, and all is well, with the minimum of fuss.
Simplify your code
It's not going to simplify your code much, but using an auto-generated numeric key means that this is all you need to store as an identifier inside your program to refer to a particular row: much easier than storing the data from a combination of columns. Also, it's much easier to write code to update the data on one table than on multiple tables.
Conclusion
Don't use real table data as the primary key for a table: instead, use a separate, numeric, auto-generated column as the primary key. This will simplify the connections between tables, and make your life easier if the structure of the database or the data in the key columns changes.
Related Posts
In previous posts on Database Design, I've talked about:
- Using parameterized queries,
- Creating appropriate indexes,
- Using transactions, and
- Delaying the use of a database
Posted by Anthony Williams
[/ database /] permanent link
Tags: database, primary key, foreign key
Elegance in Software
Monday, 12 November 2007
What does it mean for software to be elegant? When I write code, elegance is something I aspire to, and in some senses it goes hand-in-hand with beautiful code, but that doesn't really make it any clearer. Certainly, I think there is a strong element of "elegance is in the eye of the beholder", but I think there are also some characteristics of the software that are contributory factors — how a particular person rates the code on any aspect may vary, as may the importance they place on any given aspect, but these aspects will almost certainly affect how elegant the code appears.
Factors affecting the elegance of software
Here's a short list of some of the factors that I think are important. Obviously, this is not an exhaustive list, and all comments are my opinion, and not definitive.
- Does it work?
- I'd be hard-pushed to call software "elegant" if it didn't work
- Is it easy to understand?
- Lots of the following factors can really be summarised by this one: if I can't understand the code, it's not elegant.
- Is it efficient?
- A bubble sort is just not elegant, because there are many more efficient algorithms. If a cunning algorithmic trick can drastically reduce the runtime, using that trick contributes to making the code elegant, especially if it is still easy to understand.
- Short functions
- Long functions make the code hard to follow. If I can't see the whole function on one screen in my editor, it's too long. Ideally, a function should be really short, less than 5 lines.
- Good naming
- Short functions are all very well, but if functions are called foo, abc, or wrt_lng_dt, it can still be hard to understand the code. Of course, this applies to classes just as much as functions.
- Clear division of responsibility
- It is important that it is clear which function or class is responsible for any given aspect of the design. Not only that, but a class or function should not have too many responsibilities — by the Single Responsibility Principle a class or function should have just one responsibility.
- High cohesion
- Cohesion is a measure of how closely related the data items and functions in a class or module are to each other. This is tightly tied in to division of responsibility — if a function is responsible for calculating primes and managing network connections, then it has low cohesion, and a poor division of responsibility.
- Low coupling
- Classes and modules should not have unnecessary dependencies between them. If a change to the internals of one class or function requires a change to apparently unrelated code elsewhere, there is too much coupling. This is also related to the division of responsibility, and excessive coupling can be a sign that too many classes, modules or functions share a single responsibility.
- Appropriate use of OO and other techniques
- It is not always appropriate to encapsulate something in a class — sometimes a simple function will suffice, and sometimes other techniques are more appropriate. This is also related to the division of responsibilities, but it goes beyond that — is this code structure the most appropriate for handling this particular responsibility? Language idioms come into play here: is it more appropriate to use STL-style std::sort on an iterator interface, or does it make more sense to provide a sort member function? Can the algorithm be expressed in a functional way, or is an imperative style more appropriate?
- Minimal code
- Code should be short and to-the-point. Overly-long code can be the consequence of doing things at too low a level, and doing byte-shuffling rather than using a high-level sort algorithm. It can also be the consequence of too many levels of indirection — if a function does nothing except call one other function, it's getting in the way. Sometimes this can be at odds with good naming — a well-named function with a clear responsibility just happens to be able to delegate to a generic function, for example — but there's obviously a trade-off. Minimal code is also related to duplication — if two blocks of code do the same thing, one of them should be eliminated.
One thing that is not present in the above list is comments in the code. In my view, the presence of comments in the code implies that the code is not sufficiently clear. Yes, well-written comments can make it easier to understand a given block of code, but they should in general be unnecessary: truly elegant code can be understood without comments. Of course, you might need to understand what it is that the code is trying to accomplish before it makes complete sense, particularly if the code is using advanced algorithms, and comments can help with that (e.g. by providing a reference to the algorithm), but my general view is that comments are a sign of less-than-perfect code.
Let me know what you think constitutes elegant code.
Posted by Anthony Williams
[/ design /] permanent link
Tags: design, elegance, software
Review of Patterns for Parallel Programming by Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill
Thursday, 01 November 2007
This book gives a broad overview of techniques for writing parallel programs. It is not an API reference, though it does have examples that use OpenMP, MPI and Java, and contains a brief overview of each in appendices. Instead, it covers the issues you have to think about whilst writing parallel programs, starting with identifying the exploitable concurrency in the application, and moving through techniques for structuring algorithms and data, and various synchronization techniques.
The authors do a thorough job of explaining the jargon surrounding parallel programming, such as what a NUMA machine is, what SPMD means, and what makes a program embarrassingly parallel. They also go into some of the more quantitative aspects, like calculating the efficiency of the parallel design, and the serial overhead.
Most of the content is structured in the form of Patterns (hence the title), which I found to be an unusual way of presenting the information. However, the writing is clear, and easily understood. The examples are well thought out, and clearly demonstrate the points being made.
The three APIs used for the examples cover the major types of parallel programming environments — explicit threading (Java), message passing (MPI), and implicit threading from high-level constructs (OpenMP). Other threading environments generally fall into one of these categories, so it is usually straightforward to see how descriptions can be extended to other environments for parallel programming.
The authors are clearly coming from a high-performance computing background, with massively parallel computers, but HyperThreading and dual-core CPUs are becoming common on desktops, and many of the same issues apply when writing code to exploit the capabilities of these machines.
Highly Recommended. Everyone writing parallel or multi-threaded programs should read this book.
Buy this book
Patterns for Parallel Programming
Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill
Published by Addison-Wesley
ISBN 0-321-22811-1
Buy from Amazon.co.uk
Buy from Amazon.com
Posted by Anthony Williams
[/ reviews /] permanent link
Tags: reviews, threads, patterns, books
10 Years of Programming with POSIX Threads
Monday, 29 October 2007
David Butenhof's Programming with POSIX Threads was published 10 years ago, in 1997. At the time, it was the definitive work on the POSIX thread API, and multi-threaded programming in general. Ten years is a long time in computing, so how does it fare today?
New POSIX Standard
When the book was written, the latest version of the POSIX Standard was the 1996 edition (ISO/IEC 9945-1:1996). Since then, the standard has evolved. It is now maintained by a joint working group from The Open Group, the IEEE and ISO called The Austin Group. The new Standard is called the Single Unix Specification, Version 3 and the 2004 edition is available online.
The new standard has brought a few changes with it — many things that were part of extensions such as POSIX 1003.1j are now part of the main ISO Standard. This includes barriers and read-write locks, though barriers are still optional and the read-write locks have a slightly different interface. Programming with POSIX Threads is therefore lacking a good description of the now-standard APIs — although Butenhof devotes a section in Chapter 7 to implementing read-write locks, this is now only of historical interest, as the semantics are different from those in the new standard.
Most things stay the same
Though there are inevitably some changes with the new standard, most of the APIs remain the same. Not only that, the fundamental concepts described in the book haven't changed — threads still work the same way, mutexes and condition variables still work the same way, and so forth. Not only that, but the rising numbers of multicore CPU desktop computers means that correct thread synchronization is more important than ever. Faulty assumptions about memory visibility that happened to be true for single core machines are often demonstrably false for multicore and multiprocessor machines, so the dangers of deadlock, livelock and race conditions are ever more present.
Still the definitive reference
Though it's probably worth downloading the new POSIX standard, or checking the man pages for the new functions, Programming with POSIX Threads is still a good reference to the POSIX thread APIs, and multi-threaded programming in general. It sits well alongside Patterns for Parallel Programming — whereas Patterns for Parallel Programming is mainly about designing programs for concurrency, Programming with POSIX Threads is very much focused on getting the implementation details right.
Highly Recommended.
Buy this book
Programming with POSIX Threads
David Butenhof
Published by Addison-Wesley
ISBN 0-201-63392-2
Buy from Amazon.co.uk
Buy from Amazon.com
Posted by Anthony Williams
[/ reviews /] permanent link
Tags: reviews, threads, POSIX, Butenhof, books
Using CSS to Replace Text with Images
Monday, 29 October 2007
Lots has been said about ways to replace text with images so that users with a graphical browser get a nice pretty logo, whilst search engines and screen readers get to see the text version. Most recently, Eric Enge has posted A Comprehensive Guide to Hidden Text & Search Engines over at SEOmoz. In general, I think it's a fair summary of the techniques I've encountered.
However, I was surprised to see the order of entries in the "may be OK" list. Firstly, I'd have expected sIFR to be top of the list — this is a widely used technique, and just replaces existing text with the same text in a different font. I prefer to do without Flash where possible, and this only works where you want to change the font rather than use a logo, but I can certainly see the draw here.
Secondly, I was surprised to see that the suggestion at the top of the list is to position the text off screen. I think this is a really bad idea, for accessibility reasons. When I only had a dial-up connection, I often used to browse with images turned off in order to reduce download times. If the text is positioned off screen, I would have just got a blank space. Even now, I often check websites with images turned off, because I think it is important. It is for this reason that my preferred technique is "Fahrner Image Replacement" (FIR). Whilst Eric says this is a no-no according to the Google Guidelines, I can't really see how — it's not deceptive in intent, and the text is seen by users without image support (or with images turned off) as well as by the search engine bots. Also, given the quote from Susan Moskwa, it seems fine. Here's a quick summary of how it works:
Overlaying text with an image in CSS
The key to this technique is to have a nested SPAN with no content, position it over the text, and set a background image on it. If the background image loads, it hides the original text.
<h1 id="title"><span></span>Some Title Text</h1>
It is important to set the size of the enclosing tag to match the image, so that the hidden text doesn't leak out round the edges at large font sizes. The CSS is simple:
#title {
    position: relative;
    width: 200px;
    height: 100px;
    margin: 0px;
    padding: 0px;
    overflow: hidden;
}
#title span {
    position: absolute;
    top: 0px;
    left: 0px;
    width: 200px;
    height: 100px;
    background-image: url(/images/title-image.png);
}
This simple technique works in all the major browsers, including Internet Explorer, and gracefully degrades. Obviously, you can't select text from the image, but you can generally select the hidden text (though it's hard to see what you're doing), and copying the whole page will include the hidden text. Check it out — how does the title above ("Overlaying text with an image in CSS") appear in your browser?
Update: It has been pointed out in a comment on the linked SEOmoz article by bjornjohansen that you need to be aware of the potential for browsers with a different font size. This is definitely important — that's why we specify the exact dimensions for the enclosing element, and use overflow: hidden to avoid overhang. It's also important to ensure that the raw text (without the image) fits the specified space when rendered in at least one font size larger than "normal", so that people who use larger fonts can still read it with images disabled, without getting the text clipped.
Update: In another comment over on the SEOmoz article, MarioFr suggested that for headings the A tag could be used instead of SPAN — since empty A tags can be used as a link target in the heading, it works as a suitable replacement. I've changed the heading above to use an A tag for both purposes as an example.
Posted by Anthony Williams
[/ webdesign /] permanent link
Tags: css, web design, image replacement
Review of Fit for Developing Software by Rick Mugridge and Ward Cunningham
Monday, 22 October 2007
As the subtitle of this book says, Fit is the Framework for Integrated Tests, which was originally written by Ward. This is a testing framework that allows tests to be written in the form of Excel spreadsheets or HTML tables, which makes it easy for non-programmers to write tests. This book is divided into several parts. Parts 1 and 2 give an in-depth overview of how to use Fit effectively, and how it enables non-programmers to specify the tests, whereas parts 3-5 provide the details programmers will need in order to set up their code to be run from Fit.
Though I have been aware of Fit for a long time, I have never entirely grasped how to use it; reading this book gave me a strong urge to give it a go. It is very clear, with plenty of examples. I thought the sections on good/bad test structure, and how to restructure your tests to be clearer and easy to maintain were especially valuable — though they are obviously focused on Fit, many of the suggestions are applicable to testing through any framework.
Fit was developed as a Java framework, and so all the programming examples are in Java. However, as stated in the appendix, there are ports for many languages including C#, Python and C++. The way of structuring the fixtures that link the Fit tests to the code under test varies with each language, but the overall principles still apply.
The book didn't quite succeed in convincing me to spend time working with Fit or Fitnesse to try and integrate it with any of my existing projects, but I still think it's worth a look, and will try and use it on my next greenfield project.
Recommended.
Buy this book
At Amazon.co.ukAt Amazon.com
Posted by Anthony Williams
[/ reviews /] permanent link
Tags: reviews, fit, books, testing
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Reduce Bandwidth Usage by Compressing Pages in PHP
Monday, 15 October 2007
In Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP, I identified one way to reduce your bandwidth usage — use the appropriate HTTP headers to avoid sending content that hasn't changed. Another way to reduce your bandwidth usage is to compress your pages.
HTTP headers
The Accept-Encoding HTTP header is used by browsers to specify potential encodings for a requested web page. For Firefox, this is generally set to "gzip, deflate", meaning that the browser will accept (and decompress) web pages compressed with the gzip or deflate compression algorithms. The web server can then use the Content-Encoding header to indicate that it has used a particular encoding for the served page. The Vary header tells any caches or proxies that the response depends on the request's Accept-Encoding header, so different encodings may be served. For example, if the server compresses the page using gzip, then it will return the headers:

Content-Encoding: gzip
Vary: Accept-Encoding
Handling compression in PHP
For static pages, compression is handled by your web server (though you might have to configure it to do so). For pages generated with PHP you are in charge. However, supporting compression is really easy. Just add:
ob_start('ob_gzhandler');
to the start of the script. It is important that this comes before any output has been written: in order to compress the output, all of it has to be passed through the filter, and the compression headers have to be set before any content goes out. If any content has already been sent to the browser, then this won't work, which is why I put it at the start of the script — that way, there's not much chance of anything interfering.
Tags: PHP, web design, HTTP, compression, reducing bandwidth
Posted by Anthony Williams
[/ webdesign /] permanent link
Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP
Sunday, 30 September 2007
By default, pages generated with PHP are not cached by browsers or proxies, as they are generated anew by the server every time the page is requested. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.
As Bruce Eckel points out in RSS: The Wrong Solution to a Broken Internet, this is a particular problem for RSS feeds — feed readers are often overly enthusiastic in their checking rate, and given the tendency of bloggers to provide full feeds this can lead to a lot of wasted bandwidth. By using the code from this article in your feed-generating code you can save yourself a whole lot of bandwidth.
Caching and HTTP headers
Whenever a page is requested by a browser, the server response includes a Last-Modified header which indicates the last modification time of the page. For static pages, this is the last modification time of the file, but for dynamic pages it typically defaults to the time the page was requested. Whenever a page is requested that has been seen before, browsers or proxies generally take the Last-Modified time from the cached version and populate an If-Modified-Since request header with it. If the page has not changed since then, the server should respond with a 304 response code to indicate that the cached version is still valid, rather than sending the page content again.
To handle this correctly for PHP pages requires two things:
- identifying the last modification time for the page, and
- checking the request headers for If-Modified-Since.
Timestamps
There are two components to the last modification time: the date of the data used to generate the page, and the date of the script itself. Both are equally important, as we want the page to be updated when the data changes, and if the script has been changed the generated page may be different (for example, the layout could be different). My PHP code incorporates both by defaulting to the modification time of the script, and allowing the caller to pass in the data modification time, which is used if it is more recent than the script's. The last modification time is then used to generate a Last-Modified header, and returned to the caller. Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.
function setLastModified($last_modified=NULL)
{
    // Start with the modification time of the main script
    $page_modified=getlastmod();
    if(empty($last_modified) || ($last_modified < $page_modified)) {
        $last_modified=$page_modified;
    }
    // Also account for changes to the file containing this function
    $header_modified=filemtime(__FILE__);
    if($header_modified > $last_modified) {
        $last_modified=$header_modified;
    }
    header('Last-Modified: ' . date("r",$last_modified));
    return $last_modified;
}
Handling If-Modified-Since
If the If-Modified-Since request header is present, then it can be parsed to get a timestamp that can be compared against the modification time. If the modification time is no newer than the request time, a 304 response can be returned instead of generating the page.
In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal with a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is at least as recent as the last-modified time, we can just return the 304 response code and quit; no further output is required. Here is the function that handles this:
function exitIfNotModifiedSince($last_modified)
{
    if(array_key_exists("HTTP_IF_MODIFIED_SINCE",$_SERVER)) {
        // Strip any old-style suffix after a semicolon before parsing the date
        $if_modified_since=strtotime(
            preg_replace('/;.*$/','',$_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        if($if_modified_since >= $last_modified) {
            // Cached copy is still valid: send 304 and no body
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}
Putting it all together
Using the two functions together is really simple:
// for pages with no data-dependency
exitIfNotModifiedSince(setLastModified());

// for data-dependent pages
exitIfNotModifiedSince(setLastModified($data_modification_time));
Of course, you can use the functions separately if that better suits your needs.
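As a sketch of how a data-dependent page might wire this up (the include name and data file here are hypothetical, just to show the shape):

```php
<?php
// Hypothetical page skeleton. Assumes setLastModified() and
// exitIfNotModifiedSince() live in an included file, as the article suggests.
require_once('caching.php');            // hypothetical include with the functions

$data_modification_time = filemtime('articles.db');  // hypothetical data file
exitIfNotModifiedSince(setLastModified($data_modification_time));

// Only reached if the cached copy is stale: generate the page as normal.
?>
<html>
  <body><?php /* page content generated from articles.db */ ?></body>
</html>
```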
Posted by Anthony Williams
[/ webdesign /] permanent link
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy