Just Software Solutions

Blog Archive for / 2007 / 09 /

Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP

Sunday, 30 September 2007

By default, pages generated with PHP are not cached by browsers or proxies, as they are generated anew every time the page is requested from the server. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.

As Bruce Eckel points out in RSS: The Wrong Solution to a Broken Internet, this is a particular problem for RSS feeds — feed readers are often overly enthusiastic in their checking rate, and given the tendency of bloggers to provide full feeds this can lead to a lot of wasted bandwidth. By using the code from this article in your feed-generating code you can save yourself a whole lot of bandwidth.

Caching and HTTP headers

When a browser requests a page, the server's response normally includes a Last-Modified header giving the page's last modification time. For static pages, this is the last modification time of the file, but for dynamic pages it typically defaults to the time the page was requested. When a page is requested that has been seen before, browsers and proxies generally take the Last-Modified time from the cached version and send it back in an If-Modified-Since request header. If the page has not changed since then, the server should respond with a 304 (Not Modified) response code and no body, to indicate that the cached version is still valid, rather than sending the page content again.

Handling this correctly for PHP pages requires two things:

  • Identifying the last modification time for the page, and
  • Checking the request headers for an If-Modified-Since header.

Timestamps

There are two components to the last modification time: the date of the data used to generate the page, and the date of the script itself. Both are equally important, as we want the page to be updated when the data changes, and if the script has been changed the generated page may be different (for example, the layout could be different). My PHP code incorporates both by defaulting to the modification time of the script, and allowing the user to pass in the data modification time, which is used if it is more recent than the script. The last modification time is then used to generate a Last-Modified header, and returned to the caller.

Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.

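function setLastModified($last_modified=NULL)
{
    // Modification time of the main script.
    $page_modified=getlastmod();
    
    if(empty($last_modified) || ($last_modified < $page_modified))
    {
        $last_modified=$page_modified;
    }
    // Modification time of this (included) file.
    $header_modified=filemtime(__FILE__);
    if($header_modified > $last_modified)
    {
        $last_modified=$header_modified;
    }
    // HTTP dates must be expressed in GMT; date("r") would emit a numeric
    // timezone offset instead, so build the date with gmdate().
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s',$last_modified) . ' GMT');
    return $last_modified;
}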

Handling If-Modified-Since

If the If-Modified-Since request header is present, then it can be parsed to get a timestamp that can be compared against the modification time. If the page's modification time is no later than the time given in the header, a 304 response can be returned instead of generating the page.

In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal with a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is at least as recent as the last-modified time, we can just return the 304 response code and quit, with no further output required. Here is the function that handles this:

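function exitIfNotModifiedSince($last_modified)
{
    if(array_key_exists("HTTP_IF_MODIFIED_SINCE",$_SERVER))
    {
        // Strip anything after a semicolon (added by some old browsers), then parse the date.
        $if_modified_since=strtotime(preg_replace('/;.*$/','',$_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        // strtotime returns false for an unparseable date; treat that as "modified".
        if(($if_modified_since !== false) && ($if_modified_since >= $last_modified))
        {
            // The cached copy is still current: send the status line only.
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}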
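
For readers who want to experiment with the logic outside PHP, here is a minimal Python sketch of the same decision. It is an illustration rather than a port of the functions above: respond(), its arguments and the dictionary of request headers are hypothetical names chosen for this example, and last_modified is assumed to be a Unix timestamp in whole seconds.

```python
import re
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, last_modified, render_page):
    """Return (status, headers, body) for a page last changed at last_modified.

    request_headers is a plain dict of header name to value (an assumption
    made for this sketch), and render_page is a callable that produces the body.
    """
    # Always advertise the modification time, formatted as an HTTP date in GMT.
    headers = {"Last-Modified": formatdate(last_modified, usegmt=True)}

    raw = request_headers.get("If-Modified-Since")
    if raw is not None:
        # Drop anything after a semicolon, as some old browsers appended extra data.
        raw = re.sub(r";.*$", "", raw)
        try:
            since = parsedate_to_datetime(raw).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: behave as if the header were absent
        if since is not None and since >= last_modified:
            # The cached copy is still valid, so send no body.
            return 304, headers, b""

    return 200, headers, render_page()


# Example: a client that cached the page sends back the date it was given.
page_time = 1000000000
cached_date = formatdate(page_time, usegmt=True)
status, _, body = respond({"If-Modified-Since": cached_date}, page_time, lambda: b"page")
```

With the cached date equal to the page's modification time, the sketch takes the 304 branch and never calls render_page; with no header, or an unparseable one, it falls through to generating the page.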

Putting it all together

Using the two functions together is really simple:

     exitIfNotModifiedSince(setLastModified()); // for pages with no data-dependency
     exitIfNotModifiedSince(setLastModified($data_modification_time)); // for data-dependent pages

Of course, you can use the functions separately if that better suits your needs.

Posted by Anthony Williams
[/ webdesign /] permanent link

Free SEO Tools for Webmasters

Monday, 24 September 2007

I thought I'd share some of the free online tools that I use for assisting with Search Engine Optimization.

The people behind the iWebTool Directory provide a set of free webmaster tools, including a Broken Link Checker, a Backlink Checker and their Rank Checker. For most tools, just enter your domain or URL in the box, click "Check!" and wait for the results.

Whereas the iWebTool tools each perform one small task, Website Grader is an all-in-one tool for grading your website. Put in your URL, the keywords you wish to rank well for, and the websites of your competitors (if you wish for a comparison). When you submit your site, the tool then displays its progress at the bottom of the page, and after a few moments will give a report on your website, including your PageRank, Alexa rank, inbound links and Google rankings for you and your competitors for the search terms you provided, as well as a quick analysis of the content of your page.

We Build Pages offers a suite of SEO tools, much like the ones from iWebTool. I find the Top Ten Analysis SEO Tool really useful, as it compares your site against the top ten ranking sites for the search term you specify. The Backlink and Anchor Text Tool is also pretty good — it takes a while, but eventually tells you which pages link to your site, and what anchor text they use for the link.

Posted by Anthony Williams
[/ webdesign /] permanent link

Using Interfaces for Exception Safety in Delphi

Thursday, 20 September 2007

Resource Management and Exception Safety

One concept that has become increasingly important when writing C++ code is that of Exception Safety — writing code so that invariants are maintained even if an exception is thrown. Since exceptions are also supported in Delphi, it makes sense to apply many of the same techniques to Delphi code.

One of the important aspects of exception safety is resource management — ensuring that resources are correctly freed even in the presence of exceptions, in order to avoid memory leaks or leaking of other more expensive resources such as file handles or database connection handles. Probably the most common resource management idiom in C++ is Resource Acquisition is Initialization (RAII). As you may guess from the name, this involves acquiring resources in the constructor of an object. However, the important part is that the resource is released in the destructor. In C++, objects created on the stack are automatically destroyed when they go out of scope, so this idiom works well — if an exception is thrown, then the local objects are destroyed (and thus the resources they own are released) as part of stack unwinding.

In Delphi, things are not quite so straightforward: variables of class type are not stack allocated, and must be explicitly constructed by calling the constructor, and explicitly destroyed. In this respect, they are very like raw pointers in C++. This commonly leads to lots of try-finally blocks to ensure that variables are correctly destroyed when they go out of scope.

Delphi Interfaces

However, there is one type of Delphi variable that is automatically destroyed when it goes out of scope — an interface variable. Delphi interfaces behave very much like reference-counted pointers (such as boost::shared_ptr) in this regard — largely because they are used to support COM, which requires this behaviour. When an object is assigned to an interface variable, the reference count is increased by one. When the interface variable goes out of scope, or is assigned a new value, the reference count is decreased by one, and the object is destroyed when the reference count reaches zero. So, if you declare an interface for your class and use that interface type exclusively, then you can avoid all these try-finally blocks. Consider:

type
    abc = class
        constructor Create;
        destructor Destroy; override;
        procedure do_stuff;
    end;

procedure other_stuff;

...

procedure foo;
var
    x,y: abc;
begin
    x := abc.Create;
    try
        y := abc.Create;
        try
            x.do_stuff;
            other_stuff;
            y.do_stuff;
        finally
            y.Free;
        end;
    finally
        x.Free;
    end;
end;

All that try-finally machinery can seriously impact the readability of the code, and is easy to forget. Compare it with:

type
    Idef = interface
        procedure do_stuff;
    end;
    def = class(TInterfacedObject, Idef)
        constructor Create;
        destructor Destroy; override;
        procedure do_stuff;
    end;

procedure other_stuff;

...

procedure foo;
var
    x,y: Idef;
begin
    x := def.Create;
    y := def.Create;
    x.do_stuff;
    other_stuff;
    y.do_stuff;
end;

Isn't the interface-based version easier to read? Not only that, but in many cases you no longer have to worry about lifetime issues of objects returned from functions — the compiler takes care of ensuring that the reference count is kept up-to-date and the object is destroyed when it is no longer used. Of course, you still need to make sure that the code behaves appropriately in the case of exceptions, but this little tool can go a long way towards that.
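
The reference counting that makes this work can be modelled in a few lines. The following is a toy sketch in Python rather than Delphi, and every name in it (Counted, Ref, add_ref, release, leave_scope) is invented for the illustration. In real Delphi code the compiler inserts the equivalent calls for you whenever an interface variable is assigned or goes out of scope.

```python
class Counted:
    """Toy stand-in for an object derived from TInterfacedObject."""

    def __init__(self, name, log):
        self.name = name
        self.refs = 0
        self.log = log

    def add_ref(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            # In Delphi this is the point at which the destructor runs.
            self.log.append("destroy " + self.name)


class Ref:
    """Toy stand-in for an interface variable."""

    def __init__(self):
        self.target = None

    def assign(self, obj):
        # Take the new reference before dropping the old one,
        # so that assigning a variable to itself is harmless.
        if obj is not None:
            obj.add_ref()
        if self.target is not None:
            self.target.release()
        self.target = obj

    def leave_scope(self):
        self.assign(None)


log = []
x, y = Ref(), Ref()
x.assign(Counted("first", log))   # first: one reference
y.assign(x.target)                # first: two references
x.assign(Counted("second", log))  # first drops to one reference, still alive
y.leave_scope()                   # first reaches zero and is destroyed
x.leave_scope()                   # second reaches zero and is destroyed
```

The point of the model is that destruction is tied to the last reference going away, not to any particular variable, which is why objects returned from functions need no special handling.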

Further Benefits

Not only do you get the benefit of automatic destruction when you use an interface to manage the lifetime of your class object, but you also get further benefits in the form of code isolation. The class definition can be moved into the implementation section of a unit, so that other code that uses this unit isn't exposed to the implementation details in terms of private methods and data. Not only that, but if the private data is of a type not exposed in the interface, you might be able to move a unit from the uses clause of the interface section to the implementation section. The reduced dependencies can lead to shorter compile times.

Another property of using an interface is that you can now provide alternative implementations of this interface. This can be of great benefit when testing, since it allows you to substitute a dummy test-only implementation when testing other code that uses this interface. In particular, you can write test implementations that return fixed values, or record the method calls made and their parameters.
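
As a rough sketch of that testing benefit, here is the idea expressed in Python, using typing.Protocol in the role of the Delphi interface. PriceSource, FixedPriceSource and order_total are hypothetical names made up for this example; the same shape applies to a Delphi interface with a test-only implementing class.

```python
from typing import List, Protocol


class PriceSource(Protocol):
    """Plays the role of the interface that production code depends on."""

    def price_of(self, item: str) -> int: ...


class FixedPriceSource:
    """Test-only implementation: returns a fixed value and records each call."""

    def __init__(self, price: int) -> None:
        self.price = price
        self.calls: List[str] = []

    def price_of(self, item: str) -> int:
        self.calls.append(item)
        return self.price


def order_total(prices: PriceSource, items: List[str]) -> int:
    """Code under test: it only knows about the interface, not the implementation."""
    return sum(prices.price_of(item) for item in items)


fake = FixedPriceSource(price=5)
total = order_total(fake, ["apple", "pear", "plum"])
```

Because order_total only depends on the interface, the test can check both the result and exactly which calls were made, without touching whatever the real implementation talks to.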

Downsides

The most obvious downside is the increased typing required for the interface definition — all the public properties and methods of the class have to be duplicated in the interface. This isn't a lot of typing, except for really big classes, but it does mean that there are two places to update if the method signatures change or a new method is added. In the majority of cases, I think this is outweighed by the benefit of isolation and separation of concerns achieved by using the interface.

Another downside is the requirement to derive from TInterfacedObject. Whilst you can implement the IInterface methods yourself, it is strongly recommended that you inherit from TInterfacedObject unless you have a good reason not to. One such "good reason" is that the class in question already inherits from another class which doesn't derive from TInterfacedObject. In this case, you have no choice but to implement the functions yourself, which is tedious. One possibility is to hold an instance of the problematic class as a data member rather than inherit from it, but that doesn't always make sense, so you have to decide for yourself in each case. Sometimes the benefits of using an interface are not worth the effort.

As Sidu Ponnappa points out in his post 'Programming to interfaces' strikes again, "programming to interfaces" doesn't mean creating an interface for every class, which does seem to be what I am proposing here. Whilst I agree with the idea, I think the benefits of using interfaces outweigh the downsides in many cases, for the reasons outlined above.

A Valuable Tool in the Toolbox

Whilst this is certainly not applicable in all cases, I have found it a useful tool when writing Delphi code, and will continue to use it where it helps simplify code. Though this article has focused on the exception safety aspect of using interfaces, I find the testing aspects particularly compelling when writing DUnit tests.

Posted by Anthony Williams
[/ delphi /] permanent link

Intel and AMD Define Memory Ordering

Monday, 17 September 2007

For a long time, the ordering of memory accesses between processors in a multi-core or multi-processor system based on the Intel x86 architecture has been underspecified. Many newsgroup posts have discussed the interpretation of the Intel and AMD software developer manuals, and how that translates to actual guarantees, but there has been nothing authoritative, despite comments from Intel engineers. This has now changed! Both Intel and AMD have now released documentation of their memory ordering guarantees: Intel has published a new white paper (Intel 64 Architecture Memory Ordering White Paper) devoted to the issue, whereas AMD have updated their programmer's manual (Section 7.2 of AMD64 Architecture Programmer's Manual Volume 2: System Programming Rev 3.13).

In particular, there are several things that are now made explicitly clear by this documentation:

  • Stores from a single processor cannot be reordered, and
  • Memory accesses obey causal consistency, so
  • An aligned load is an acquire operation, and
  • An aligned store is a release operation, and
  • A locked instruction (such as xchg or lock cmpxchg) is both an acquire and a release operation.

This has implications for the implementation of threading primitives such as mutexes for IA-32 and Intel 64 architectures — in some cases the code can be simplified, where it has been written to take a pessimistic interpretation of the specifications.

Posted by Anthony Williams
[/ threading /] permanent link

New Papers for C++ Standards Committee

Tuesday, 11 September 2007

I've just added the most recent papers that I've submitted to the C++ Standards Committee to our publications page. Mostly these are on multi-threading in C++:

but there's also an update to my old paper on Names, Linkage and Templates (Rev 2), with new proposed wording for the C++ Standard now that the Evolution Working Group have approved the proposal in principle, and it has moved to the Core Working Group for final approval and incorporation into the Standard.

Posted by Anthony Williams
[/ news /] permanent link

Database Tip: Use Parameterized Queries

Monday, 03 September 2007

This post is the third in a series of Database Tips.

When running an application with a database backend, a high percentage of SQL statements are likely to have variable data. This might be data obtained from a previous query, or it might be data entered by the user. In either case, you've got to somehow combine this variable data with the fixed SQL string.

String Concatenation

One possibility is just to incorporate the data into the SQL statement directly, using string concatenation, but this has two potential problems. Firstly, this means that the actual SQL statement parsed by the database is different every time. Many databases can skip parsing for repeated uses of the same SQL statement, so by using a different statement every time there is a performance hit. Secondly, and more importantly, this places the responsibility on you to ensure that the variable data will behave correctly as part of the statement. This is particularly important for web-based applications, as a common attack used by crackers is a "SQL injection" attack — by taking advantage of poor quoting by the application when generating SQL statements, it is possible to input data which will end the current SQL statement, and start a new one of the cracker's choosing. For example, if string data is just quoted in the SQL using plain quotes ('data') then data that contains a quote and a semicolon will end the statement. This means that if data is '; update login_table set password='abc'; then the initial '; will end the statement from the application, and the database will then run the next one, potentially setting everyone's password to "abc".

Parameterized Queries

A solution to both these problems can be found in the form of Parameterized Queries. In a parameterized query, the variable data in the SQL statement is replaced with a placeholder such as a question mark, which indicates to the database engine that this is a parameter. The database API can then be used to set the parameters before executing the query. This neatly solves the first problem with string concatenation — the query seen by the database engine is the same every time, so giving the database the opportunity to avoid parsing the statement every time. Most parameterized query APIs will also allow you to reuse the same query with multiple sets of parameters, thus explicitly caching the parsed query.

Parameterized queries also solve the SQL injection problem: most APIs can send the data directly to the database engine, marked as a parameter rather than quoting it. Even when the data is quoted within the API, this is then the database driver's responsibility, and is thus more likely to be reliable. In either case, the user is relieved of the requirement to quote the data correctly, thus avoiding SQL injection attacks.
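
To make the difference concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table contents are made up for the demonstration, and the hostile input is a slight variant of the one described earlier, adjusted so that the application's own closing quote completes the injected statement. The unsafe half uses executescript() because it will run several statements from one string, which stands in for a database API that allows that.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two sample logins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE login_table (username TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO login_table VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    conn.commit()
    return conn


def passwords(conn):
    """Return all stored passwords, ordered by username."""
    query = "SELECT password FROM login_table ORDER BY username"
    return [row[0] for row in conn.execute(query)]


hostile = "'; update login_table set password='abc"

# Unsafe: the data is pasted into the SQL text, so the quote and semicolon
# in the input end the SELECT and start a second statement.
unsafe = make_db()
unsafe.executescript(
    "SELECT password FROM login_table WHERE username = '" + hostile + "'"
)
unsafe_passwords = passwords(unsafe)

# Safe: the data is bound as a parameter and is only ever treated as a value.
safe = make_db()
rows = safe.execute(
    "SELECT password FROM login_table WHERE username = ?", (hostile,)
).fetchall()
safe_passwords = passwords(safe)
```

In the unsafe database every password has been overwritten; in the safe one the hostile string is simply a username that matches no rows, and the table is untouched.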

A third benefit of parameterized queries is that data doesn't have to be converted to a string representation. This means that, for example, floating point numbers can be correctly transferred to the database without first converting to a potentially inaccurate string representation. It also means that the statement might run slightly faster, as the string representation of data often requires more storage than the data itself.

The Parameterized Query Coding Model

Whereas running a simple SQL statement consists of just two parts (execute the statement, then optionally retrieve the results), using parameterized queries often requires five:

  1. Parse the statement (often called preparing the statement).
  2. Bind the parameter values to the parameters.
  3. Execute the statement.
  4. Optionally, retrieve the results.
  5. Close or finalize the statement.

The details of each step depend on the particular database API, but most APIs follow the same outline. In particular, as mentioned above, most APIs allow you to run steps 2 to 4 several times before running step 5.
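
As one illustration of this outline, here is a sketch using Python's sqlite3 module. This API is higher-level than the model above: the module prepares (and caches) the statement inside execute(), so steps 1 to 3 happen in a single call, and the comments mark where each step falls. The customers table and its rows are sample data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

cur = conn.cursor()

# One statement, several sets of parameters: the SQL text never changes.
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)

select_sql = "SELECT name FROM customers WHERE id=?"
names = []
for customer_id in (1, 3):
    # Steps 1-3: parse (first time only), bind the parameter, execute.
    cur.execute(select_sql, (customer_id,))
    # Step 4: retrieve the results.
    names.append(cur.fetchone()[0])

# Step 5: release the statement's resources.
cur.close()
conn.close()
```

A lower-level API, such as the SQLite C interface, exposes each of the five steps as a separate call, but the overall flow is the same.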

Placeholders

A parameterized query includes placeholders for the actual data to be passed in. In the simplest form, these placeholders can often be just a question mark (e.g. SELECT name FROM customers WHERE id=?), but most APIs also allow for named placeholders by prefixing an identifier with a marker such as a colon or an at-sign (e.g. INSERT INTO books (title,author) VALUES (@title,@author)). The use of named placeholders can be beneficial when the same data is needed in multiple parts of the query — rather than binding the data twice, you just use the same placeholder name. Named placeholders are also easier to get right in the face of SQL statements with large numbers of parameters or if the SQL statement is changed — it is much easier to ensure that the correct data is associated with a particular named parameter, than to ensure that it is associated with the correctly-numbered parameter, as it is easy to lose count of parameters, or change their order when changing the SQL statement.
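
Here is a short sketch of named placeholders, again using Python's sqlite3 module, which documents the colon-prefixed form (so :title is used here rather than @title). The books table and its rows are sample data made up for the example. The second query uses the same placeholder twice, so the search term is bound only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")

insert_sql = "INSERT INTO books (title, author) VALUES (:title, :author)"
sample_books = [
    {"title": "Emma", "author": "Jane Austen"},
    {"title": "Dune", "author": "Frank Herbert"},
    # Key order does not matter: values are matched to placeholders by name.
    {"author": "Example Biographer", "title": "Frank Herbert"},
]
for book in sample_books:
    conn.execute(insert_sql, book)

# The :term placeholder appears twice but is supplied once.
search_sql = (
    "SELECT title FROM books "
    "WHERE title = :term OR author = :term "
    "ORDER BY title"
)
matches = conn.execute(search_sql, {"term": "Frank Herbert"}).fetchall()
```

Because the values are looked up by name, reordering the columns in the SQL statement, or adding a new parameter, cannot silently attach a value to the wrong placeholder.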

Recommendations

Look up the API for parameterized queries for your database. In SQLite, it's the APIs surrounding sqlite3_stmt, for MySQL it's the Prepared Statements API, and for Oracle the OCI parameterized statements API does the trick.

If your database API supports it, use named parameters, or at least explicit numbering (e.g. ?1,?2,?3 rather than just ?,?,?) to help avoid errors.

Posted by Anthony Williams
[/ database /] permanent link

Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy