Blog Archive for / 2007 / 09 /
Reduce Bandwidth Usage by Supporting If-Modified-Since in PHP
Sunday, 30 September 2007
By default, pages generated with PHP are not cached by browsers or proxies, as they are generated anew by the server every time the page is requested. If you have repeat visitors to your website, or even many visitors that use the same proxy, this means that a lot of bandwidth is wasted transferring content that hasn't changed since last time. By adding appropriate code to your PHP pages, you can allow your pages to be cached, and reduce the required bandwidth.
As Bruce Eckel points out in RSS: The Wrong Solution to a Broken Internet, this is a particular problem for RSS feeds — feed readers are often overly enthusiastic in their checking rate, and given the tendency of bloggers to provide full feeds this can lead to a lot of wasted bandwidth. By using the code from this article in your feed-generating code you can save yourself a whole lot of bandwidth.
Caching and HTTP headers
Whenever a page is requested by a browser, the server's response includes a Last-Modified header which indicates the last modification time of the page. For static pages, this is the last modification time of the file, but for dynamic pages it typically defaults to the time the page was requested. When a page that has been seen before is requested again, browsers or proxies generally take the Last-Modified time from the cached version and populate an If-Modified-Since request header with it. If the page has not changed since then, the server should respond with a 304 response code to indicate that the cached version is still valid, rather than sending the page content again.
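As a rough illustration (the URL and dates below are invented for this example), the exchange looks something like this. On the first visit, the server's response includes the modification time:

HTTP/1.1 200 OK
Last-Modified: Sun, 30 Sep 2007 10:00:00 GMT
...page content...

On a later visit, the browser sends that time back:

GET /page.php HTTP/1.1
If-Modified-Since: Sun, 30 Sep 2007 10:00:00 GMT

and, if nothing has changed, the server can reply with just:

HTTP/1.1 304 Not Modified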
To handle this correctly for PHP pages requires two things:
- Identifying the last modification time for the page, and
- Checking the request headers for the If-Modified-Since header.
Timestamps
There are two components to the last modification time: the date of the data used to generate the page, and the date of the script itself. Both are equally important, as we want the page to be updated when the data changes, and if the script has been changed the generated page may be different (for example, the layout could be different). My PHP code incorporates both by defaulting to the modification time of the script, and allowing the caller to pass in the data modification time, which is used if it is more recent than the script's. The last modification time is then used to generate a Last-Modified header, and returned to the caller. Here is the function that adds the Last-Modified header. It uses both getlastmod() and filemtime(__FILE__) to determine the script modification time, on the assumption that this function is in a file included from the main script, and we want to detect changes to either.
function setLastModified($last_modified=NULL) {
    // Start with the last modification time of the main script.
    $page_modified=getlastmod();
    if(empty($last_modified) || ($last_modified < $page_modified)) {
        $last_modified=$page_modified;
    }
    // Also check the file containing this function, in case it has changed.
    $header_modified=filemtime(__FILE__);
    if($header_modified > $last_modified) {
        $last_modified=$header_modified;
    }
    // Send the Last-Modified header and hand the timestamp back to the caller.
    header('Last-Modified: ' . date("r",$last_modified));
    return $last_modified;
}
Handling If-Modified-Since
If the If-Modified-Since request header is present, then it can be parsed to get a timestamp that can be compared against the modification time. If the modification time is no more recent than the request time, a 304 response can be returned instead of generating the page.
In PHP, the HTTP request headers are generally stored in the $_SERVER superglobal under a name starting with HTTP_ based on the header name. For our purposes, we need the HTTP_IF_MODIFIED_SINCE entry, which corresponds to the If-Modified-Since header. We can check for this with array_key_exists, and parse the date with strtotime. There's a slight complication in that old browsers used to add additional data to this header, separated with a semicolon, so we need to strip that out (using preg_replace) before parsing. If the header is present, and the specified date is at least as recent as the last-modified time, we can just return the 304 response code and quit — no further output required. Here is the function that handles this:
function exitIfNotModifiedSince($last_modified) {
    if(array_key_exists("HTTP_IF_MODIFIED_SINCE",$_SERVER)) {
        // Strip any extra data old browsers append after a semicolon, then parse the date.
        $if_modified_since=strtotime(preg_replace('/;.*$/','',$_SERVER["HTTP_IF_MODIFIED_SINCE"]));
        if($if_modified_since >= $last_modified) {
            // The client's cached copy is still current: send 304 and stop.
            header("HTTP/1.0 304 Not Modified");
            exit();
        }
    }
}
Putting it all together
Using the two functions together is really simple:
exitIfNotModifiedSince(setLastModified()); // for pages with no data-dependency
exitIfNotModifiedSince(setLastModified($data_modification_time)); // for data-dependent pages
Of course, you can use the functions separately if that better suits your needs.
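As a concrete illustration, here is a minimal sketch of how a data-driven page might use the two functions together. The include file name, the articles.dat data file and the renderArticles() helper are hypothetical, purely for illustration:

<?php
// Hypothetical example page; lastmodified.php is assumed to contain the
// setLastModified() and exitIfNotModifiedSince() functions shown above.
require_once 'lastmodified.php';

// Use the timestamp of the data behind this page as the data modification time.
$data_modification_time = filemtime('articles.dat');

// Send the Last-Modified header, and exit with a 304 if the client's cached
// copy is still up to date.
exitIfNotModifiedSince(setLastModified($data_modification_time));

// Only reached when the full page actually needs to be regenerated.
echo renderArticles('articles.dat'); // renderArticles() is a hypothetical helper
?>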
Posted by Anthony Williams
[/ webdesign /] permanent link
Free SEO Tools for Webmasters
Monday, 24 September 2007
I thought I'd share some of the free online tools that I use for assisting with Search Engine Optimization.
The people behind the iWebTool Directory provide a set of free webmaster tools, including a Broken Link Checker, a Backlink Checker and their Rank Checker. For most tools, just enter your domain or URL in the box, click "Check!" and wait for the results.
Whereas the iWebTool tools each perform one small task, Website Grader is an all-in-one tool for grading your website. Put in your URL, the keywords you wish to rank well for, and the websites of your competitors (if you want a comparison). When you submit your site, the tool then displays its progress at the bottom of the page, and after a few moments will give a report on your website, including your PageRank, Alexa rank, inbound links and Google rankings for you and your competitors for the search terms you provided, as well as a quick analysis of the content of your page.
We Build Pages offers a suite of SEO tools, much like the ones from iWebTool. I find the Top Ten Analysis SEO Tool really useful, as it compares your site against the top ten ranking sites for the search term you specify. The Backlink and Anchor Text Tool is also pretty good — it takes a while, but eventually tells you which pages link to your site, and what anchor text they use for the link.
Posted by Anthony Williams
[/ webdesign /] permanent link
Using Interfaces for Exception Safety in Delphi
Thursday, 20 September 2007
Resource Management and Exception Safety
One concept that has become increasingly important when writing C++ code is that of Exception Safety — writing code so that invariants are maintained even if an exception is thrown. Since exceptions are also supported in Delphi, it makes sense to apply many of the same techniques to Delphi code.
One of the important aspects of exception safety is resource management — ensuring that resources are correctly freed even in the presence of exceptions, in order to avoid memory leaks or leaking of other more expensive resources such as file handles or database connection handles. Probably the most common resource management idiom in C++ is Resource Acquisition is Initialization (RAII). As you may guess from the name, this involves acquiring resources in the constructor of an object. However, the important part is that the resource is released in the destructor. In C++, objects created on the stack are automatically destroyed when they go out of scope, so this idiom works well — if an exception is thrown, then the local objects are destroyed (and thus the resources they own are released) as part of stack unwinding.
In Delphi, things are not quite so straight-forward: variables of class type are not stack allocated, and must be explicitly constructed by calling the constructor, and explicitly destroyed — in this respect, they are very like raw pointers in C++. This commonly leads to lots of try-finally blocks to ensure that variables are correctly destroyed when they go out of scope.
Delphi Interfaces
However, there is one type of Delphi variable that is automatically destroyed when it goes out of scope — an interface variable. Delphi interfaces behave very much like reference-counted pointers (such as boost::shared_ptr) in this regard — largely because they are used to support COM, which requires this behaviour. When an object is assigned to an interface variable, the reference count is increased by one. When the interface variable goes out of scope, or is assigned a new value, the reference count is decreased by one, and the object is destroyed when the reference count reaches zero. So, if you declare an interface for your class and use that interface type exclusively, then you can avoid all these try-finally blocks. Consider:
type
  abc = class
    constructor Create;
    destructor Destroy; override;
    procedure do_stuff;
  end;

procedure other_stuff;
...

procedure foo;
var
  x,y: abc;
begin
  x := abc.Create;
  try
    y := abc.Create;
    try
      x.do_stuff;
      other_stuff;
      y.do_stuff;
    finally
      y.Free;
    end;
  finally
    x.Free;
  end;
end;
All that try-finally machinery can seriously impact the readability of the code, and is easy to forget. Compare it with:
type
  Idef = interface
    procedure do_stuff;
  end;

  def = class(TInterfacedObject, Idef)
    constructor Create;
    destructor Destroy; override;
    procedure do_stuff;
  end;

procedure other_stuff;
...

procedure foo;
var
  x,y: Idef;
begin
  x := def.Create;
  y := def.Create;
  x.do_stuff;
  other_stuff;
  y.do_stuff;
end;
Isn't the interface-based version easier to read? Not only that, but in many cases you no longer have to worry about lifetime issues of objects returned from functions — the compiler takes care of ensuring that the reference count is kept up-to-date and the object is destroyed when it is no longer used. Of course, you still need to make sure that the code behaves appropriately in the case of exceptions, but this little tool can go a long way towards that.
Further Benefits
Not only do you get the benefit of automatic destruction when you use an interface to manage the lifetime of your class object, but you also get further benefits in the form of code isolation. The class definition can be moved into the implementation section of a unit, so that other code that uses this unit isn't exposed to the implementation details in terms of private methods and data. Not only that, but if the private data is of a type not exposed in the interface, you might be able to move a unit from the uses clause of the interface section to the implementation section. The reduced dependencies can lead to shorter compile times.
Another property of using an interface is that you can now provide alternative implementations of this interface. This can be of great benefit when testing, since it allows you to substitute a dummy test-only implementation when testing other code that uses this interface. In particular, you can write test implementations that return fixed values, or record the method calls made and their parameters.
Downsides
The most obvious downside is the increased typing required for the interface definition — all the public properties and methods of the class have to be duplicated in the interface. This isn't a lot of typing, except for really big classes, but it does mean that there are two places to update if the method signatures change or a new method is added. In the majority of cases, I think this is outweighed by the benefit of isolation and separation of concerns achieved by using the interface.
Another downside is the requirement to derive from TInterfacedObject. Whilst you can implement the IInterface methods yourself, unless you have a good reason to do so it is strongly recommended to inherit from TInterfacedObject. One such "good reason" is that the class in question already inherits from another class, which doesn't derive from TInterfacedObject. In this case, you have no choice but to implement the functions yourself, which is tedious. One possibility is to create a data member rather than inherit from the problematic class, but that doesn't always make sense — you have to decide for yourself in each case. Sometimes the benefits of using an interface are not worth the effort.
As Sidu Ponnappa points out in his post 'Programming to interfaces' strikes again, "programming to interfaces" doesn't mean creating an interface for every class, which does seem to be what I am proposing here. Whilst I agree with that principle, I think the benefits of using interfaces outweigh the downsides in many cases, for the reasons outlined above.
A Valuable Tool in the Toolbox
Whilst this is certainly not applicable in all cases, I have found it a useful tool when writing Delphi code, and will continue to use it where it helps simplify code. Though this article has focused on the exception safety aspect of using interfaces, I find the testing aspects particularly compelling when writing DUnit tests.
Posted by Anthony Williams
[/ delphi /] permanent link
Intel and AMD Define Memory Ordering
Monday, 17 September 2007
For a long time, the ordering of memory accesses between processors in a multi-core or multi-processor system based on the Intel x86 architecture has been underspecified. Many newsgroup posts have discussed the interpretation of the Intel and AMD software developer manuals, and how that translates to actual guarantees, but there has been nothing authoritative, despite comments from Intel engineers. This has now changed! Both Intel and AMD have now released documentation of their memory ordering guarantees — Intel has published a new white paper (Intel 64 Architecture Memory Ordering White Paper) devoted to the issue, whereas AMD have updated their programmer's manual (Section 7.2 of AMD64 Architecture Programmer's Manual Volume 2: System Programming Rev 3.13).
In particular, there are a couple of things that are now made explicitly clear by this documentation:
- Stores from a single processor cannot be reordered, and
- Memory accesses obey causal consistency, so
- An aligned load is an acquire operation, and
- An aligned store is a release operation, and
- A locked instruction (such as xchg or lock cmpxchg) is both an acquire and a release operation.
This has implications for the implementation of threading primitives such as mutexes for IA-32 and Intel 64 architectures — in some cases the code can be simplified, where it has been written to take a pessimistic interpretation of the specifications.
Posted by Anthony Williams
[/ threading /] permanent link
New Papers for C++ Standards Committee
Tuesday, 11 September 2007
I've just added the most recent papers that I've submitted to the C++ Standards Committee to our publications page. Mostly these are on multi-threading in C++:
- N2139 — Thoughts on a Thread Library for C++,
- N2276 — Thread Pools and Futures, and
- N2320 — Multi-threading library for Standard C++
but there's also an update to my old paper on Names, Linkage and Templates (Rev 2), with new proposed wording for the C++ Standard now that the Evolution Working Group have approved the proposal in principle, and it has moved to the Core Working Group for final approval and incorporation into the Standard.
Posted by Anthony Williams
[/ news /] permanent link
Database Tip: Use Parameterized Queries
Monday, 03 September 2007
This post is the third in a series of Database Tips.
When running an application with a database backend, a high percentage of SQL statements are likely to have variable data. This might be data obtained from a previous query, or it might be data entered by the user. In either case, you've got to somehow combine this variable data with the fixed SQL string.
String Concatenation
One possibility is just to incorporate the data into the SQL statement directly, using string concatenation, but this has two potential problems. Firstly, this means that the actual SQL statement parsed by the database is different every time. Many databases can skip parsing for repeated uses of the same SQL statement, so by using a different statement every time there is a performance hit. Secondly, and more importantly, this places the responsibility on you to ensure that the variable data will behave correctly as part of the statement. This is particularly important for web-based applications, as a common attack used by crackers is a "SQL injection" attack — by taking advantage of poor quoting by the application when generating SQL statements, it is possible to input data which will end the current SQL statement, and start a new one of the cracker's choosing. For example, if string data is just quoted in the SQL using plain quotes ('data') then data that contains a quote and a semicolon will end the statement. This means that if the data is '; update login_table set password='abc'; then the initial '; will end the statement from the application, and the database will then run the next one, potentially setting everyone's password to "abc".
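To make that concrete, here is a small PHP sketch of the string the naive concatenation produces. The login_table name comes from the example above; the username column and the surrounding code are invented for illustration:

<?php
// Hypothetical illustration: this just shows the string that naive
// concatenation would hand to the database if $name came from an attacker.
$name = "'; update login_table set password='abc';";

$sql = "SELECT * FROM login_table WHERE username='" . $name . "'";
echo $sql, "\n";
// Prints:
//   SELECT * FROM login_table WHERE username=''; update login_table set password='abc';'
// A database API that accepts multiple statements in one call would run the
// attacker's UPDATE as well as the intended SELECT.
?>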
Parameterized Queries
A solution to both these problems can be found in the form of Parameterized Queries. In a parameterized query, the variable data in the SQL statement is replaced with a placeholder such as a question mark, which indicates to the database engine that this is a parameter. The database API can then be used to set the parameters before executing the query. This neatly solves the first problem with string concatenation — the query seen by the database engine is the same every time, thus giving the database the opportunity to avoid parsing the statement every time. Most parameterized query APIs will also allow you to reuse the same query with multiple sets of parameters, thus explicitly caching the parsed query.
Parameterized queries also solve the SQL injection problem — most APIs can send the data directly to the database engine, marked as a parameter rather than quoting it. Even when the data is quoted within the API, this is then the database driver's responsibility, and is thus more likely to be reliable. In either case, the application writer is relieved of the responsibility of correctly quoting the data, thus avoiding SQL injection attacks.
A third benefit of parameterized queries is that data doesn't have to be converted to a string representation. This means that, for example, floating point numbers can be correctly transferred to the database without first converting to a potentially inaccurate string representation. It also means that the statement might run slightly faster, as the string representation of data often requires more storage than the data itself.
The Parameterized Query Coding Model
Whereas running a simple SQL statement consists of just two parts — execute the statement, optionally retrieve the results — using parameterized queries often requires five:
- Parse the statement (often called preparing the statement).
- Bind the parameter values to the parameters.
- Execute the statement.
- Optionally, retrieve the results.
- Close or finalize the statement.
The details of each step depend on the particular database API, but most APIs follow the same outline. In particular, as mentioned above, most APIs allow you to run steps 2 to 4 several times before running step 5.
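As a sketch of what these five steps can look like in practice, here is a hedged example using PHP's PDO API with a books(title,author) table like the one in the placeholder example below; the database file name and the data values are invented:

<?php
// Illustrative only: books.db and the inserted values are hypothetical.
$db = new PDO('sqlite:books.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. Parse ("prepare") the statement, using named placeholders.
$stmt = $db->prepare('INSERT INTO books (title, author) VALUES (:title, :author)');

// 2. Bind the parameter values to the parameters.
$stmt->bindValue(':title', 'Some Book');
$stmt->bindValue(':author', 'A. N. Other');

// 3. Execute the statement.
$stmt->execute();

// Steps 2-4 can be repeated with new values without re-parsing the SQL.
$stmt->execute(array(':title' => 'Another Book', ':author' => 'J. Bloggs'));

// 4. Optionally, retrieve the results (relevant for SELECT statements).
// 5. Close or finalize the statement by releasing it.
$stmt = null;
?>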
Placeholders
A parameterized query includes placeholders for the actual data to be passed in. In the simplest form, these placeholders can often be just a question mark (e.g. SELECT name FROM customers WHERE id=?), but most APIs also allow for named placeholders by prefixing an identifier with a marker such as a colon or an at-sign (e.g. INSERT INTO books (title,author) VALUES (@title,@author)). The use of named placeholders can be beneficial when the same data is needed in multiple parts of the query — rather than binding the data twice, you just use the same placeholder name. Named placeholders are also easier to get right in the face of SQL statements with large numbers of parameters or if the SQL statement is changed — it is much easier to ensure that the correct data is associated with a particular named parameter, than to ensure that it is associated with the correctly-numbered parameter, as it is easy to lose count of parameters, or change their order when changing the SQL statement.
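For instance, here is a short sketch of that re-use with PHP's SQLite3 API; the messages table and its columns are made up for illustration:

<?php
// Illustrative only: messages(sender, recipient, body) is a hypothetical table.
$db = new SQLite3('messages.db');

// The same :user placeholder appears twice, but only needs binding once.
$stmt = $db->prepare('SELECT body FROM messages WHERE sender = :user OR recipient = :user');
$stmt->bindValue(':user', 'alice', SQLITE3_TEXT);

$result = $stmt->execute();
while ($row = $result->fetchArray(SQLITE3_ASSOC)) {
    echo $row['body'], "\n";
}
$stmt->close();
$db->close();
?>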
Recommendations
Look up the API for parameterized queries for your database. In SQLite, it's the APIs surrounding sqlite3_stmt; for MySQL it's the Prepared Statements API; and for Oracle the OCI parameterized statements API does the trick.
If your database API supports it, use named parameters, or at least explicit numbering (e.g. ?1,?2,?3 rather than just ?,?,?), to help avoid errors.
Posted by Anthony Williams
[/ database /] permanent link