Just Software Solutions

Blog Archive

Delay Using a Database

Wednesday, 08 August 2007

A client of ours makes hardware that generates data files, and a few years ago I wrote a piece of software for them to help manage those files. Initially it only had to deal with a few data files, so it indexed them at startup. Then someone tried to use it with several thousand data files and startup became too slow, so I modified the indexing code to dump the current index to an XML file on shutdown and load it again at startup. This has worked well, but now they're using it to handle hundreds of thousands of files, and the startup and shutdown times are again becoming significant because of the huge size of the XML file. Data file access times are also growing because of the size of the in-memory index. We've now been hired again to address the issue, so this time I'm going to use a SQLite database for the index: no startup delay, no shutdown delay, and faster index lookup.

What lessons can be learned from this experience? Should I have gone for SQLite in the first instance? I don't think so. Using a simple in-memory map for the initial index was the simplest thing that could possibly work, and it has worked for a few years. The XML index file was a small change, and it kept the application working for longer. Now the application does need a database, but the implementation is certainly more complex than the initial in-memory map. By using the simple implementation first, the application was completed more quickly: not only did this save my client money on development, it also meant they could begin using it sooner. It also meant that now that I come to add the database code, the requirements are better known and there is already a whole suite of tests for how the index should behave. It has taken less than a day to write the database indexing code, whereas it could easily have taken several days at the beginning.
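As an illustration of why the later switch can be cheap, here is a minimal sketch of an index hidden behind a small interface. It is not the client's code: the names FileInfo, FileIndex, InMemoryFileIndex and check_index_behaviour are all hypothetical, and the record fields are invented for the example. The point is only that tests written against the interface can be reused when a database-backed class replaces the map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record describing one indexed data file.
struct FileInfo {
    std::string path;
    unsigned long size_in_bytes;
};

// Hypothetical interface: callers depend only on this, so the storage
// behind it can change (a map today, a database later) without touching them.
class FileIndex {
public:
    virtual ~FileIndex() {}
    virtual void add(std::string const& key, FileInfo const& info) = 0;
    // Returns true and fills 'info' if the key is known.
    virtual bool find(std::string const& key, FileInfo& info) const = 0;
};

// The simplest thing that could possibly work: a std::map held in memory.
class InMemoryFileIndex : public FileIndex {
    std::map<std::string, FileInfo> entries;
public:
    void add(std::string const& key, FileInfo const& info) {
        entries[key] = info;
    }
    bool find(std::string const& key, FileInfo& info) const {
        std::map<std::string, FileInfo>::const_iterator it = entries.find(key);
        if (it == entries.end()) {
            return false;
        }
        info = it->second;
        return true;
    }
};

// A behavioural check written against the interface only, so it would
// apply unchanged to a later database-backed implementation.
inline void check_index_behaviour(FileIndex& index) {
    FileInfo const stored = {"/data/run-0001.dat", 4096};
    index.add("run-0001", stored);
    FileInfo found;
    assert(index.find("run-0001", found));
    assert(found.path == "/data/run-0001.dat");
    assert(!index.find("missing", found));
}
```

Calling check_index_behaviour with an InMemoryFileIndex exercises the map; a future implementation would be passed to the same function.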

I think people are often too keen to jump straight to using a database, when they could often get by for now with something far simpler. That doesn't mean that requirements won't evolve, and that a database won't be required in the end, but this time can often be put off for years, thus saving time and money. In this instance I happened to use SQLite, which is free, but many people jump straight to Oracle, or SQL Server, which have expensive licenses and are often overkill for the problem at hand. Just think how much money you could save by putting off the purchase of that SQL Server license for a year or two.

Don't be scared into buying a full-featured enterprise level RDBMS at the beginning of your project; simple in-memory maps or data files will often suffice for a long time, and when they're no longer sufficient you'll know more about what you do need from your RDBMS. Maybe SQLite will do, or maybe it won't — in any case you've saved money.

Posted by Anthony Williams
[/ database /] permanent link

If you liked this post, why not subscribe to the RSS feed or follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

The C++ Performance TR is now publicly available

Wednesday, 08 August 2007

The C++ Performance TR is a Technical Report issued by the C++ Standards committee detailing various factors that affect the performance of a program written in C++.

This includes information on various strategies of implementing aspects of the language, along with their consequences for executable size and timing, as well as suggestions on how to write efficient code. It also includes information on use of C++ in embedded systems, including a discussion of putting constant data in ROM and direct access to hardware registers.

Whether you're a compiler writer, library writer or application developer, this is well worth a look. Download a copy from the ISO website today.

Posted by Anthony Williams
[/ cplusplus /] permanent link

Implementing Synchronization Primitives for Boost on Windows Platforms

Wednesday, 11 July 2007

My article, Implementing Synchronization Primitives for Boost on Windows Platforms, from the April 2007 issue of Overload, is now available online.

In the article, I describe the implementation of a new mutex type for Windows platforms, for the Boost Threads library.

Posted by Anthony Williams
[/ news /] permanent link

Chaining Constructors in C++

Monday, 18 June 2007

Chain Constructors is one of the refactorings from Refactoring to Patterns (page 340) designed to reduce duplication, in this case duplication between constructors. Unfortunately, it is not such a straightforward refactoring in C++, since in the current Standard it is not possible for one constructor to delegate to another.
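To make the goal concrete, this is roughly what constructor delegation looks like in the form that was eventually standardized in C++11. It is a sketch for comparison only and needs a C++11 compiler; the Widget class and check_widget_constructors are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical class, used only to show the C++11 delegation syntax.
class Widget {
    int size_;
    std::string name_;
public:
    // The target constructor holds the common initialization.
    Widget(int size, std::string const& name)
        : size_(size), name_(name) {}

    // Delegating constructors: the initializer list names another
    // constructor of the same class and nothing else.
    Widget() : Widget(42, "default") {}
    explicit Widget(int size) : Widget(size, "default") {}
    explicit Widget(std::string const& name) : Widget(53, name) {}

    int size() const { return size_; }
    std::string const& name() const { return name_; }
};

// Usage: every constructor ends up running the common initialization.
inline void check_widget_constructors() {
    assert(Widget().size() == 42);
    assert(Widget(7).name() == "default");
    assert(Widget(std::string("gear")).size() == 53);
}
```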

The proposal to the C++ committee to support this feature in the next C++ Standard has been accepted, but the next Standard won't be ready until 2009, with implementations available sometime after that. If you've got a problem in your current project for which this is an appropriate refactoring then two years or more is a bit long to wait. So, with that in mind, I'm posting my workaround here for those who would like this feature now.

Adding a layer of redirection

All problems in Software can be solved by adding a layer of redirection, and this is no exception. In this case, we add a level of redirection between the class and its data by wrapping the data in a private struct. Every constructor of the original class then delegates to the constructor(s) of the internal struct. I'll illustrate with one of the examples from the Standardization proposal:

class X
{
    struct internal
    {
        internal( int, W& );
        ~internal();
        Y y_;
        Z z_;
    } self;
public:
    X();
    X( int );
    X( W& );
};

X::internal::internal( int i, W& e ):
    y_(i), z_(e)
{
    /*Common Init*/
}

X::X():
    self( 42, 3.14 )
{
    SomePostInitialization();
}

X::X( int i ):
    self( i, 3.14 )
{
    OtherPostInitialization();
}

X::X( W& w ):
    self( 53, w )
{ /* no post-init */ }

X x( 21 ); // if OtherPostInitialization() throws, self is already constructed, so internal::~internal is invoked

Every constructor of class X has to initialize the sole data member self, the constructor of which encapsulates all the common initialization. Each delegating constructor is then free to do any additional initialization required.
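The example above leaves W, Y and Z undefined, so here is a self-contained sketch of the same arrangement that can be compiled as it stands. The Connection class, its members and the default values are hypothetical, chosen only to give the constructors something to share.

```cpp
#include <cassert>
#include <string>

// Hypothetical class with three constructors sharing common initialization.
class Connection {
    // All data members live in the private struct, whose single
    // constructor holds the common initialization.
    struct internal {
        internal(std::string const& host, int port)
            : host_(host), port_(port), retries_(0) {}
        std::string host_;
        int port_;
        int retries_;
    } self;
public:
    // Each public constructor initializes only 'self', then does any
    // constructor-specific work in its body.
    Connection() : self("localhost", 80) {}

    explicit Connection(std::string const& host) : self(host, 80) {}

    Connection(std::string const& host, int port, int retries)
        : self(host, port) {
        self.retries_ = retries;  // post-initialization for this constructor
    }

    std::string const& host() const { return self.host_; }
    int port() const { return self.port_; }
    int retries() const { return self.retries_; }
};

// Usage: all three constructors share the defaults set up by 'internal'.
inline void check_connection_constructors() {
    Connection a;
    assert(a.host() == "localhost" && a.port() == 80 && a.retries() == 0);
    Connection b("example.org");
    assert(b.host() == "example.org" && b.port() == 80);
    Connection c("example.org", 8080, 3);
    assert(c.port() == 8080 && c.retries() == 3);
}
```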

Within the member functions of X, all references to member data now have to be prefixed with self., but that's not too bad a price: it makes it clear that this is member data, and is analogous to the use of this->, or the m_ prefix.

This simple solution only provides for a single layer of delegation — multiple layers of delegation would require multiple layers of nested structs, but it does provide full support at that level.

pimpls and Compilation Firewalls

Once the data has been encapsulated in a private structure, a further step worth considering is a move to the use of a pointer to the internal structure, also known as the pimpl idiom, or the use of a compilation firewall. By so doing, all that is required in the class definition is a forward declaration of the internal class, rather than a full definition. The full definition is then provided in the implementation file for the enclosing class. This eliminates any dependencies on the internal data from other classes, at the cost of forcing the data to be heap allocated. It also removes the possibility of any operations on the enclosing class being inline. For further discussion on the pimpl idiom, see Herb Sutter's Guru of the Week entry.
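A minimal sketch of that further step follows, shown in one listing with comments marking what would go in the header and what would go in the implementation file. The Account class is hypothetical, and the use of std::unique_ptr assumes a C++11 compiler; with an older compiler a raw pointer deleted in the destructor would play the same role.

```cpp
#include <cassert>
#include <memory>
#include <string>

// --- would live in the header: only a forward declaration is visible ---
class Account {
    struct internal;                 // defined in the implementation file
    std::unique_ptr<internal> self;  // the data is now heap allocated
public:
    Account();
    explicit Account(std::string const& owner);
    ~Account();                      // defined where 'internal' is complete
    std::string const& owner() const;
    long balance() const;
    void deposit(long amount);
};

// --- would live in the implementation file ---
struct Account::internal {
    internal(std::string const& owner, long balance)
        : owner_(owner), balance_(balance) {}
    std::string owner_;
    long balance_;
};

// Every constructor still delegates to the one constructor of 'internal'.
Account::Account() : self(new internal("anonymous", 0)) {}
Account::Account(std::string const& owner) : self(new internal(owner, 0)) {}
Account::~Account() {}

// No member function can be inline in the header any more, because
// each one needs the full definition of 'internal'.
std::string const& Account::owner() const { return self->owner_; }
long Account::balance() const { return self->balance_; }
void Account::deposit(long amount) { self->balance_ += amount; }

// Usage: callers see only the public interface.
inline void check_account() {
    Account a("alice");
    a.deposit(25);
    assert(a.owner() == "alice");
    assert(a.balance() == 25);
}
```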

Refactoring steps

Here's a quick summary of the steps needed to perform this refactoring:

  1. Create a private struct named internal in the class X being refactored with an identical set of data members to class X.
  2. Create a data member in class X of type internal named self, and remove all other data members.
  3. For each constructor of X, write a constructor of internal that mirrors the member-initializer list, and replace the member initializer list of that constructor with a single initialization of self that forwards the appropriate constructor parameters.
  4. Replace every reference to a data member y of class X with a reference to self.y.
  5. Eliminate duplication.

Posted by Anthony Williams
[/ cplusplus /] permanent link

Plan-it Earth Website goes live

Tuesday, 20 March 2007

We've just completed the website for Plan-it Earth. They offer Yurt Holidays and family Eco Camps on their traditional Cornish smallholding, which is just a few miles from us. We have been working closely with them to develop a new website from scratch, and have thoroughly enjoyed the experience. We are passionate about Cornwall, and West Penwith in particular (hence our location), and about reducing our environmental impact, so it was wonderful to work on a website with people who were similarly passionate, and where the aim is to spread this enthusiasm.

Posted by Anthony Williams
[/ news /] permanent link

Mocks, Stubs, Fakes and Interaction-based Testing

Monday, 04 December 2006

There has been extensive discussion on the Test-Driven Development Yahoo group recently about state-based testing, interaction-based testing, and the use of Mocks and other Test Doubles. I tend to use quite a lot of test doubles in my tests, so I thought I'd explain why.

I find that using various forms of test doubles whilst doing TDD can greatly speed my development, as it allows me to focus on the class under test, and its interaction with the rest of the system.

I tend to use a test double for any part that's not directly related to the responsibility of the class under test. For example, if I'm testing something that needs to send an email, then I will use a dummy SMTP client, which just logs the email for examination in the test, rather than actually sending it. If I need a class to encrypt something, I will provide a dummy encryption algorithm, and verify that the class under test encrypts the correct data and correctly uses the encrypted output. A real encryption algorithm may produce output with an element of randomness, which cannot therefore be readily used in an assertion.

Not only does this make testing easier, since it provides for greater isolation, and more focused tests, but the resultant separation of concerns is good for the overall design — rather than relying on a particular concrete implementation, the class under test now relies on an abstract interface. This reduces coupling, and increases cohesion. It also makes it easier to reuse code — lots of small classes, with well-defined responsibilities, are much more likely to be useful elsewhere.

Though I would tend to call my test implementations of these interfaces "mocks", the term "Mock" has come to mean a quite specific type of implementation, where the user sets "expectations" on which member functions will be called, with which parameters, and in what order; the Mock then verifies these expectations, and asserts if they are not met. Many "Mock Objects" are also automatically derived, commonly by using reflection and fancy "Mock Object Frameworks". My test implementations rarely do these things, and are probably better described as "Stubs", "Fakes", or even something else. I'm coming to like "dummy" as in "Crash Test Dummy" — it's not a crash that we're testing, but the "dummy" does provide information about how the class-under-test behaved, and allows the test to specify responses to stimuli from the class-under-test, and therefore exercise particular code paths in the class-under-test.

Going back to my sending-email example above — by using a dummy implementation, the tests can check the behaviour of the class under test when the email is sent successfully, and when it is not. The test can also verify that it is sent at all, and with the correct contents.
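A minimal sketch of such a dummy is shown here. All of the names (EmailSender, TemperatureAlarm, DummyEmailSender, the address and the message text) are hypothetical and exist only for the example. The dummy logs what it is asked to send and returns whatever result the test has chosen, so both the success and the failure path can be exercised.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface for the email-sending responsibility.
class EmailSender {
public:
    virtual ~EmailSender() {}
    // Returns true if the message was accepted for delivery.
    virtual bool send(std::string const& to, std::string const& body) = 0;
};

// Hypothetical class under test: decides whether an alert is needed.
class TemperatureAlarm {
    EmailSender& sender;
    int limit;
    bool last_send_failed_;
public:
    TemperatureAlarm(EmailSender& sender_, int limit_)
        : sender(sender_), limit(limit_), last_send_failed_(false) {}

    void report(int temperature) {
        if (temperature > limit) {
            last_send_failed_ =
                !sender.send("ops@example.com", "Temperature too high");
        }
    }
    bool last_send_failed() const { return last_send_failed_; }
};

// The dummy: logs each request and returns the result the test selected.
class DummyEmailSender : public EmailSender {
public:
    std::vector<std::string> recipients;
    std::vector<std::string> bodies;
    bool result;

    DummyEmailSender() : result(true) {}

    bool send(std::string const& to, std::string const& body) {
        recipients.push_back(to);
        bodies.push_back(body);
        return result;
    }
};

inline void check_alarm_with_dummy() {
    DummyEmailSender dummy;
    TemperatureAlarm alarm(dummy, 100);
    alarm.report(90);
    assert(dummy.recipients.empty());      // below the limit: nothing sent
    alarm.report(120);
    assert(dummy.recipients.size() == 1);  // sent exactly once
    assert(dummy.bodies[0] == "Temperature too high");
    assert(!alarm.last_send_failed());
    dummy.result = false;                  // simulate a delivery failure
    alarm.report(130);
    assert(alarm.last_send_failed());
}
```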

Furthermore, I generally don't write a separate class for the dummy implementations; I use the "Self Shunt" pattern, and make the test-case class serve double duty as the dummy implementation. This has the added benefit that I don't have to explicitly pull any data out of the dummy class, as it's already right there in the test: the dummy functions can just store data directly in the member variables for the tests to use.
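Here is a small sketch of Self Shunt applied to the encryption example from earlier. The names Encryptor, MessagePacker and MessagePackerTest are hypothetical, and no test framework is assumed, so the test is an ordinary member function using assert.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface for the encryption responsibility.
class Encryptor {
public:
    virtual ~Encryptor() {}
    virtual std::string encrypt(std::string const& plain) = 0;
};

// Hypothetical class under test: encrypts a message and frames the result.
class MessagePacker {
    Encryptor& encryptor;
public:
    explicit MessagePacker(Encryptor& encryptor_) : encryptor(encryptor_) {}

    std::string pack(std::string const& message) {
        return "[" + encryptor.encrypt(message) + "]";
    }
};

// Self Shunt: the test-case class is itself the dummy Encryptor, so the
// data the dummy records is already a member of the test.
class MessagePackerTest : public Encryptor {
public:
    std::string last_plain_text;  // filled in by the dummy encrypt()

    std::string encrypt(std::string const& plain) {
        last_plain_text = plain;
        return "ENCRYPTED";       // fixed output, easy to assert on
    }

    void test_pack_encrypts_message_and_uses_output() {
        MessagePacker packer(*this);  // pass the test itself as the dummy
        std::string const packed = packer.pack("hello");
        assert(last_plain_text == "hello");  // the correct data was encrypted
        assert(packed == "[ENCRYPTED]");     // the encrypted output was used
    }
};
```

Because the dummy's output is fixed, the assertion on the packed result does not depend on any real encryption algorithm.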

The real question, as ever, is where to draw the line between using real code, and providing dummy implementations; the extremes are easy, but the in-between cases require more thought. If the code talks to an external system (remote server, database, filesystem, etc.), then for TDD-style tests, it's probably best to provide a dummy implementation. Likewise, at the other extreme, you have to have a real implementation of something in order for the test to be worthwhile.

I tend to draw the line where I think the division in responsibility lies. If the code needs to send an email in response to certain conditions, then there are two responsibilities: sending an email, and making the decision to do so based on the conditions. I would therefore have two classes, and two sets of tests. One class will actually unconditionally send an email, in which case I would provide a dummy implementation of an SMTP server under control of the tests, and have the class under test connect to it as if it were a real SMTP server. In this case, the dummy will have to implement the full SMTP protocol, though the responses might be hard-coded, or depend on what aspect is being tested.

The second responsibility (deciding to send an email) belongs in a separate class, and the tests for this would provide a dummy implementation of the email-sending interface, so there's no network traffic required. I would (and have done where this has been required) probably develop this class first, in order to isolate precisely what interface is needed for the email-sending class, unless I already had an email-sending class that I was hoping to reuse, in which case I would start with the interface to that as it stood, and refactor if necessary.

Posted by Anthony Williams
[/ tdd /] permanent link

V1.2 of the dbExpress drivers for MySQL V5.0 released

Tuesday, 28 November 2006

New this release:

  • Now works with TSQLTable
  • Support for DecimalSeparator other than '.'
  • Support for the ServerCharSet connection parameter in BDS2006
  • Now correctly retrieves empty binary data fields

See the download page for more details.

Posted by Anthony Williams
[/ delphi /] permanent link

Vector-based Bowling Scores in C++

Tuesday, 02 May 2006

Introduction

In one of his recent articles, Ron Jeffries looks at a vector-based method of calculating scores for Tenpin bowling. He uses Ruby, and makes two comments which inspired me to redo it in C++.

First, he says:

To begin with, did you notice that every method is only one line long? That's a characteristic of good Smalltalk code, and good Ruby code, in my opinion. You can't get there in Java or C# -- much less C++

I disagree — good C++ will have short functions, and many will only be one line. This is borne out by my sample code.

Ron also says:

I'm not sure what would happen if we tried to build a vector-oriented solution in Java or C#. I'm sure it would be larger, and that we'd have to put a bunch of code in "our" classes instead of in Array and Object. On the other hand, I'm confident that we could have our score method look very much like the one we have here, processing the vectors "all at once" in a very similar way.

I'm using C++ rather than Java or C#, but I hope Ron will still be interested. I did write versions of accumulate that accept a whole vector at once, rather than requiring a pair of iterators, but otherwise it's all idiomatic C++.

Discussion

As you can see from the code (below), there is only one function that is more than one line long. This is Game::frame_starts, and the reason is the lack of the built-in numeric range generation one gets with Ruby's 1..9 syntax. With such a mechanism, we could make it similar to Game::frame_scores.

Given the range-based versions of accumulate I've added, we could add make_numeric_range to return a range containing the required integers, so Game::frame_starts could be made one line, but the code for this would end up being longer than Game::frame_starts is at the moment.
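To show what that trade-off looks like, here is a sketch of the make_numeric_range approach on a cut-down stand-in for Game. MiniGame, AddFrameStart and accumulate_range are hypothetical names introduced for this example (accumulate_range is the three-argument range-based accumulate under another name), and make_numeric_range is just one possible reading of the helper mentioned above. frame_starts does become a single line, but only by adding more code elsewhere.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Range-based accumulate, as in the main listing but renamed.
template<typename Range,typename Accum,typename Pred>
Accum accumulate_range(Range const& range,Accum initial,Pred pred) {
    return std::accumulate(range.begin(),range.end(),initial,pred);
}

// Hypothetical helper: the integers first, first+1, ..., last-1.
inline std::vector<unsigned> make_numeric_range(unsigned first,unsigned last) {
    std::vector<unsigned> res;
    for(unsigned i=first;i<last;++i) {
        res.push_back(i);
    }
    return res;
}

// Cut-down stand-in for Game, holding just what frame_starts needs.
class MiniGame {
    std::vector<unsigned> rolls;

    // Appends the first-roll index of the given frame, derived from
    // the start of the previous frame.
    struct AddFrameStart {
        MiniGame const& game;

        explicit AddFrameStart(MiniGame const& game_):
            game(game_)
        {}

        std::vector<unsigned> operator()(std::vector<unsigned> res,unsigned frame) const {
            res.push_back(frame==0 ? 0 : res.back()+game.rolls_in_frame(res.back()));
            return res;
        }
    };
public:
    static unsigned const frame_count=10;

    explicit MiniGame(std::vector<unsigned> const& rolls_):
        rolls(rolls_)
    {}

    unsigned rolls_in_frame(unsigned first_roll) const {
        return rolls[first_roll]==10 ? 1 : 2;
    }

    // One line now, at the cost of the helper and functor above.
    std::vector<unsigned> frame_starts() const {
        return accumulate_range(make_numeric_range(0,frame_count),std::vector<unsigned>(),AddFrameStart(*this));
    }
};

// Usage: a strike in the first frame shifts every later frame start by one.
inline void check_frame_starts() {
    std::vector<unsigned> rolls(19,0u);
    rolls[0]=10; rolls[1]=6; rolls[2]=2;
    std::vector<unsigned> const starts=MiniGame(rolls).frame_starts();
    assert(starts.size()==10);
    assert(starts[0]==0 && starts[1]==1 && starts[2]==3 && starts[9]==17);
}
```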

Of course, as Game::frame_scores demonstrates, the code has to go somewhere — in the case of Game::frame_scores, this is the nested class Game::AddFrameScore, and its operator(). If C++ had lambdas (which hopefully it will, when the next standard comes out), then we could include this code directly in the call to std::accumulate, but as it is, we need a whole new class. Member classes don't have the access to the instance members of their parent class that Java's inner classes enjoy, so we need to keep the reference to the Game object explicitly.
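Lambdas did arrive in C++11, so for comparison here is a sketch of how frame_scores might look with one. LambdaGame is a hypothetical cut-down copy of Game written for this example, not a replacement for the listing below, and it needs a C++11 compiler. The lambda captures this, which removes both the nested class and its explicit reference back to the game object.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Cut-down stand-in for Game; assumes a C++11 compiler for the lambda.
class LambdaGame {
    std::vector<unsigned> rolls;
public:
    static unsigned const frame_count=10;

    explicit LambdaGame(std::vector<unsigned> const& rolls_):
        rolls(rolls_)
    {}

    bool is_strike(unsigned first_roll) const {
        return rolls[first_roll]==10;
    }

    bool is_spare(unsigned first_roll) const {
        return (rolls[first_roll]+rolls[first_roll+1])==10;
    }

    unsigned rolls_to_score(unsigned first_roll) const {
        return (is_strike(first_roll) || is_spare(first_roll))?3:2;
    }

    unsigned frame_score(unsigned first_roll) const {
        return std::accumulate(rolls.begin()+first_roll,
                               rolls.begin()+first_roll+rolls_to_score(first_roll),0u);
    }

    std::vector<unsigned> frame_starts() const {
        std::vector<unsigned> res;
        for(unsigned i=0;res.size()<frame_count;i+=is_strike(i)?1:2) {
            res.push_back(i);
        }
        return res;
    }

    // The lambda body is the code that previously lived in
    // AddFrameScore::operator().
    std::vector<unsigned> frame_scores() const {
        std::vector<unsigned> const starts=frame_starts();
        return std::accumulate(starts.begin(),starts.end(),std::vector<unsigned>(),
            [this](std::vector<unsigned> res,unsigned first_roll) -> std::vector<unsigned> {
                res.push_back(frame_score(first_roll));
                return res;
            });
    }

    unsigned score() const {
        std::vector<unsigned> const scores=frame_scores();
        return std::accumulate(scores.begin(),scores.end(),0u);
    }
};

// Usage: a perfect game of twelve strikes.
inline void check_lambda_game() {
    std::vector<unsigned> const all_strikes(12,10u);
    assert(LambdaGame(all_strikes).score()==300u);
}
```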

At 75 lines, including blank lines, the code is only marginally longer than the 72 lines for Ron's Ruby version, and just as clear, to my mind.

The code

Here it is, in all its glory. First the implementation:

#include <numeric>
#include <vector>
    
template<typename Range,typename Accum>
Accum accumulate(Range const& range,Accum initial) {
    return std::accumulate(range.begin(),range.end(),initial);
}

template<typename Range,typename Accum,typename Pred>
Accum accumulate(Range const& range,Accum initial,Pred pred) {
    return std::accumulate(range.begin(),range.end(),initial,pred);
}

class Game {
    std::vector<unsigned> rolls;

    struct AddFrameScore {
        Game const& game;
    
        explicit AddFrameScore(Game const& game_):
            game(game_)
        {}
    
        std::vector<unsigned> operator()(std::vector<unsigned> res,unsigned first_roll) {
            return res.push_back(game.frame_score(first_roll)), res;
        }
    };
public:
    static unsigned const frame_count=10;

    template<unsigned Size>
    explicit Game(unsigned const(& rolls_)[Size]):
        rolls(rolls_,rolls_+Size)
    {}

    unsigned score() const {
        return accumulate(frame_scores(),0);
    }

    bool is_strike(unsigned first_roll) const {
        return rolls[first_roll]==10;
    }
    
    bool is_spare(unsigned first_roll) const {
        return (rolls[first_roll]+rolls[first_roll+1])==10;
    }

    bool is_mark(unsigned first_roll) const {
        return is_strike(first_roll) || is_spare(first_roll);
    }

    unsigned rolls_to_score(unsigned first_roll) const {
        return is_mark(first_roll)?3:2;
    }

    unsigned rolls_in_frame(unsigned first_roll) const {
        return is_strike(first_roll)?1:2;
    }

    unsigned frame_score(unsigned first_roll) const {
        return std::accumulate(rolls.begin()+first_roll,rolls.begin()+first_roll+rolls_to_score(first_roll),0);
    }

    std::vector<unsigned> frame_starts() const {
        std::vector<unsigned> res;
        for(unsigned i=0;res.size()<frame_count;i+=rolls_in_frame(i)) {
            res.push_back(i);
        }
        return res;
    }

    std::vector<unsigned> frame_scores() const {
        return ::accumulate(frame_starts(),std::vector<unsigned>(),AddFrameScore(*this));
    }
};

And now the tests:

#include <algorithm>
#include <iostream>
#include <iterator>

#define ASSERT_EQUALS(lhs,rhs)                                          \
    {                                                                   \
        if(lhs!=rhs) {                                                  \
            std::cerr<<__FILE__<<": "<<__LINE__                         \
                     <<": Error: Assertion failed: " #lhs "==" #rhs ", lhs=" \
                     <<lhs<<", rhs="<<rhs<<std::endl;                   \
        }                                                               \
    }

template<typename T>
std::ostream& operator<<(std::ostream& os,std::vector<T> const& vec) {
    os<<"{";
    for(typename std::vector<T>::const_iterator it=vec.begin(),
            end=vec.end();
        it!=end;
        ++it) {
        os<<*it<<",";
    }
    return os<<"}";
}

template<unsigned Size>
std::ostream& operator<<(std::ostream& os,unsigned const (& vec)[Size]) {
    os<<"{";
    for(unsigned i=0;i<Size;++i) {
        os<<vec[i]<<",";
    }
    return os<<"}";
}


template<typename LhsRange,unsigned RhsRangeSize>
bool operator!=(LhsRange const& lhs,unsigned const(& rhs)[RhsRangeSize]) {
    return (std::distance(lhs.begin(),lhs.end()) != RhsRangeSize) ||
        !std::equal(lhs.begin(),lhs.end(),rhs);
}

int main() {
    unsigned const full_game_rolls=20;
    unsigned const all_zeros[full_game_rolls]={0};
    ASSERT_EQUALS(Game(all_zeros).score(),0);
    unsigned const all_open[]={1,2,2,6,3,2,4,1,5,4,6,0,7,2,8,0,9,0,0,2};
    ASSERT_EQUALS(Game(all_open).score(),64);
    unsigned const all_open_frame_starts[]={0,2,4,6,8,10,12,14,16,18};
    ASSERT_EQUALS(Game(all_open).frame_starts(),all_open_frame_starts);
    for(unsigned i=0;i<full_game_rolls;i+=2) {
        ASSERT_EQUALS(Game(all_open).rolls_to_score(i),2);
    }
    unsigned const spare[full_game_rolls]={6,4,6,2};
    ASSERT_EQUALS(Game(spare).rolls_to_score(0),3);
    ASSERT_EQUALS(Game(spare).rolls_to_score(2),2);
    unsigned const all_open_frame_scores[Game::frame_count]={3,8,5,5,9,6,9,8,9,2};
    ASSERT_EQUALS(Game(all_open).frame_scores(),all_open_frame_scores);
    unsigned const spare_frame_scores[Game::frame_count]={16,8};
    ASSERT_EQUALS(Game(spare).frame_scores(),spare_frame_scores);
    ASSERT_EQUALS(Game(spare).score(),24);

    unsigned const strike[full_game_rolls-1]={10,6,2};
    ASSERT_EQUALS(Game(strike).rolls_to_score(0),3);
    ASSERT_EQUALS(Game(strike).rolls_to_score(1),2);
    unsigned const strike_frame_starts[]={0,1,3,5,7,9,11,13,15,17};
    ASSERT_EQUALS(Game(strike).frame_starts(),strike_frame_starts);

    unsigned const alternating[]={10,1,9,10,1,9,10,1,9,10,1,9,10,1,9,10};
    ASSERT_EQUALS(Game(alternating).score(),200);

    unsigned const all_strikes[]={10,10,10,10,10,10,10,10,10,10,10,10};
    ASSERT_EQUALS(Game(all_strikes).score(),300);
    
}

Posted by Anthony Williams
[/ cplusplus /] permanent link

Message Handling Without Dependencies

Wednesday, 19 April 2006

My article, Message Handling Without Dependencies, has been published in the May 2006 issue of Dr Dobb's Journal.

In the article, I describe a technique using templates and virtual functions to reduce dependencies when passing messages in C++. I have used this technique to great effect in production code.

Posted by Anthony Williams
[/ news /] permanent link

Review of Refactoring to Patterns by Joshua Kerievsky

Wednesday, 12 April 2006

Refactoring To Patterns brings together the Patterns movement, and the practice of Refactoring commonplace in the Agile community. Whereas the original Gang of Four book told us what patterns were, what sort of problems they solved, and how the code might be structured, Refactoring To Patterns illustrates how, why and when to introduce patterns into an existing codebase.

The opening chapters cover the background, introducing both refactoring and design patterns, and the context in which the book was written. This gives the reader a clear overview of what is involved in Refactoring to Patterns, and paves the way for the refactoring catalogue which makes up the bulk of the book.

The catalogue is divided into chapters based on the type of change required: is this a refactoring to simplify code, generalize code, or increase encapsulation and protection? Each chapter has an introduction which gives an overview of the refactorings contained within that chapter, followed by the refactorings themselves. These introductions clearly illustrate the principles and choices that would lead one to apply the refactorings that follow.

Each refactoring starts with a brief one-sentence summary, and before and after structure diagrams with reference to the structure diagrams for the relevant pattern in the Design Patterns book. The sections that follow then cover the Motivation for using this refactoring, step-by-step Mechanics, and a worked Example, relating back to the steps given for the Mechanics. Finally, some of the refactorings finish with Variations on the same theme. The examples are all pulled from a small sample of projects, which are introduced at the beginning of the catalogue section, and help illuminate the instructions given in the Mechanics section. The mechanics themselves are generally clear, and broken down into small steps. These are sometimes smaller steps than I might take in practice, but I think this is probably wise, as large steps can easily confuse. The Motivation sections do a good job of explaining why one would choose to do a particular refactoring, and any pitfalls of doing so; the "Benefits and Liabilities" tables provide a useful summary.

This book is well written, easy to read, and genuinely useful. It has helped me put some of the refactorings I do into a larger context, and given me insight into how I can integrate patterns with existing code, rather than designing them in up front. As John Brant and Don Roberts highlight in their Afterword, this is a book to study: the real benefit comes not from knowing the mechanics, but from understanding the motivation and the process, so that one may apply the same thinking to other scenarios not covered by this book. If you are serious about software development, buy this book, inwardly digest it, and keep it by your side.

Highly Recommended.

Buy this book

Refactoring to Patterns
Joshua Kerievsky
Published by Addison-Wesley
ISBN 0-321-21335-1

Posted by Anthony Williams
[/ reviews /] permanent link

Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy