Just Software Solutions

Blog Archive

March 2009 C++ Standards Committee Mailing - New C++0x Working Paper, Concurrency Changes

Monday, 30 March 2009

The March 2009 mailing for the C++ Standards Committee was published last week. This mailing contains the results of the first round of National Body voting on the C++0x draft, as well as the latest version of the C++0x working draft. This latest draft includes some changes in response to these NB comments, as agreed at the committee meeting at the beginning of March. Some of the changes related to concurrency and the thread library are listed below. The state of all comments (accepted, rejected, or unprocessed) can be found in N2863: C++ CD1 comment status.

The committee is intending to address all the comments (which may include rejecting some, as has already happened) in time to publish a second draft for National Body comments by the end of the year. If there is sufficient consensus on that draft, it will become the C++0x standard, otherwise it will have to undergo another round of revisions.

Concurrency-related Changes

The atomics library has only seen one accepted change so far, and that's a result of US issue 91: a failed compare_exchange operation is now only an atomic load rather than a read-modify-write operation. This should not have any impact on code that uses atomics, but it allows the implementation to be optimized on some architectures. The details can be seen in LWG issue 1043.
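
To see the distinction, here is a minimal sketch (mine, not from the mailing) of a failed compare-exchange, which under the accepted change counts only as an atomic load; it assumes the compare_exchange_strong name, so adjust if your draft or implementation spells it differently:

#include <atomic>
#include <cassert>

int main()
{
    std::atomic<int> a(5);
    int expected=3;
    // a!=expected, so the compare-exchange fails: no store is performed,
    // expected is updated to the current value of a, and the operation
    // is now just an atomic load rather than a read-modify-write.
    bool const success=a.compare_exchange_strong(expected,7);
    assert(!success);
    assert(expected==5);
    assert(a.load()==5);
}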

On the other hand, the thread library has seen a couple of accepted changes which will have user-visible consequences. These are:

std::thread destructor calls std::terminate() instead of detach()
Hans Boehm's paper N2082: A plea to reconsider detach-on-destruction for thread objects was reviewed as part of US issue 97. The result is that if you do not explicitly call join() or detach() on your std::thread objects before they are destroyed then the library will call std::terminate(), as illustrated in the sketch after this list. This is to ensure that there are no unintentional "dangling threads" holding references to local variables.
std::thread and std::unique_lock no longer have swap() functions that operate on rvalues
This change is in response to US issue 46, and the associated paper N2844: Fixing a Safety Problem with Rvalue References: Proposed Wording (Revision 1), which changes the way rvalue references work. In particular, an rvalue reference no longer binds to an lvalue. Combined with the previous change, which makes destroying a std::thread object with an associated thread of execution call std::terminate(), this makes perfect sense: swapping two rvalue std::thread objects serves no purpose anyway, and swapping a std::thread variable with an rvalue would now call std::terminate() when the rvalue is destroyed at the end of the expression, if the variable had an associated thread of execution.
The single-argument std::thread constructor has been removed
This was UK issue 323. The variadic std::thread constructor provides all the necessary functionality.
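
To make the destructor change concrete, here is a small sketch (mine, not from the mailing) showing the two outcomes:

#include <thread>

void work()
{}

void fine()
{
    std::thread t(work);
    t.join();       // or t.detach(); either makes destroying t safe
}

void fatal()
{
    std::thread t(work);
}   // t is destroyed while still joinable, so std::terminate() is called

int main()
{
    fine();
    // fatal();     // uncommenting this would terminate the program
}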

There are also a few minor concurrency-related changes that have been approved, mostly along the lines of clarifying the text. There are a few more which are still under discussion, one of which is quite significant: UK issue 329. This comment proposes the addition of a new function std::async() which will execute a function asynchronously and return a std::unique_future which can be used to obtain the result. Details can be seen under LWG issue 1043.
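
Based on the comment's description, using the proposed function might look something like the sketch below. The signature and names are still under discussion, so treat this purely as an illustration rather than the final interface:

#include <future>
#include <iostream>

int find_the_answer()
{
    return 42;
}

int main()
{
    // Run find_the_answer() asynchronously; the returned future is then
    // used to wait for and retrieve the result.
    std::unique_future<int> f=std::async(find_the_answer);
    std::cout<<"the answer is "<<f.get()<<std::endl;
}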

Posted by Anthony Williams
[/ cplusplus /] permanent link

The Software Craftsmanship Manifesto

Wednesday, 11 March 2009

Do you care about the quality of your work as a software developer? Do you strive to produce the best software you can for your clients or employer? I don't mean basic level "does it work?" kind of quality — I hope we all aim to produce code that works. Does it matter to you if the code is well-crafted? Do you strive to write elegant software? Do you actively work to improve your skills as a developer?

There's been a lot of discussion about software quality on the internet recently. Uncle Bob, Joel Spolsky and Jeff Atwood got involved in the "Quality doesn't matter" debate, culminating in Uncle Bob talking on Jeff and Joel's Stack Overflow Podcast. James Bach even went as far as to hypothesise that Quality is Dead.

James has a point: in many instances it seems that people are quite happy to tolerate buggy software that's "good enough", and that developers are quite happy to ship such software. We're not perfect, and we will write code with bugs in, but to a large extent it's the attitude that counts. Whilst I accept that there may well be bugs in my code, I strive to avoid them, work hard to fix any that are found, and try and learn ways of reducing their occurrence in future. I also feel that software should be well-crafted so that it doesn't just work now, but will continue to work as it evolves, and such evolution should be as easy as possible. Of course, there's more to software quality than that — quality is Value to Some Person, and your job as a software developer is to ensure that your clients, customers or employers get the things that they value from the software you develop.

If this is something you feel strongly about, rest assured that you're not alone — there are many others who feel that Quality is Alive, to the extent that a few developers have got together to draft a Manifesto for Software Craftsmanship. The manifesto has over 1500 signatures (including mine) — why not add yours?

Posted by Anthony Williams
[/ design /] permanent link

Multithreading in C++0x part 3: Starting Threads with Member Functions and Reference Arguments

Thursday, 26 February 2009

This is the third of a series of blog posts introducing the new C++0x thread library. The first two parts covered Starting Threads in C++0x with simple functions, and starting threads with function objects and additional arguments.

If you've read the previous parts of the series then you've seen how to start threads with functions and function objects, with and without additional arguments. However, the function objects and arguments are always copied into the thread's internal storage. What if you wish to run a member function other than the function call operator, or pass a reference to an existing object?

The C++0x library can handle both these cases: the use of member functions with std::thread requires an additional argument for the object on which to invoke the member function, and references are handled with std::ref. Let's take a look at some examples.

Invoking a member function on a new thread

To start a new thread which runs a member function of an existing object, you just pass a pointer to the member function and a value to use as the this pointer for the object to the std::thread constructor.

#include <thread>
#include <iostream>

class SayHello
{
public:
    void greeting() const
    {
        std::cout<<"hello"<<std::endl;
    }
};

int main()
{
    SayHello x;
    std::thread t(&SayHello::greeting,&x);
    t.join();
}

You can of course pass additional arguments to the member function too:

#include <thread>
#include <iostream>

class SayHello
{
public:
    void greeting(std::string const& message) const
    {
        std::cout<<message<<std::endl;
    }
};

int main()
{
    SayHello x;
    std::thread t(&SayHello::greeting,&x,"goodbye");
    t.join();
}

Now, the preceding examples both use a plain pointer to a local object for the this argument; if you're going to do that, you need to ensure that the object outlives the thread, otherwise there will be trouble. An alternative is to use a heap-allocated object and a reference-counted pointer such as std::shared_ptr<SayHello> to ensure that the object stays around as long as the thread does:

#include <thread>
#include <memory>    // for std::shared_ptr

// SayHello here is the class from the second example above, whose
// greeting() member function takes a std::string message

int main()
{
    std::shared_ptr<SayHello> p(new SayHello);
    std::thread t(&SayHello::greeting,p,"goodbye");
    t.join();
}

So far, everything we've looked at has involved copying the arguments and thread functions into the internal storage of a thread even if those arguments are pointers, as in the this pointers for the member functions. What if you want to pass in a reference to an existing object, and a pointer just won't do? That is the task of std::ref.

Passing function objects and arguments to a thread by reference

Suppose you have an object that implements the function call operator, and you wish to invoke it on a new thread. The catch is that you want to invoke the function call operator on this particular object rather than on a copy. You could use the member function support to call operator() explicitly, but that seems a bit of a mess given that the object is callable already. This is the first instance in which std::ref can help — if x is a callable object, then std::ref(x) is too, so we can pass std::ref(x) as our function when we start the thread, as below:

#include <thread>
#include <iostream>
#include <functional> // for std::ref

class PrintThis
{
public:
    void operator()() const
    {
        std::cout<<"this="<<this<<std::endl;
    }
};

int main()
{
    PrintThis x;
    x();
    std::thread t(std::ref(x));
    t.join();
    std::thread t2(x);
    t2.join();
}

In this case, the function call operator just prints the address of the object. The exact form and values of the output will vary, but the principle is the same: this little program should output three lines. The first two should be the same, whilst the third is different, as it invokes the function call operator on a copy of x. For one run on my system it printed the following:

this=0x7fffb08bf7ef
this=0x7fffb08bf7ef
this=0x42674098

Of course, std::ref can be used for other arguments too — the following code will print "x=43":

#include <thread>
#include <iostream>
#include <functional>

void increment(int& i)
{
    ++i;
}

int main()
{
    int x=42;
    std::thread t(increment,std::ref(x));
    t.join();
    std::cout<<"x="<<x<<std::endl;
}

When passing in references like this (or pointers for that matter), you need to be careful not only that the referenced object outlives the thread, but also that appropriate synchronization is used. In this case it is fine, because we only access x before we start the thread and after it is done, but concurrent access would need protection with a mutex.
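
For reference, here is a sketch (mine, not part of the original example) of what that protection might look like if both threads really did access x concurrently; mutexes are covered properly in the next instalment:

#include <thread>
#include <iostream>
#include <mutex>

std::mutex m;
int x=42;

void increment()
{
    std::lock_guard<std::mutex> guard(m);   // lock held until guard is destroyed
    ++x;
}

int main()
{
    std::thread t(increment);
    {
        std::lock_guard<std::mutex> guard(m);
        ++x;                                // concurrent access from main is now safe
    }
    t.join();
    std::cout<<"x="<<x<<std::endl;          // prints x=44
}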

Next time

That wraps up all the variations on starting threads; next time we'll look at using mutexes to protect data from concurrent modification.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link

Multithreading in C++0x part 2: Starting Threads with Function Objects and Arguments

Tuesday, 17 February 2009

This is the second of a series of blog posts introducing the new C++0x thread library. If you missed the first part, it covered Starting Threads in C++0x with simple functions.

If you read part 1 of this series, then you've seen how easy it is to start a thread in C++0x: just construct an instance of std::thread, passing in the function you wish to run on the new thread. Though this is good, it would be quite limiting if new threads were constrained to run plain functions without any arguments — all the information needed would have to be passed via global variables, which would be incredibly messy. Thankfully, this is not the case. Not only can you run function objects on your new thread, as well as plain functions, but you can pass arguments in too.

Running a function object on another thread

In keeping with the rest of the C++ standard library, you're not limited to plain functions when starting threads — the std::thread constructor can also be called with instances of classes that implement the function-call operator. Let's say "hello" from our new thread using a function object:

#include <thread>
#include <iostream>

class SayHello
{
public:
    void operator()() const
    {
        std::cout<<"hello"<<std::endl;
    }
};

int main()
{
    std::thread t((SayHello()));
    t.join();
}

If you're wondering about the extra parentheses around the SayHello constructor call, this is to avoid what's known as C++'s most vexing parse: without the parentheses, the declaration is taken to be a declaration of a function called t which takes a pointer-to-a-function-with-no-parameters-returning-an-instance-of-SayHello, and which returns a std::thread object, rather than an object called t of type std::thread. There are a few other ways to avoid the problem. Firstly, you could create a named variable of type SayHello and pass that to the std::thread constructor:

int main()
{
    SayHello hello;
    std::thread t(hello);
    t.join();
}

Alternatively, you could use copy initialization:

int main()
{
    std::thread t=std::thread(SayHello());
    t.join();
}

And finally, if you're using a full C++0x compiler then you can use the new initialization syntax with braces instead of parentheses:

int main()
{
    std::thread t{SayHello()};
    t.join();
}

In this case, this is exactly equivalent to our first example with the double parentheses.

Anyway, enough about initialization. Whichever option you use, the idea is the same: your function object is copied into internal storage accessible to the new thread, and the new thread invokes your operator(). Your class can of course have data members and other member functions too, and this is one way of passing data to the thread function: pass it in as a constructor argument and store it as a data member:

#include <thread>
#include <iostream>
#include <string>

class Greeting
{
    std::string message;
public:
    explicit Greeting(std::string const& message_):
        message(message_)
    {}
    void operator()() const
    {
        std::cout<<message<<std::endl;
    }
};

int main()
{
    std::thread t(Greeting("goodbye"));
    t.join();
}

In this example, our message is stored as a data member in the class, so when the Greeting instance is copied into the thread the message is copied too, and this example will print "goodbye" rather than "hello".

This example also demonstrates one way of passing information into the new thread aside from the function to call — include it as data members of the function object. If this makes sense in terms of the function object then it's ideal, otherwise we need an alternative technique.

Passing Arguments to a Thread Function

As we've just seen, one way to pass arguments in to the thread function is to package them in a class with a function call operator. Well, there's no need to write a special class every time; the standard library provides an easy way to do this in the form of std::bind. The std::bind function template takes a variable number of parameters. The first is always the function or callable object which needs the parameters, and the remainder are the parameters to pass when calling the function. The result is a function object that stores copies of the supplied arguments, with a function call operator that invokes the bound function. We could therefore use this to pass the message to write to our new thread:

#include <thread>
#include <iostream>
#include <string>
#include <functional>

void greeting(std::string const& message)
{
    std::cout<<message<<std::endl;
}

int main()
{
    std::thread t(std::bind(greeting,"hi!"));
    t.join();
}

This works well, but we can actually do better than that — we can pass the arguments directly to the std::thread constructor and they will be copied into the internal storage for the new thread and supplied to the thread function. We can thus write the preceding example more simply as:

#include <thread>
#include <iostream>
#include <string>

void greeting(std::string const& message)
{
    std::cout<<message<<std::endl;
}

int main()
{
    std::thread t(greeting,"hi!");
    t.join();
}

Not only is this code simpler, it's also likely to be more efficient as the supplied arguments can be copied directly into the internal storage for the thread rather than first into the object generated by std::bind, which is then in turn copied into the internal storage for the thread.

Multiple arguments can be supplied just by passing further arguments to the std::thread constructor:

#include <thread>
#include <iostream>

void write_sum(int x,int y)
{
    std::cout<<x<<" + "<<y<<" = "<<(x+y)<<std::endl;
}

int main()
{
    std::thread t(write_sum,123,456);
    t.join();
}

The std::thread constructor is a variadic template, so it can take any number of arguments up to the compiler's internal limit, but if you need to pass more than a couple of parameters to your thread function then you might like to rethink your design.

Next time

We're not done with starting threads just yet — there are a few more nuances to passing arguments which we haven't covered. In the third part of this series we'll look at passing references, and using class member functions as the thread function.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link

Multithreading in C++0x part 1: Starting Threads

Tuesday, 10 February 2009

This is the first of a series of blog posts introducing the new C++0x thread library.

Concurrency and multithreading is all about running multiple pieces of code in parallel. If you have the hardware for it in the form of a nice shiny multi-core CPU or a multi-processor system then this code can run truly in parallel, otherwise it is interleaved by the operating system — a bit of one task, then a bit of another. This is all very well, but somehow you have to specify what code to run on all these threads.

High level constructs such as the parallel algorithms in Intel's Threading Building Blocks manage the division of code between threads for you, but we don't have any of these in C++0x. Instead, we have to manage the threads ourselves. The tool for this is std::thread.

Running a simple function on another thread

Let's start by running a simple function on another thread, which we do by constructing a new std::thread object, and passing in the function to the constructor. std::thread lives in the <thread> header, so we'd better include that first.

#include <thread>

void my_thread_func()
{}

int main()
{
    std::thread t(my_thread_func);
}

If you compile and run this little app, it won't do a lot: though it starts a new thread, the thread function is empty. Let's make it do something, such as print "hello":

#include <thread>
#include <iostream>

void my_thread_func()
{
    std::cout<<"hello"<<std::endl;
}

int main()
{
    std::thread t(my_thread_func);
}

If you compile and run this little application, what happens? Does it print hello like we wanted? Well, actually there's no telling. It might do or it might not. I ran this simple application several times on my machine, and the output was unreliable: sometimes it output "hello", with a newline; sometimes it output "hello" without a newline, and sometimes it didn't output anything. What's up with that? Surely a simple app like this ought to behave predictably?

Waiting for threads to finish

Well, actually, no, this app does not have predictable behaviour. The problem is we're not waiting for our thread to finish. When the execution reaches the end of main() the program is terminated, whatever the other threads are doing. Since thread scheduling is unpredictable, we cannot know how far the other thread has got. It might have finished, it might have output the "hello", but not processed the std::endl yet, or it might not have even started. In any case it will be abruptly stopped as the application exits.

If we want to reliably print our message, we have to ensure that our thread has finished. We do that by joining with the thread by calling the join() member function of our thread object:

#include <thread>
#include <iostream>

void my_thread_func()
{
    std::cout<<"hello"<<std::endl;
}

int main()
{
    std::thread t(my_thread_func);
    t.join();
}

Now, main() will wait for the thread to finish before exiting, and the code will output "hello" followed by a newline every time. This highlights a general point: if you want a thread to have finished by a certain point in your code you have to wait for it. As well as ensuring that threads have finished by the time the program exits, this is also important if a thread has access to local variables: we want the thread to have finished before the local variables go out of scope.
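
As an illustration of that last point, here's a sketch (mine, not from the post) where the new thread reads one of main()'s local variables through a pointer; the call to join() is what guarantees the local is still alive while the thread uses it:

#include <thread>
#include <iostream>
#include <string>

std::string const* message_ptr=0;   // will point at a local in main()

void print_message()
{
    std::cout<<*message_ptr<<std::endl;   // reads main()'s local variable
}

int main()
{
    std::string message("hello");
    message_ptr=&message;
    std::thread t(print_message);
    t.join();   // without this, message could go out of scope while the
                // new thread is still reading it
}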

Next Time

In this article we've looked at running simple functions on a separate thread, and waiting for the thread to finish. However, when you start a thread you aren't just limited to simple functions with no arguments: in the second part of this series we will look at how to start a thread with function objects, and how to pass arguments to the thread.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link

Designing Multithreaded Applications with C++0x: ACCU 2009

Tuesday, 13 January 2009

The schedule for ACCU 2009 in Oxford was announced earlier today, and I am pleased to say that I will be speaking on "Designing Multithreaded Applications with C++0x" on Thursday 23rd April 2009.

As has become customary, the main conference will run from Wednesday to Saturday, with a day of pre-conference workshops on Tuesday 21st April 2009. There is a whole host of well-known speakers, including "Uncle Bob" Martin, Linda Rising, Michael Feathers and Andrei Alexandrescu, so the conference should be excellent value, as ever.

If you book before the end of February, you can take advantage of the "Early Bird" rates.

I hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link

just::thread C++0x Thread Library V1.0 Released

Thursday, 08 January 2009

I am pleased to announce that version 1.0 of just::thread, our C++0x Thread Library is now available.

The just::thread library is a complete implementation of the new C++0x thread library as per the current C++0x working paper. Features include:

  • std::thread for launching threads.
  • Mutexes and condition variables.
  • std::promise, std::packaged_task, std::unique_future and std::shared_future for transferring data between threads (see the sketch after this list).
  • Support for the new std::chrono time interface for sleeping and timeouts on locks and waits.
  • Atomic operations with std::atomic.
  • Support for std::exception_ptr for transferring exceptions between threads.
  • Special deadlock-detection mode for tracking down the call-stack leading to deadlocks, the bane of multithreaded programming.
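
As a taste of the futures support, here is a minimal sketch (mine, using the std::unique_future name from the current working paper) that transfers a result between threads with std::packaged_task:

#include <thread>
#include <future>
#include <iostream>

int the_answer()
{
    return 42;
}

int main()
{
    // Package the function so its return value can be retrieved through
    // a future once it has run.
    std::packaged_task<int()> task(the_answer);
    std::unique_future<int> f=task.get_future();

    std::thread t(std::move(task));     // run the task on a new thread
    std::cout<<f.get()<<std::endl;      // blocks until the result is ready
    t.join();
}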

The library works with Microsoft Visual Studio 2008 or Microsoft Visual C++ 2008 Express for 32-bit Windows. Don't wait for a full C++0x compiler: Buy your copy of just::thread now and start using the C++0x thread library in minutes.

Posted by Anthony Williams
[/ news /] permanent link

The Most Popular Articles of 2008

Monday, 05 January 2009

Five days into 2009, here's a list of the 10 most popular articles on the Just Software Solutions website for 2008. There are a few entries still there from last year (in particular, last year's top entry on CSS menus is now number 2), but mostly it's new content. In 2008 I focused much more on C++0x and concurrency, and the list of popular articles reflects that.

  1. Implementing a Thread-Safe Queue using Condition Variables
    A description of the issues around writing a thread-safe queue, with code.
  2. Implementing drop-down menus in pure CSS (no JavaScript)
    How to implement drop-down menus in CSS in a cross-browser fashion (with a teensy bit of JavaScript for IE).
  3. 10 Years of Programming with POSIX Threads
    A review of "Programming with POSIX Threads" by David Butenhof, 10 years after publication.
  4. Thread Interruption in the Boost Thread Library
    A description of the thread interruption feature of the Boost Thread library.
  5. Introduction to C++ Templates (PDF)
    How to use and write C++ templates.
  6. Memory Models and Synchronization
    A brief description of the relaxed memory orderings of the C++0x memory model
  7. Deadlock Detection with just::thread
    How to use the just::thread C++0x thread library to detect the origin of deadlocks in your code.
  8. Rvalue References and Perfect Forwarding in C++0x
    An introduction to the new rvalue reference feature of C++0x.
  9. October 2008 C++ Standards Committee Mailing - New C++0x Working Paper, More Concurrency Papers Approved
    My summary of the October 2008 C++ committee mailing featuring the first feature-complete draft of the C++0x standard.
  10. Condition Variable Spurious Wakes
    An introduction to the consequences of the so-called "spurious wakes" that you can get with condition variables, and how to handle them.

Posted by Anthony Williams
[/ news /] permanent link

Managing Threads with a Vector

Wednesday, 10 December 2008

One of the nice things about C++0x is the support for move semantics that comes from the new Rvalue Reference language feature. Since this is a language feature, it means that we can easily have types that are movable but not copyable without resorting to std::auto_ptr-like hackery. One such type is the new std::thread class. A thread of execution can only be associated with one std::thread object at a time, so std::thread is not copyable, but it is movable — this allows you to transfer ownership of a thread of execution between objects, and return std::thread objects from functions. The important point for today's blog post is that it allows you to store std::thread objects in containers.
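
For example, here's a small sketch (mine, not from the post) of transferring ownership of a thread out of a function by returning the std::thread object:

#include <thread>

void work()
{}

// Ownership of the newly started thread is moved out to the caller when
// the std::thread object is returned.
std::thread start_work()
{
    return std::thread(work);
}

int main()
{
    std::thread t=start_work();   // t now owns the thread started above
    t.join();
}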

Move-aware containers

The C++0x standard containers are required to be move-aware, and move objects rather than copy them when changing their position within the container. For existing copyable types that don't have a specific move constructor or move-assignment operator that takes an rvalue reference this amounts to the same thing — when a std::vector is resized, or an element is inserted in the middle, the elements will be copied to their new locations. The important difference is that you can now store types that only have a move-constructor and move-assignment operator in the standard containers because the objects are moved rather than copied.

This means that you can now write code like:

std::vector<std::thread> v;

v.push_back(std::thread(some_function));

and it all "just works". This is good news for managing multiple threads where the number of threads is not known until run-time — if you're tuning the number of threads to the number of processors, using std::thread::hardware_concurrency() for example. It also means that you can then use the std::vector<std::thread> with the standard library algorithms such as std::for_each:

void do_join(std::thread& t)
{
    t.join();
}

void join_all(std::vector<std::thread>& v)
{
    std::for_each(v.begin(),v.end(),do_join);
}

If you need an extra thread because one of your threads is blocked waiting for something, you can just use insert() or push_back() to add a new thread to the vector. Of course you can also just move threads into or out of the vector by indexing the elements directly:

std::vector<std::thread> v(std::thread::hardware_concurrency());

for(unsigned i=0;i<v.size();++i)
{
    v[i]=std::thread(do_work);
}

In fact, many of the examples in my book use std::vector<std::thread> for managing the threads, as it's the simplest way to do it.

Other containers work too

It's not just std::vector that's required to be move-aware — all the other standard containers are too. This means you can have a std::list<std::thread>, or a std::deque<std::thread>, or even a std::map<int,std::thread>. In fact, the whole C++0x standard library is designed to work with move-only types such as std::thread.
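
For instance, here's a sketch (mine) of keeping threads keyed by an integer id in a std::map:

#include <thread>
#include <map>

void work(int id)
{
    // placeholder for real per-thread work that uses id
}

int main()
{
    std::map<int,std::thread> threads;
    // operator[] default-constructs an empty std::thread, which is then
    // move-assigned from the temporary, so nothing is copied.
    threads[1]=std::thread(work,1);
    threads[2]=std::thread(work,2);
    for(std::map<int,std::thread>::iterator it=threads.begin();
        it!=threads.end();++it)
    {
        it->second.join();
    }
}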

Try it out today

Wouldn't it be nice if you could try it out today, and get used to using containers of std::thread objects without having to wait for a C++0x compiler? Well, you can — the 0.6 beta of the just::thread C++0x Thread Library released last Friday provides a specialization of std::vector<std::thread> so that you can write code like in these examples and it will work with Microsoft Visual Studio 2008. Sign up at the just::thread Support Forum to download it today.

Posted by Anthony Williams
[/ threading /] permanent link

Peterson's lock with C++0x atomics

Friday, 05 December 2008

Bartosz Milewski shows an implementation of Peterson's locking algorithm in his latest post on C++ atomics and memory ordering. Dmitriy V'jukov posted an alternative implementation in the comments. Also in the comments, Bartosz says:

"So even though I don't have a formal proof, I believe my implementation of Peterson lock is correct. For all I know, Dmitriy's implementation might also be correct, but it's much harder to prove."

I'd like to offer an analysis of both algorithms to see if they are correct, below. However, before we start I'd also like to highlight a comment that Bartosz made in his conclusion:

"Any time you deviate from sequential consistency, you increase the complexity of the problem by orders of magnitude."

This is something I wholeheartedly agree with. If you weren't convinced by my previous post on Memory Models and Synchronization, maybe the proof below will convince you to stick to memory_order_seq_cst (the default) unless you really need to do otherwise.

C++0x memory ordering recap

In C++0x, we have to think about things in terms of the happens-before and synchronizes-with relationships described in the Standard — it's no good saying "it works on my CPU" because different CPUs have different default ordering constraints on basic operations such as load and store. In brief, those relationships are:

Synchronizes-with
An operation A synchronizes-with an operation B if A is a store to some atomic variable m, with an ordering of std::memory_order_release, or std::memory_order_seq_cst, B is a load from the same variable m, with an ordering of std::memory_order_acquire or std::memory_order_seq_cst, and B reads the value stored by A.
Happens-before
An operation A happens-before an operation B if:
  • A is performed on the same thread as B, and A is before B in program order, or
  • A synchronizes-with B, or
  • A happens-before some other operation C, and C happens-before B.
There are a few more nuances to do with std::memory_order_consume, but this is enough for now.

If all your operations use std::memory_order_seq_cst, then there is the additional constraint of total ordering, as I mentioned before, but neither of the implementations in question use any std::memory_order_seq_cst operations, so we can leave that aside for now.
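
To make these definitions concrete, here is a tiny sketch (mine, not from either post) of a release store synchronizing with an acquire load; this is exactly the kind of relationship we'll be hunting for in the analysis below:

#include <atomic>
#include <thread>
#include <cassert>

std::atomic<bool> ready(false);
int data=0;

void producer()
{
    data=42;                                      // A: plain store
    ready.store(true,std::memory_order_release);  // B: release store
}

void consumer()
{
    while(!ready.load(std::memory_order_acquire)) // C: acquire load
        std::this_thread::yield();
    // The load C that reads true synchronizes-with the store B, and A
    // happens-before B, so A happens-before this read: the assert holds.
    assert(data==42);
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join();
    t2.join();
}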

Now, let's look at the implementations.

Bartosz's implementation

I've extracted the code for Bartosz's implementation from his posts, and it is shown below:

class Peterson_Bartosz
{
private:
    // indexed by thread ID, 0 or 1
    std::atomic<bool> _interested[2];
    // who's yielding priority?
    std::atomic<int> _victim;
public:
    Peterson_Bartosz()
    {
       _victim.store(0, std::memory_order_release);
       _interested[0].store(false, std::memory_order_release);
       _interested[1].store(false, std::memory_order_release);
    }
    void lock()
    {
       int me = threadID; // either 0 or 1
       int he = 1 - me; // the other thread
       _interested[me].exchange(true, std::memory_order_acq_rel);
       _victim.store(me, std::memory_order_release);
       while (_interested[he].load(std::memory_order_acquire)
           && _victim.load(std::memory_order_acquire) == me)
          continue; // spin
    }
    void unlock()
    {
        int me = threadID;
        _interested[me].store(false,std::memory_order_release);
    }
};

There are three things to prove with Peterson's lock:

  • If thread 0 successfully acquires the lock, then thread 1 will not do so;
  • If thread 0 acquires the lock and then releases it, then thread 1 will successfully acquire the lock;
  • If thread 0 fails to acquire the lock, then thread 1 does so.

Let's look at each in turn.

If thread 0 successfully acquires the lock, then thread 1 will not do so

Initially _victim is 0, and the _interested variables are both false. The call to lock() from thread 0 will then set _interested[0] to true, and _victim to 0.

The loop then checks _interested[1], which is still false, so we break out of the loop, and the lock is acquired.

So, what about thread 1? Thread 1 now comes along and tries to acquire the lock. It sets _interested[1] to true, and _victim to 1, and then enters the while loop. This is where the fun begins.

The first thing we check is _interested[0]. Now, we know this was set to true in thread 0 as it acquired the lock, but the important thing is: does the CPU running thread 1 know that? Is it guaranteed by the memory model?

For it to be guaranteed by the memory model, we have to prove that the store to _interested[0] from thread 0 happens-before the load from thread 1. This is trivially true if we read true in thread 1, but that doesn't help: we need to prove that we can't read false. We therefore need to find a variable which was stored by thread 0, and loaded by thread 1, and our search comes up empty: _interested[1] is loaded by thread 1 as part of the exchange call, but it is not written by thread 0, and _victim is written by thread 1 without reading the value stored by thread 0. Consequently, there is no ordering guarantee on the read of _interested[0], and thread 1 may also break out of the while loop and acquire the lock.

This implementation is thus broken. Let's now look at Dmitriy's implementation.

Dmitriy's implementation

Dmitriy posted his implementation in the comments using the syntax for his Relacy Race Detector tool, but it's trivially convertible to C++0x syntax. Here is the C++0x version of his code:

std::atomic<int> flag0(0),flag1(0),turn(0);

void lock(unsigned index)
{
    if (0 == index)
    {
        flag0.store(1, std::memory_order_relaxed);
        turn.exchange(1, std::memory_order_acq_rel);

        while (flag1.load(std::memory_order_acquire)
            && 1 == turn.load(std::memory_order_relaxed))
            std::this_thread::yield();
    }
    else
    {
        flag1.store(1, std::memory_order_relaxed);
        turn.exchange(0, std::memory_order_acq_rel);

        while (flag0.load(std::memory_order_acquire)
            && 0 == turn.load(std::memory_order_relaxed))
            std::this_thread::yield();
    }
}

void unlock(unsigned index)
{
    if (0 == index)
    {
        flag0.store(0, std::memory_order_release);
    }
    else
    {
        flag1.store(0, std::memory_order_release);
    }
}

So, how does this code fare?

If thread 0 successfully acquires the lock, then thread 1 will not do so

Initially the turn, flag0 and flag1 variables are all 0. The call to lock() from thread 0 will then set flag0 to 1, and turn to 1. These variables are essentially equivalent to the variables in Bartosz's implementation, but turn is set to 0 when _victim is set to 1, and vice-versa. That doesn't affect the logic of the code.

The loop then checks flag1, which is still 0, so we break out of the loop, and the lock is acquired.

So, what about thread 1? Thread 1 now comes along and tries to acquire the lock. It sets flag1 to 1, and turn to 0, and then enters the while loop. This is where the fun begins.

As before, the first thing we check is flag0. Now, we know this was set to 1 in thread 0 as it acquired the lock, but the important thing is: does the CPU running thread 1 know that? Is it guaranteed by the memory model?

Again, for it to be guaranteed by the memory model, we have to prove that the store to flag0 from thread 0 happens-before the load from thread 1. This is trivially true if we read 1 in thread 1, but that doesn't help: we need to prove that we can't read 0. We therefore need to find a variable which was stored by thread 0, and loaded by thread 1, as before.

This time our search is successful: turn is set using an exchange operation, which is a read-modify-write operation. Since it uses std::memory_order_acq_rel memory ordering, it is both a load-acquire and a store-release. If the load part of the exchange reads the value written by thread 0, we're home dry: turn is stored with a similar exchange operation with std::memory_order_acq_rel in thread 0, so the store from thread 0 synchronizes-with the load from thread 1.

This means that the store to flag0 from thread 0 happens-before the exchange on turn in thread 1, and thus happens-before the load in the while loop. The load in the while loop thus reads 1 from flag0, and proceeds to check turn.

Now, since the store to turn from thread 0 happens-before the store from thread 1 (we're relying on that for the happens-before relationship on flag0, remember), we know that the value to be read will be the value we stored in thread 1: 0. Consequently, we keep looping.

OK, so if the exchange on turn in thread 1 reads the value stored by thread 0 then thread 1 will stay out of the lock, but what if it doesn't read the value stored by thread 0? In this case, we know that the exchange call from thread 0 must have seen the value written by the exchange in thread 1 (writes to a single atomic variable always become visible in the same order for all threads), which means that the write to flag1 from thread 1 happens-before the read in thread 0 and so thread 0 cannot have acquired the lock. Since this contradicts our initial assumption (that thread 0 has acquired the lock), we're home dry — thread 1 can only acquire the lock if thread 0 didn't.

If thread 0 acquires the lock and then releases it, then thread 1 will successfully acquire the lock

OK, so we've got as far as thread 0 acquiring the lock and thread 1 waiting. What happens if thread 0 now releases the lock? It does this simply by writing 0 to flag0. The while loop in thread 1 checks flag0 every time round, and breaks out if the value read is 0. Therefore, thread 1 will eventually acquire the mutex. Of course, there is no guarantee when it will acquire the mutex — it might take arbitrarily long for the write to flag0 to make its way to thread 1, but it will get there in the end. Since flag0 is never written by thread 1, it doesn't matter whether thread 0 has already released the lock when thread 1 starts waiting, or whether thread 1 is already waiting — the while loop will still terminate, and thread 1 will acquire the lock in both cases.

That just leaves our final check.

If thread 0 fails to acquire the lock, then thread 1 does so

We've essentially already covered this when we checked that thread 1 doesn't acquire the lock if thread 0 does, but this time we're going in reverse. If thread 0 doesn't acquire the lock, it is because it sees flag1 as 1 and turn as 1. Since flag1 is only written by thread 1, if it is 1 then thread 1 must have at least called lock(). If thread 1 has called unlock then eventually flag1 will be read as 0, so thread 0 will acquire the lock. So, let's assume for now that thread 1 hasn't got that far, so flag1 is still 1. The next check is for turn to be 1. This is the value written by thread 0. If we read it as 1 then either the write to turn from thread 1 has not yet become visible to thread 0, or the write happens-before the write by thread 0, so the write from thread 0 overwrote the old value.

If the write from thread 1 happens-before the write from thread 0 then thread 1 will eventually see turn as 1 (since the last write is by thread 0), and thus thread 1 will acquire the lock. On the other hand, if the write to turn from thread 0 happens-before the write to turn from thread 1, then thread 0 will eventually see the turn as 0 and acquire the lock. Therefore, for thread 0 to be stuck waiting the last write to turn must have been by thread 0, which implies thread 1 will eventually get the lock.

Therefore, Dmitriy's implementation works.

Differences, and conclusion

The key difference between the implementations other than the naming of the variables is which variable the exchange operation is applied to. In Bartosz's implementation, the exchange is applied to _interested[me], which is only ever written by one thread for a given value of me. In Dmitriy's implementation, the exchange is applied to the turn variable, which is the variable updated by both threads. It therefore acts as a synchronization point for the threads. This is the key to the whole algorithm — even though many of the operations in Dmitriy's implementation use std::memory_order_relaxed, whereas Bartosz's implementation uses std::memory_order_acquire and std::memory_order_release everywhere, the single std::memory_order_acq_rel on the exchange on the right variable is enough.

I'd like to finish by repeating Bartosz's statement about relaxed memory orderings:

"Any time you deviate from sequential consistency, you increase the complexity of the problem by orders of magnitude."

Posted by Anthony Williams
[/ threading /] permanent link
