Blog Archive for / 2010 / 07 /
Implementing Dekker's algorithm with Fences
Tuesday, 27 July 2010
Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.
The code
For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.
std::atomic<bool> flag0(false),flag1(false); std::atomic<int> turn(0); void p0() { flag0.store(true,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_seq_cst); while (flag1.load(std::memory_order_relaxed)) { if (turn.load(std::memory_order_relaxed) != 0) { flag0.store(false,std::memory_order_relaxed); while (turn.load(std::memory_order_relaxed) != 0) { } flag0.store(true,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_seq_cst); } } std::atomic_thread_fence(std::memory_order_acquire); // critical section turn.store(1,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_release); flag0.store(false,std::memory_order_relaxed); } void p1() { flag1.store(true,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_seq_cst); while (flag0.load(std::memory_order_relaxed)) { if (turn.load(std::memory_order_relaxed) != 1) { flag1.store(false,std::memory_order_relaxed); while (turn.load(std::memory_order_relaxed) != 1) { } flag1.store(true,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_seq_cst); } } std::atomic_thread_fence(std::memory_order_acquire); // critical section turn.store(0,std::memory_order_relaxed); std::atomic_thread_fence(std::memory_order_release); flag1.store(false,std::memory_order_relaxed); }
The analysis
If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.
Suppose thread 0 and thread 1 enter p0
and p1
respectively at the same time. They both set their
respective flags to true
, execute the fence and then read
the other flag at the start of the while
loop.
If both threads read false
then both will enter the
critical section, and the algorithm doesn't work. It is the job of the
fences to ensure that this doesn't happen.
The fences are marked with memory_order_seq_cst
, so either the
fence in p0
is before the fence in p1
in the global ordering of
memory_order_seq_cst
operations, or vice-versa. Without
loss of generality, we can assume that the fence in p0
comes before the fence in p1
, since the code is
symmetric. The store to flag0
is sequenced before the
fence in p0
, and the fence in p1
is
sequenced before the read from flag0
. Therefore the
read from flag0
must see the value stored
(true
), so p1
will enter
the while
loop.
On the other side, there is no such guarantee for the read
from flag1
in p0
, so p0
may or
may not enter the while
loop. If p0
reads
the value of false
for flag1
, it will not
enter the while
loop, and will instead enter the critical
section, but that is OK since p1
has entered
the while
loop.
Though flag0
is not set to false
until p0
has finished the critical section, we need to
ensure that p1
does not see this until the values
modified in the critical section are also visible to p1
,
and this is the purpose of the release fence prior to the store
to flag0
and the acquire fence after
the while
loop. When p1
reads the
value false
from
flag0
in order to exit the while
loop, it
must be reading the value store by p0
after the release
fence at the end of the critical section. The acquire fence after the
load guarantees that all the values written before the release fence
prior to the store are visible, which is exactly what we need
here.
If p0
reads true
for flag1
,
it will enter the while
loop rather than the critical
section. Both threads are now looping, so we need a way to ensure
that exactly one of them breaks out. This is the purpose of
the turn
variable. Initially, turn is 0,
so p1
will enter the if
and
set flag1
to false
, whilst p1
will not enter the if
. Because p1
set
flag1
to false
, eventually p0
will read flag1
as false
and exit the
outer while
loop to enter the critical section. On the
other hand, p1
is now stuck in the
inner while
loop because turn
is
0. When p0
exits the critical section it
sets turn
to 1. This will eventually be seen
by p1
, allowing it to exit the inner while
loop. When the store to
flag0
becomes visible p1
can then exit the
outer while
loop too.
If turn
had been 1 initially (because p0
was the last thread to enter the critical section) then the inverse
logic would apply, and p0
would enter the inner loop,
allowing p1
to enter the critical section first.
Second time around
If p0
is called a second time whilst p1
is still in the inner loop then we have a similar situation to the
start of the function — p1
may exit the inner
loop and store true
in flag1
whilst p0
stores true
in flag0
. We therefore need the
second memory_order_seq_cst
fence after the store to
the flag in the inner loop. This guarantees that
at least one of the threads will see the flag from the other thread
set when it executes the check in the outer loop. Without this fence
then both threads can read false
, and both can enter
the critical section.
Alternatives
You could put the ordering constraints on the loads and stores
themselves rather than using fences. Indeed, the default memory
ordering for atomic operations in C++
is memory_order_seq_cst
, so the algorithm would "just
work" with plain loads and stores to atomic variables. However, by
using memory_order_relaxed
on the loads and stores we
can add fences to ensure we have exactly the ordering constraints
required.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, synchronization, Dekker, fences, cplusplus
Stumble It! | Submit to Reddit
| Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Reference Wrappers Explained
Wednesday, 14 July 2010
The
upcoming C++0x
standard includes reference wrappers in the form of
the std::reference_wrapper<T>
class template, and
the helper function templates std::ref()
and std::cref()
. As I mentioned in my blog post
on Starting
Threads with Member Functions and Reference Arguments, these
wrappers can be used to pass references to objects across interfaces
that normally require copyable (or at least movable) objects
— in that blog post, std::ref
was used for passing
references to objects over to the new thread, rather than copying the
objects. I was recently asked what the difference was
between std::ref
and std::cref
, and how they
worked, so I thought I'd elaborate.
Deducing the Referenced Type
std::ref
is a function template, so automatically
deduces the type of the wrapped reference from the type of the
supplied argument. This type deduction includes
the const
-ness of the supplied object:
int x=3; const int y=4; std::reference_wrapper<int> rx=std::ref(x); // std::reference_wrapper<int> ry=std::ref(y); // error std::reference_wrapper<const int> rcy=std::ref(y);
On the other hand, though std::cref
also deduces the
type of the wrapped reference from the supplied argument,
it always wraps a const
reference:
int x=3; const int y=4; // std::reference_wrapper<int> rx=std::cref(x); // error std::reference_wrapper<const int> rcx=std::cref(x); // std::reference_wrapper<int> ry=std::cref(y); // error std::reference_wrapper<const int> rcy=std::cref(y);
Since a no-const
-reference can always be bound to
a const
reference, you can thus
use std::ref
in pretty much every case where you would
use std::cref
, and your code would work the same. Which
begs the question: why would you ever choose to
use std::cref
?
Using std::cref
to prevent modification
The primary reason for choosing std::cref
is because
you want to guarantee that the source object is not modified through
that reference. This can be important when writing multithreaded
code — if a thread should not be modifying some data then it
can be worth enforcing this by passing a const
reference rather than a mutable reference.
void foo(int&); // mutable reference int x=42; // Should not be modified by thread std::thread t(foo,std::cref(x)); // will fail to compile
This can be important where there are overloads of a function such
that one takes a const
reference, and the other a
non-const
reference: if we don't want the object
modified then it is important that the overload taking
a const
reference is chosen.
struct Foo { void operator()(int&) const; void operator()(int const&) const; }; int x=42; std::thread(Foo(),std::cref(x)); // force const int& overload
References to temporaries
std::cref
has another property missing
from std::ref
— it can bind to temporaries, since
temporary objects can bind to const
references. I'm
not sure this is a good thing, especially when dealing with multiple
threads, as the referenced temporary is likely to have been
destroyed before the thread has even started. This is therefore
something to watch out for:
void bar(int const&); std::thread t(bar,std::cref(42)); // oops, ref to temporary
Documentation
Finally, std::cref
serves a documentation purpose,
even where std::ref
would suffice — it declares
in big bold letters that this reference cannot be used to modify the
referenced object, which thus makes it easier to reason about the
code.
Recommendation
I would recommend that you use std::cref
in preference
to std::ref
whenever you can — the benefits as
documentation of intent, and avoiding accidental modification
through the reference make it a clear winner in my opinion. Of
course, if you do want to modify the referenced
object, then you need to use std::ref
, but such usage
now stands out, and makes it clear that this is the intent.
You do still need to be careful to ensure that you don't try and
wrap references to temporaries, particularly when
applying std::cref
to the result of a function call,
but such uses should stand out — I expect most uses to be
wrapping a reference to a named variable rather than wrapping a
function result.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: reference wrappers, ref, cref, cplusplus
Stumble It! | Submit to Reddit
| Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy