Blog Archive
Coming Soon: Just::Thread Pro
Friday, 29 October 2010
Multithreaded code doesn't have to be complicated.
That's the idea behind the Just::Thread Pro library. By providing a set of high level facilities in the library, your application code can be simplified — rather than spending your time on the complexities of multithreading and concurrency you can instead focus on what it is your application is trying to achieve.
Building on the Just::Thread C++0x thread library, Just::Thread Pro will provide facilities to:
- Encapsulate communication between threads to avoid deadlocks and race conditions
- Easily scale your application to make use of multi-core processors
- Parallelize existing single-threaded code without a major rewrite
Just::Thread Pro will be available for all platforms supported by Just::Thread.
Head over to the Just::Thread Pro website and sign up to receive further news about the library and notification when it is released.
Posted by Anthony Williams
[/ news /] permanent link
Tags: concurrency, cplusplus, multithreading
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
October 2010 C++ Standards Committee Mailing
Thursday, 21 October 2010
The October 2010 mailing for the C++ Standards Committee was published earlier this week. This is the pre-meeting mailing for the November 2010 committee meeting.
As well as the usual core and library issues lists, this mailing also includes a Summary of the status of the FCD comments, along with a whole host of papers attempting to address some of the remaining FCD comments.
To move or not to move
The big issue at the upcoming meeting looks set to be whether or not the compiler should implicitly generate move constructors and move assignment operators, akin to the copy constructors and copy assignment operators that are currently auto-generated. The wording in the FCD requires this, but there is concern that it will break existing code when that code is compiled with a C++0x compiler and library. There are two papers on the subject in the mailing: N3153: Implicit Move Must Go by Dave Abrahams, and N3174: To move or not to move by Bjarne Stroustrup.
There seems to be consensus among committee members that the FCD requires compilers to generate the move constructor and move assignment operator in cases that will break existing code. The key question is whether the breakage can be limited by restricting the cases in which the move members are implicitly generated, or whether implicit generation should be abandoned altogether. The various options are explained very clearly in the papers.
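For a feel of what is at stake, here is a small sketch of the rule that C++11 eventually adopted. It postdates this mailing, so it illustrates the outcome rather than either paper's proposal: a class with a user-declared destructor gets no implicit move constructor, so std::move quietly falls back to copying. copy_tracker, plain_holder and holder_with_destructor are hypothetical names used only for illustration.

```cpp
#include <utility>

// Hypothetical member type that counts how it was constructed.
struct copy_tracker {
    static int copies;
    static int moves;
    copy_tracker() {}
    copy_tracker(const copy_tracker&) { ++copies; }
    copy_tracker(copy_tracker&&) { ++moves; }
};
int copy_tracker::copies = 0;
int copy_tracker::moves = 0;

// No user-declared special members: the compiler generates a move
// constructor that moves each member.
struct plain_holder {
    copy_tracker member;
};

// A user-declared destructor suppresses the implicit move constructor,
// so "moving" this type actually calls the implicit copy constructor.
struct holder_with_destructor {
    copy_tracker member;
    ~holder_with_destructor() {}
};
```

Constructing a plain_holder from std::move of another bumps copy_tracker::moves, whereas doing the same with holder_with_destructor bumps copy_tracker::copies.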
Exceptions and Destructors
N3166: Destructors default to noexcept is another potentially controversial issue. It is generally acknowledged that throwing exceptions from destructors is a bad idea, not least because this leads to termination if the destructor is invoked whilst the stack is being unwound due to another exception. Herb Sutter wrote about this way back in 1998 when the original C++ standard was hot off the presses, in GotW #47: Uncaught Exceptions.
The proposal in the paper comes from a Finnish comment on the FCD, and is quite simple: by default all destructors are assumed to be marked noexcept(true) (which is the new way of saying they cannot throw an exception, similar to an exception specification of throw()), unless they explicitly have a non-empty exception specification or are marked noexcept(false).
Since it is generally good practice not to throw from a destructor, you'd think this would be uncontroversial. Unfortunately it is not: there are currently situations where throwing from a destructor has defined behaviour, and even does exactly what people want. The example most frequently cited is the SOCI project for accessing databases from C++. This library provides an easy syntax for constructing SQL queries using the << operator. The operator builds a temporary object which executes the SQL in its destructor. If the SQL is invalid, or executing it causes an exception for any other reason, then the destructor throws. Changing destructors to be noexcept(true) by default will make such code terminate on a database error unless the destructor is updated to declare that it can throw exceptions. Working code with defined behaviour is thus broken when recompiled with a C++0x compiler.
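To make the SOCI-style idiom concrete, here is a hypothetical sketch. fake_session, deferred_query and the toy "must start with select" validation are invented for illustration and are not SOCI's API. The temporary created by the << expression runs its query in its destructor, and that destructor is explicitly marked noexcept(false), which is the opt-out the proposal calls for (and which C++11 compilers require) if an exception is to escape.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a database session (not SOCI's real API).
struct fake_session {
    std::string last_executed;
};

// Temporary that accumulates SQL text and "executes" it in its destructor.
class deferred_query {
public:
    deferred_query(fake_session& s, std::string sql)
        : session_(&s), sql_(std::move(sql)) {}

    // Move support so the temporary can be returned by value before C++17.
    deferred_query(deferred_query&& other)
        : session_(other.session_), sql_(std::move(other.sql_)) {
        other.session_ = nullptr; // a moved-from object must not execute
    }

    deferred_query& operator<<(const std::string& more) {
        sql_ += more;
        return *this;
    }

    // Without noexcept(false) this destructor would be implicitly noexcept
    // in C++11, and the throw below would call std::terminate instead.
    ~deferred_query() noexcept(false) {
        if (!session_) return;
        // Toy validation: only statements starting with "select " succeed.
        if (sql_.compare(0, 7, "select ") != 0)
            throw std::runtime_error("invalid SQL: " + sql_);
        session_->last_executed = sql_;
    }

private:
    fake_session* session_;
    std::string sql_;
};

// Starts a query; the temporary lives until the end of the full expression.
inline deferred_query operator<<(fake_session& s, const std::string& sql) {
    return deferred_query(s, sql);
}
```

With this, s << "select " << "42"; executes at the semicolon, and a malformed statement throws std::runtime_error out of the same expression. This only works when the temporary is not being destroyed during stack unwinding: a second exception at that point still calls std::terminate.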
Concurrency-related papers
There are 3 concurrency-related papers in this mailing, which I've summarised below.
- N3152: Progress guarantees for C++0x (US 3 and US 186)
The FCD does not make any progress guarantees when multiple threads are used. In particular, writes made by one thread do not ever have to become visible to other threads, and threads aren't guaranteed ever to actually run at all. This paper looks at the issues and provides wording for minimal guarantees.
- N3164: Adjusting C++ Atomics for C Compatibility
This is an update to N3137 from the last mailing, which provides detailed wording updates for the required changes to regain compatibility with C1X atomics.
- N3170: Clarifying C++ Futures
There were a few FCD comments from the US about the use of futures; this paper outlines all the issues and potential solutions. The proposed changes are actually fairly minor though:
  - future gains a share() member function for easy conversion to the corresponding shared_future type;
  - accessing a shared_future for which valid() is false is now required to throw an exception rather than be undefined behaviour;
  - atomic_future is to be removed.
A few minor changes have also been made to the wording to make things clearer.
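Here is a small sketch of share() as it appears in C++11: the original future gives up the shared state, and the resulting shared_future can be copied so that several consumers can each call get(). shared_result and make_shared_result are hypothetical names. The sketch only checks valid() on the emptied future and never calls get() on it, because in the published C++11 standard that case is undefined behaviour (implementations are encouraged, but not required, to throw future_error).

```cpp
#include <future>

// Hypothetical result bundle so the caller can inspect both objects.
struct shared_result {
    bool future_valid_after_share;
    std::shared_future<int> shared;
};

inline shared_result make_shared_result(int value) {
    std::promise<int> p;
    std::future<int> f = p.get_future();
    p.set_value(value);
    // share() transfers the state into a shared_future; f is left empty.
    std::shared_future<int> sf = f.share();
    return shared_result{f.valid(), sf};
}
```

Copies of the returned shared_future all refer to the same state, so each copy's get() returns the stored value.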
If you have any opinions on any of the papers listed here, or the resolution of any NB comments, please add them to the comments for this post.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
just::thread C++0x Thread Library V1.4.2 Released
Friday, 15 October 2010
I am pleased to announce that version 1.4.2 of just::thread, our C++0x Thread Library, has just been released.
The big change with this release is the new support for gcc 4.5 on Ubuntu Linux. If you're running Ubuntu Lucid then you can get the .DEB files for gcc 4.5 from yesterday's blog post. For Ubuntu Maverick, gcc 4.5 is in the repositories.
Other changes:
- Overflow in ratio arithmetic will now cause a compilation failure
- Ratio arithmetic operations derive from the resulting std::ratio instantiation as well as providing the ::type member, to better emulate the C++0x working draft
- On Windows, just::thread can now be used in MFC DLLs
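The ratio changes are easiest to picture with a couple of compile-time sums. This minimal sketch uses the standard std::ratio templates rather than anything specific to just::thread; sum_t and product_t are hypothetical names. The standard makes a result whose numerator or denominator cannot be represented in intmax_t ill-formed, consistent with the first item above, so no overflowing example is shown.

```cpp
#include <ratio>

// Compile-time rational arithmetic; num and den are always reduced to
// lowest terms, so they can be compared directly.
typedef std::ratio_add<std::ratio<1, 3>, std::ratio<1, 6> > sum_t;          // 1/3 + 1/6
typedef std::ratio_multiply<std::ratio<2, 3>, std::ratio<3, 4> > product_t; // 2/3 * 3/4

static_assert(sum_t::num == 1 && sum_t::den == 2, "1/3 + 1/6 == 1/2");
static_assert(product_t::num == 1 && product_t::den == 2, "2/3 * 3/4 == 1/2");
// ratio_equal compares the reduced values, not the spelling of the types.
static_assert(std::ratio_equal<sum_t, product_t>::value, "both equal 1/2");
```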
As usual, existing customers are entitled to a free upgrade to V1.4.2 from all earlier versions.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
gcc 4.5 Packages for Ubuntu Lucid
Thursday, 14 October 2010
Ubuntu Maverick was released earlier this week. Amongst other things, gcc 4.5 is available in the repositories, whereas for previous versions you had to build it yourself from source.
In order to save you the pain of compiling gcc 4.5 for yourself (which can take a while, and overheated my laptop when I tried), I've built it for Ubuntu Lucid, and uploaded the .deb files to my website. The .debs are built from the Maverick source packages for gcc 4.5.1, binutils 2.20.51, cloog-ppl and mpclib, and I've built them for both i386 and amd64 architectures.
- binutils_2.20.51.20100908-0ubuntu2_amd64.deb
- binutils_2.20.51.20100908-0ubuntu2_i386.deb
- cpp-4.5_4.5.1-7ubuntu2_amd64.deb
- cpp-4.5_4.5.1-7ubuntu2_i386.deb
- g++-4.5_4.5.1-7ubuntu2_amd64.deb
- g++-4.5_4.5.1-7ubuntu2_i386.deb
- gcc-4.5-base_4.5.1-7ubuntu2_amd64.deb
- gcc-4.5-base_4.5.1-7ubuntu2_i386.deb
- gcc-4.5_4.5.1-7ubuntu2_amd64.deb
- gcc-4.5_4.5.1-7ubuntu2_i386.deb
- libcloog-ppl0_0.15.9-2_amd64.deb
- libcloog-ppl0_0.15.9-2_i386.deb
- libgcc1_4.5.1-7ubuntu2_amd64.deb
- libgcc1_4.5.1-7ubuntu2_i386.deb
- libgomp1_4.5.1-7ubuntu2_amd64.deb
- libgomp1_4.5.1-7ubuntu2_i386.deb
- libmpc2_0.8.2-1build1_amd64.deb
- libmpc2_0.8.2-1build1_i386.deb
- libstdc++6-4.5-dev_4.5.1-7ubuntu2_amd64.deb
- libstdc++6-4.5-dev_4.5.1-7ubuntu2_i386.deb
- libstdc++6_4.5.1-7ubuntu2_amd64.deb
- libstdc++6_4.5.1-7ubuntu2_i386.deb
Enjoy!
Posted by Anthony Williams
[/ news /] permanent link
Tags: gcc, lucid, ubuntu
Concept Checking Without C++0x Concepts
Wednesday, 06 October 2010
My latest article, Concept Checking Without Concepts in C++ was published on the Dr Dobb's website a couple of weeks ago.
One of the important features of the now-defunct C++0x Concepts proposal was the ability to overload functions based on whether or not their arguments met certain concepts. This article describes a way to allow that for concepts based on the presence of particular member functions.
The basic idea is that you can write traits classes that detect particular sets of member functions. Function overloads that require these concepts can then be enabled or disabled by using std::enable_if with these traits.
The example I use is checking for a Lockable type which has lock(), unlock() and try_lock() member functions, but the same technique could easily be used for other concepts that require other member functions.
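Here is a minimal sketch of the traits-plus-enable_if idea, assuming a C++11 compiler. It detects the three member functions with decltype-based expression SFINAE, which may well differ from the technique used in the article. is_lockable, detail::lockable_test and describe are hypothetical names.

```cpp
#include <mutex>
#include <type_traits>
#include <utility>

namespace detail {
// Viable only if lock(), unlock() and try_lock() are all valid on a U&.
template<typename U>
auto lockable_test(int) -> decltype(
    void(std::declval<U&>().lock()),
    void(std::declval<U&>().unlock()),
    void(std::declval<U&>().try_lock()),
    std::true_type());

// Fallback chosen when any of those expressions is ill-formed (SFINAE).
template<typename U>
std::false_type lockable_test(...);
}

// Traits class: derives from std::true_type for Lockable types.
template<typename T>
struct is_lockable : decltype(detail::lockable_test<T>(0)) {};

// Hypothetical overloads selected by the trait via std::enable_if.
template<typename T>
typename std::enable_if<is_lockable<T>::value, const char*>::type
describe(T&) { return "lockable"; }

template<typename T>
typename std::enable_if<!is_lockable<T>::value, const char*>::type
describe(T&) { return "not lockable"; }

static_assert(is_lockable<std::mutex>::value, "std::mutex has lock/unlock/try_lock");
static_assert(!is_lockable<int>::value, "int has no member functions");
```

A type with only lock() and unlock() is rejected by is_lockable, so calls on it go to the "not lockable" overload.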
Read the article for the full details.
Posted by Anthony Williams
[/ news /] permanent link
Tags: concepts, cplusplus
August 2010 C++ Standards Committee Mailing
Wednesday, 08 September 2010
The August 2010 mailing for the C++ Standards Committee was published recently. This is the post-meeting mailing for the August 2010 committee meeting, and contains a new C++0x Working Draft. At the meeting in August, the committee discussed many of the National Body comments on the FCD, and this draft incorporates those changes that the committee approved of. As you can see from the FCD Comment Status document in this mailing, there were 301 technical comments and a further 215 editorial comments. Of these, 98 technical comments have been accepted as-is, 8 have been accepted with changes, and 63 have been rejected, leaving 132 technical comments that have still not been addressed one way or the other.
No significant changes have been accepted to the concurrency-related parts of the working draft, though there are quite a few editorial comments. However, there are several papers in this mailing that address the National Body comments in this area. These papers have by and large been drafted to represent the consensus of those members of the concurrency group in the LWG who were present at the meeting. I have summarised these papers below.
Concurrency-related papers
- N3113: Async Launch Policies (CH 36)
This paper provides a clearer basis for implementors to supply additional launch policies for std::async, or for the committee to do so in a later revision of the C++ standard, by making the std::launch enum a bitmask type. It also drops the std::launch::any enumeration value, and renames std::launch::sync to std::launch::deferred, as this better describes what it means.
The use of a bitmask allows new values to be added which are either distinct values, or combinations of the others. The default policy for std::async is thus std::launch::async|std::launch::deferred.
- N3125: Omnibus Memory Model and Atomics Paper
This paper addresses several National Body comments by updating the wording in the draft standard to better reflect the intent of the committee.
- N3128: C++ Timeout Specification
There are several functions in the threading portion of the library that allow timeouts, such as the try_lock_for and try_lock_until member functions of the timed mutex types, and the wait_for and wait_until member functions of the future types. This paper clarifies what it means to wait for a specified duration (with the xxx_for functions), and what it means to wait until a specified time point (with the xxx_until functions). In particular, it clarifies what can be expected of the implementation if the clock is changed during a wait.
This paper also proposes replacing the old std::chrono::monotonic_clock with a new std::chrono::steady_clock. Whereas the only constraint on the monotonic clock was that it never went backwards, the steady clock cannot be adjusted, and always ticks at a uniform rate. This fulfils the original intent of the monotonic clock, but provides a clearer specification and name. It is also tied into the new wait specifications, since waiting for a duration requires a steady clock for use as a basis.
- N3129: Managing C++ Associated Asynchronous State
This paper tidies up the wording of the functions and classes related to the future types, and clarifies the management of the associated asynchronous state which is used to communicate, e.g., between a std::promise and a std::future that will receive the result.
- N3130: Lockable requirements for C++0x
This paper splits out the requirements for general lockable types from the specific requirements on the standard mutex types. This allows the lockable concepts to be used to specify the requirements on a type to be used with the std::lock_guard and std::unique_lock class templates, as well as for the various overloads of the wait functions on std::condition_variable_any, without imposing the precise behaviour of std::mutex on user-defined mutex types.
- N3132: Mathematizing C++ Concurrency: The Post-Rapperswil Model
This paper provides a mathematical description for the C++0x memory model. A similar description was used to highlight some of the areas that are clarified by the omnibus memory model paper (N3125) described above.
- N3136: Coherence Requirements Detailed
This paper introduces some simple coherence requirements to the memory model wording to make it clear that the sequence of values read for a given variable must be consistent across threads. The existence of a single modification order for each variable is a key component of the memory model, and the wording introduced in this paper makes it clear that this is a core requirement.
- N3137: C and C++ Liaison: Compatibility for Atomics
The structure of the atomic types and operations in the FCD was carefully worked out in conjunction with the C standards committee to ensure that the C++0x atomic types were compatible with those being introduced in the upcoming C1x standard. Unfortunately, the C committee introduced a new incompatible syntax for atomic types into the C1x draft earlier this year because they believed it was a better match for the C language.
This paper attempts to address this new incompatibility by removing the atomic_xxx types that were originally added for C compatibility, leaving just the std::atomic<T> class template. Also, a new _Atomic(T) macro is introduced for compatibility with the new C1x _Atomic keyword.
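Two of these changes can be seen together in a short sketch written against the names in the published C++11 standard (std::launch::deferred and std::chrono::steady_clock, as proposed in N3113 and N3128). A deferred task does not run until get() is called, it then runs on the calling thread, and a timed wait on it returns future_status::deferred straight away. deferred_report, run_deferred and either_policy are hypothetical names.

```cpp
#include <chrono>
#include <future>
#include <thread>

// The steady clock required by N3128 must report that it is steady.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock is steady");

// Hypothetical record of what happened to a deferred task.
struct deferred_report {
    bool ran_before_get;
    bool reported_deferred;
    bool ran_on_caller;
    int result;
};

inline deferred_report run_deferred() {
    bool ran = false;
    std::thread::id runner;
    std::future<int> f = std::async(std::launch::deferred, [&] {
        ran = true;
        runner = std::this_thread::get_id();
        return 42;
    });
    deferred_report r;
    r.ran_before_get = ran;
    // A timed wait on a deferred task does not wait for the timeout; it
    // returns future_status::deferred immediately.
    r.reported_deferred =
        f.wait_for(std::chrono::milliseconds(10)) == std::future_status::deferred;
    r.result = f.get(); // the task runs now, on this thread
    r.ran_on_caller = ran && runner == std::this_thread::get_id();
    return r;
}

// The combined bitmask policy; std::async uses this when no policy is given.
const std::launch either_policy = std::launch::async | std::launch::deferred;
```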
Other papers
As already mentioned, this mailing contains a new C++0x Working Draft, along with the usual post-meeting stuff: editors' notes for the changes in the new draft, new issues lists, minutes of the meeting, etc. It also contains a complete list of the National Body Comments on the FCD, and a few other papers addressing National Body comments.
If you have any opinions on the resolution of any NB comments not yet formally accepted or rejected, please add them to the comments for this post.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Definitions of Non-blocking, Lock-free and Wait-free
Tuesday, 07 September 2010
There have repeatedly been posts on comp.programming.threads asking for a definition of these terms. To write good multithreaded code you really need to understand what these mean, and how they affect the behaviour and performance of algorithms with these properties. I thought it would therefore be useful to provide some definitions.
Definition of Blocking
A function is said to be blocking if it calls an operating system function that waits for an event to occur or a time period to elapse. Whilst a blocking call is waiting the operating system can often remove that thread from the scheduler, so it takes no CPU time until the event has occurred or the time has elapsed. Once the event has occurred then the thread is placed back in the scheduler and can run when allocated a time slice. A thread that is running a blocking call is said to be blocked.
Mutex lock functions such as std::mutex::lock() and EnterCriticalSection() are blocking, as are wait functions such as std::future::wait() and std::condition_variable::wait(). However, blocking functions are not limited to synchronization facilities: the most common blocking functions are I/O facilities such as fread() or WriteFile(). Timing facilities such as Sleep() or std::this_thread::sleep_until() are also often blocking if the delay period is long enough.
Definition of Non-blocking
Non-blocking functions are just those that aren't blocking. Non-blocking data structures are those on which all operations are non-blocking. All lock-free data structures are inherently non-blocking.
Spin-locks are an example of non-blocking synchronization: if one thread has a lock then waiting threads are not suspended, but must instead loop until the thread that holds the lock has released it. Spin locks and other algorithms with busy-wait loops are not lock-free, because if one thread (the one holding the lock) is suspended then no thread can make progress.
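A spin-lock of the kind just described can be sketched with std::atomic_flag. spinlock is a hypothetical class name, and the sketch has no back-off, so treat it as an illustration rather than a production lock. Waiting threads loop on test_and_set instead of being suspended, which makes it non-blocking, but a descheduled holder stalls everyone else, so it is not lock-free.

```cpp
#include <atomic>

// Minimal spin-lock: waiters busy-wait rather than being suspended.
class spinlock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT; // starts clear (unlocked)
public:
    void lock() {
        // Keep trying until test_and_set observes the flag clear.
        while (locked.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};
```

Because it has lock() and unlock() members, it can be used with std::lock_guard like any other mutex type.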
Definition of lock-free
A lock-free data structure is one that doesn't use any mutex locks. The implication is that multiple threads can access the data structure concurrently without race conditions or data corruption, even though there are no locks: people would give you funny looks if you suggested that std::list was a lock-free data structure, even though it is unlikely that there are any locks used in the implementation.
Just because more than one thread can safely access a lock-free data structure concurrently doesn't mean that there are no restrictions on such accesses. For example, a lock-free queue might allow one thread to add values to the back whilst another removes them from the front, whereas multiple threads adding new values concurrently would potentially corrupt the data structure. The data structure description will identify which combinations of operations can safely be called concurrently.
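The single-producer/single-consumer restriction is easy to see in code. This minimal sketch assumes std::atomic<std::size_t> is itself lock-free on the target (is_lock_free() will tell you); spsc_queue and its members are hypothetical. push() and pop() never wait for each other, but the design is only correct with exactly one producer thread and one consumer thread: two concurrent pushers could both write the same slot.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer.
template<typename T, std::size_t Capacity>
class spsc_queue {
    T buffer_[Capacity + 1];        // one slot kept empty to tell full from empty
    std::atomic<std::size_t> head_; // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_; // next slot to write; written only by the producer

    static std::size_t next(std::size_t i) { return (i + 1) % (Capacity + 1); }
public:
    spsc_queue() : head_(0), tail_(0) {}

    // Producer thread only. Returns false if the queue is full.
    bool push(const T& value) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        std::size_t next_tail = next(tail);
        if (next_tail == head_.load(std::memory_order_acquire))
            return false;
        buffer_[tail] = value;
        tail_.store(next_tail, std::memory_order_release); // publish the value
        return true;
    }

    // Consumer thread only. Returns false if the queue is empty.
    bool pop(T& out) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = buffer_[head];
        head_.store(next(head), std::memory_order_release); // free the slot
        return true;
    }
};
```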
For a data structure to qualify as lock-free, if any thread performing an operation on the data structure is suspended at any point during that operation then the other threads accessing the data structure must still be able to complete their tasks. This is the fundamental restriction which distinguishes it from non-blocking data structures that use spin-locks or other busy-wait mechanisms.
Just because a data structure is lock-free it doesn't mean that threads don't have to wait for each other. If an operation takes more than one step then a thread may be pre-empted by the OS part-way through an operation. When it resumes the state may have changed, and the thread may have to restart the operation.
In some cases, the partially-completed operation would prevent other threads performing their desired operations on the data structure until the operation is complete. In order for the algorithm to be lock-free, these threads must then either abort or complete the partially-completed operation of the suspended thread. When the suspended thread is woken by the scheduler it can then either retry or accept the completion of its operation as appropriate. In lock-free algorithms, a thread may find that it has to retry its operation an unbounded number of times when there is high contention.
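The retry behaviour is easiest to see in a compare-exchange loop. This sketch shows only the retry part of the story (there is no helping of partially-completed operations), and update_max is a hypothetical helper. It assumes std::atomic<int> is lock-free on the target, as it is on mainstream platforms.

```cpp
#include <atomic>

// Lock-free "store the maximum" built from a compare-exchange loop.
inline void update_max(std::atomic<int>& target, int candidate) {
    int current = target.load(std::memory_order_relaxed);
    // If another thread changes target between our load and our CAS, the CAS
    // fails and writes the latest value into `current`, and we try again.
    // Under heavy contention this can repeat an unbounded number of times,
    // which is why the operation is lock-free but not wait-free.
    while (current < candidate &&
           !target.compare_exchange_weak(current, candidate,
                                         std::memory_order_relaxed)) {
    }
}
```

Some thread always makes progress (a CAS can only fail because another thread's CAS succeeded, or spuriously), but no individual thread is promised a bound on its retries.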
If you use a lock-free data structure where multiple threads modify the same pieces of data and thus cause each other to retry then high rates of access from multiple threads can seriously cripple the performance, as the threads hinder each other's progress. This is why wait-free data structures are so important: they don't suffer from the same set-backs.
Definition of wait-free
A wait-free data structure is a lock-free data structure with the additional property that every thread accessing the data structure can complete its operation within a bounded number of steps, regardless of the behaviour of other threads. Algorithms that can involve an unbounded number of retries due to clashes with other threads are thus not wait-free.
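A counter built on fetch_add is the usual example of an operation that can be wait-free, with a caveat: whether it actually is depends on the hardware. Where the processor has an atomic fetch-and-add instruction (x86's lock xadd, for example) each call completes in a bounded number of steps; where the compiler has to use a load-linked/store-conditional retry loop, the same code is only lock-free. The C++ standard itself only reports lock-freedom, via is_lock_free(). ticket_dispenser is a hypothetical name.

```cpp
#include <atomic>
#include <cstddef>

// Hands out unique ticket numbers to any number of threads. With a hardware
// fetch-and-add, take() is a single atomic instruction, so no thread can be
// forced to retry by other threads' activity.
class ticket_dispenser {
    std::atomic<std::size_t> next_{0};
public:
    std::size_t take() {
        return next_.fetch_add(1, std::memory_order_relaxed);
    }
};
```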
This property means that high-priority threads accessing the data structure never have to wait for low-priority threads to complete their operations on the data structure, and every thread will always be able to make progress when it is scheduled to run by the OS. For real-time or semi-real-time systems this can be an essential property, as the indefinite wait-periods of blocking or non-wait-free lock-free data structures do not allow their use within time-limited operations.
The downside of wait-free data structures is that they are more complex than their non-wait-free counterparts. This imposes an overhead on each operation, potentially making the average time taken to perform an operation considerably longer than the same operation on an equivalent non-wait-free data structure.
Choices
When choosing a data structure for a given task you need to think about the costs and benefits of each of the options.
A lock-based data structure is probably the easiest to use, reason about and write, but has the potential for limited concurrency. Lock-based data structures may also be the fastest in low-load scenarios.
A lock-free (but not wait-free) data structure has the potential to allow more concurrent accesses, but with the possibility of busy-waits under high loads. Lock-free data structures are considerably harder to write, and the additional concurrency can make reasoning about the program behaviour harder. They may be faster than lock-based data structures, but not necessarily.
Finally, a wait-free data structure has the maximum potential for true concurrent access, without the possibility of busy waits. However, these are very much harder to write than other lock-free data structures, and typically impose an additional performance cost on every access.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, threading, multithreading, lock-free, wait-free
just::thread C++0x Thread Library V1.4.1 Released
Monday, 09 August 2010
I am pleased to announce that version 1.4.1 of just::thread, our C++0x Thread Library, has just been released.
This is an improvement over V1.4.0 in a number of areas:
- Both /Zc:wchar_t and /Zc:wchar_t- are supported with MSVC
- std::chrono::high_resolution_clock typedef added
- Added support for shared libraries on Linux
- Faster mutex locking and unlocking on contended mutexes on Linux
- Faster blocking/unblocking for condition variables on Linux
- Support for tracking clock changes when waiting on a std::chrono::system_clock time with std::condition_variable on Linux with kernels >= 2.6.31
- Support for floating-point durations
- Faster time retrieval with std::chrono::monotonic_clock::now() on Windows
- Added support for Microsoft Visual Studio 2005
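Floating-point durations are std::chrono::duration instantiated with a floating-point representation. Here is a small sketch using the standard std::chrono names rather than anything specific to just::thread; fp_seconds, fp_milliseconds, as_seconds and truncate_to_ms are hypothetical names.

```cpp
#include <chrono>

typedef std::chrono::duration<double> fp_seconds;                  // seconds as double
typedef std::chrono::duration<double, std::milli> fp_milliseconds; // milliseconds as double

// Integer durations convert implicitly to floating-point ones, because the
// conversion cannot lose precision: 1500ms becomes 1.5s.
inline double as_seconds(std::chrono::milliseconds ms) {
    fp_seconds s = ms;
    return s.count();
}

// Going back to an integer representation requires an explicit cast,
// which truncates toward zero.
inline std::chrono::milliseconds truncate_to_ms(fp_milliseconds d) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(d);
}
```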
As usual, existing customers are entitled to a free upgrade to V1.4.1 from all earlier versions.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Implementing Dekker's algorithm with Fences
Tuesday, 27 July 2010
Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.
The code
For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.
```cpp
#include <atomic>

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section

    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);

    // critical section

    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}
```
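To try the code out, it helps to fold p0 and p1 into one class indexed by thread id. The following is a sketch under that assumption: dekker_lock, lock(id) and unlock(id) are hypothetical names, the fences and memory orderings are those of p0/p1, and the std::this_thread::yield() call in the inner wait loop is an addition that is not in the original code; it only makes the busy-wait friendlier on machines with few cores.

```cpp
#include <atomic>
#include <thread>

// Two-thread lock using the fenced Dekker algorithm. Thread `id` (0 or 1)
// calls lock(id) before its critical section and unlock(id) afterwards.
// Exactly two threads, with different ids, may share one dekker_lock.
class dekker_lock {
    std::atomic<bool> flag_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
public:
    void lock(int id) {
        const int other = 1 - id;
        flag_[id].store(true, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        while (flag_[other].load(std::memory_order_relaxed)) {
            if (turn_.load(std::memory_order_relaxed) != id) {
                flag_[id].store(false, std::memory_order_relaxed);
                while (turn_.load(std::memory_order_relaxed) != id) {
                    std::this_thread::yield(); // addition: eases spinning
                }
                flag_[id].store(true, std::memory_order_relaxed);
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
        }
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    void unlock(int id) {
        turn_.store(1 - id, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
        flag_[id].store(false, std::memory_order_relaxed);
    }
};
```

Two threads, one calling lock(0)/unlock(0) and the other lock(1)/unlock(1), can then protect ordinary non-atomic data between the calls.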
The analysis
If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.
Suppose thread 0 and thread 1 enter p0 and p1 respectively at the same time. They both set their respective flags to true, execute the fence and then read the other flag at the start of the while loop.
If both threads read false then both will enter the critical section, and the algorithm doesn't work. It is the job of the fences to ensure that this doesn't happen.
The fences are marked with memory_order_seq_cst, so either the fence in p0 is before the fence in p1 in the global ordering of memory_order_seq_cst operations, or vice-versa. Without loss of generality, we can assume that the fence in p0 comes before the fence in p1, since the code is symmetric. The store to flag0 is sequenced before the fence in p0, and the fence in p1 is sequenced before the read from flag0. Therefore the read from flag0 must see the value stored (true), so p1 will enter the while loop.
On the other side, there is no such guarantee for the read from flag1 in p0, so p0 may or may not enter the while loop. If p0 reads the value of false for flag1, it will not enter the while loop, and will instead enter the critical section, but that is OK since p1 has entered the while loop.
Though flag0 is not set to false until p0 has finished the critical section, we need to ensure that p1 does not see this until the values modified in the critical section are also visible to p1, and this is the purpose of the release fence prior to the store to flag0 and the acquire fence after the while loop. When p1 reads the value false from flag0 in order to exit the while loop, it must be reading the value stored by p0 after the release fence at the end of the critical section. The acquire fence after the load guarantees that all the values written before the release fence prior to the store are visible, which is exactly what we need here.
If p0 reads true for flag1, it will enter the while loop rather than the critical section. Both threads are now looping, so we need a way to ensure that exactly one of them breaks out. This is the purpose of the turn variable. Initially, turn is 0, so p1 will enter the if and set flag1 to false, whilst p0 will not enter the if. Because p1 set flag1 to false, eventually p0 will read flag1 as false and exit the outer while loop to enter the critical section. On the other hand, p1 is now stuck in the inner while loop because turn is 0. When p0 exits the critical section it sets turn to 1. This will eventually be seen by p1, allowing it to exit the inner while loop. When the store to flag0 becomes visible p1 can then exit the outer while loop too.
If turn had been 1 initially (because p0 was the last thread to enter the critical section) then the inverse logic would apply, and p0 would enter the inner loop, allowing p1 to enter the critical section first.
Second time around
If p0 is called a second time whilst p1 is still in the inner loop then we have a similar situation to the start of the function: p1 may exit the inner loop and store true in flag1 whilst p0 stores true in flag0. We therefore need the second memory_order_seq_cst fence after the store to the flag in the inner loop. This guarantees that at least one of the threads will see the flag from the other thread set when it executes the check in the outer loop. Without this fence both threads can read false, and both can enter the critical section.
Alternatives
You could put the ordering constraints on the loads and stores themselves rather than using fences. Indeed, the default memory ordering for atomic operations in C++ is memory_order_seq_cst, so the algorithm would "just work" with plain loads and stores to atomic variables. However, by using memory_order_relaxed on the loads and stores we can add fences to ensure we have exactly the ordering constraints required.
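For comparison, here is a sketch of that alternative: the same two-thread lock with every load and store using the default memory_order_seq_cst and no explicit fences. dekker_lock_sc is a hypothetical name, and the yield() in the inner loop is again an addition that is not part of the algorithm.

```cpp
#include <atomic>
#include <thread>

// Dekker's algorithm relying on the default seq_cst ordering of every
// atomic load and store. Thread `id` must be 0 or 1.
class dekker_lock_sc {
    std::atomic<bool> flag_[2] = {{false}, {false}};
    std::atomic<int> turn_{0};
public:
    void lock(int id) {
        const int other = 1 - id;
        flag_[id] = true;
        while (flag_[other]) {
            if (turn_ != id) {
                flag_[id] = false;
                while (turn_ != id) {
                    std::this_thread::yield(); // addition: eases spinning
                }
                flag_[id] = true;
            }
        }
    }
    void unlock(int id) {
        turn_ = 1 - id;
        flag_[id] = false;
    }
};
```

The code is shorter, but every access now carries the full sequentially-consistent cost, whereas the fenced version pays it only where the analysis shows it is needed.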
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, synchronization, Dekker, fences, cplusplus
Reference Wrappers Explained
Wednesday, 14 July 2010
The upcoming C++0x standard includes reference wrappers in the form of the std::reference_wrapper<T> class template, and the helper function templates std::ref() and std::cref(). As I mentioned in my blog post on Starting Threads with Member Functions and Reference Arguments, these wrappers can be used to pass references to objects across interfaces that normally require copyable (or at least movable) objects: in that blog post, std::ref was used for passing references to objects over to the new thread, rather than copying the objects. I was recently asked what the difference was between std::ref and std::cref, and how they worked, so I thought I'd elaborate.
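The copy-versus-reference distinction can be shown without starting a thread. std::bind, like std::thread, stores copies of its arguments unless they are wrapped, so this sketch uses it as a stand-in; increment, call_with_copy and call_with_ref are hypothetical names. One difference: std::thread passes its copies as rvalues, so the unwrapped call below would not even compile with std::thread, whereas std::bind quietly modifies its internal copy.

```cpp
#include <functional>

// Increments through a reference, so we can see which object was modified.
inline void increment(int& value) { ++value; }

// The bind object holds its own copy of x; calling it modifies that copy.
inline int call_with_copy(int x) {
    auto f = std::bind(increment, x);
    f();
    return x; // unchanged
}

// std::ref stores a reference_wrapper, so the call modifies x itself.
inline int call_with_ref(int x) {
    auto f = std::bind(increment, std::ref(x));
    f();
    return x; // incremented
}
```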
Deducing the Referenced Type
std::ref is a function template, so it automatically deduces the type of the wrapped reference from the type of the supplied argument. This type deduction includes the const-ness of the supplied object:
```cpp
int x=3;
const int y=4;
std::reference_wrapper<int> rx=std::ref(x);
// std::reference_wrapper<int> ry=std::ref(y); // error
std::reference_wrapper<const int> rcy=std::ref(y);
```
On the other hand, though std::cref also deduces the type of the wrapped reference from the supplied argument, it always wraps a const reference:
```cpp
int x=3;
const int y=4;
// std::reference_wrapper<int> rx=std::cref(x); // error
std::reference_wrapper<const int> rcx=std::cref(x);
// std::reference_wrapper<int> ry=std::cref(y); // error
std::reference_wrapper<const int> rcy=std::cref(y);
```
Since a non-const reference can always be bound to a const reference, you can thus use std::ref in pretty much every case where you would use std::cref, and your code would work the same. This raises the question: why would you ever choose to use std::cref?
Using std::cref to prevent modification
The primary reason for choosing std::cref is that you want to guarantee that the source object is not modified through that reference. This can be important when writing multithreaded code: if a thread should not be modifying some data then it can be worth enforcing this by passing a const reference rather than a mutable reference.
```cpp
void foo(int&); // mutable reference
int x=42; // Should not be modified by thread
std::thread t(foo,std::cref(x)); // will fail to compile
```
This can be important where there are overloads of a function such that one takes a const reference, and the other a non-const reference: if we don't want the object modified then it is important that the overload taking a const reference is chosen.
```cpp
struct Foo
{
    void operator()(int&) const;
    void operator()(int const&) const;
};
int x=42;
std::thread t(Foo(),std::cref(x)); // force const int& overload
t.join(); // a joinable std::thread must be joined (or detached) before destruction
```
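The overload selection can be checked directly by calling a function object with a wrapped argument rather than going through std::thread. which_overload, pick_with_ref and pick_with_cref are hypothetical names. reference_wrapper<T> converts implicitly to T&, so std::cref makes only the int const& overload viable, while with std::ref both overloads are viable and int& is the better match.

```cpp
#include <functional>
#include <string>

// Function object with const and non-const reference overloads that
// reports which one was chosen.
struct which_overload {
    std::string operator()(int&) const { return "int&"; }
    std::string operator()(int const&) const { return "int const&"; }
};

// reference_wrapper<const int> converts only to int const&.
inline std::string pick_with_cref(int& x) { return which_overload()(std::cref(x)); }

// reference_wrapper<int> converts to int&, which is preferred over int const&.
inline std::string pick_with_ref(int& x) { return which_overload()(std::ref(x)); }
```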
References to temporaries
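std::cref has another property missing from std::ref: it can bind to temporaries, since temporary objects can bind to const references. I'm not sure this is a good thing, especially when dealing with multiple threads, as the referenced temporary is likely to have been destroyed before the thread has even started. This is therefore something to watch out for:
```cpp
void bar(int const&);
std::thread t(bar,std::cref(42)); // oops, ref to temporary
```
Note that the C++11 standard as finally published declares the rvalue overloads std::ref(const T&&) and std::cref(const T&&) as deleted, so a conforming library rejects this call at compile time; libraries that predate that change may still accept it.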
Documentation
Finally, std::cref serves a documentation purpose, even where std::ref would suffice: it declares in big bold letters that this reference cannot be used to modify the referenced object, which thus makes it easier to reason about the code.
Recommendation
I would recommend that you use std::cref in preference to std::ref whenever you can: the benefits as documentation of intent, and of avoiding accidental modification through the reference, make it a clear winner in my opinion. Of course, if you do want to modify the referenced object, then you need to use std::ref, but such usage now stands out, and makes it clear that this is the intent.
You do still need to be careful to ensure that you don't try to wrap references to temporaries, particularly when applying std::cref to the result of a function call, but such uses should stand out: I expect most uses to be wrapping a reference to a named variable rather than wrapping a function result.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: reference wrappers, ref, cref, cplusplus
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy