Blog Archive
Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls
Thursday, 11 February 2010
This is the eighth in a series of blog posts introducing the new C++0x thread library. See the end of this article for a full set of links to the rest of the series.
In this installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high-level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.
Futures and asynchronous function calls
The most basic use of a future is to hold the result of a call to the new std::async function for running some code asynchronously:
#include <future>
#include <iostream>

int calculate_the_answer_to_LtUaE();
void do_stuff();

int main()
{
    std::future<int> the_answer=std::async(calculate_the_answer_to_LtUaE);
    do_stuff();
    std::cout<<"The answer to life, the universe and everything is "
             <<the_answer.get()<<std::endl;
}
The call to std::async takes care of creating a thread, and invoking calculate_the_answer_to_LtUaE on that thread. The main thread can then get on with calling do_stuff() whilst the immensely time-consuming process of calculating the ultimate answer is done in the background. Finally, the call to the get() member function of the std::future<int> object waits for the function to complete and ensures that the necessary synchronization is applied to transfer the value over, so the main thread can print "42".
Sometimes asynchronous functions aren't really asynchronous
Though I said that std::async takes care of creating a thread, that's not necessarily true. As well as the function being called, std::async takes a launch policy which specifies whether to start a new thread or create a "deferred function" which is only run when you wait for it. The default launch policy for std::async is std::launch::any, which means that the implementation gets to choose for you. If you really want to ensure that your function is run on its own thread then you need to specify the std::launch::async policy:
std::future<int> the_answer=std::async(std::launch::async,calculate_the_answer_to_LtUaE);
Likewise, if you really want the function to be executed in the get() call then you can specify the std::launch::sync policy:
std::future<int> the_answer=std::async(std::launch::sync,calculate_the_answer_to_LtUaE);
In most cases it makes sense to let the library choose. That way you'll avoid creating too many threads and overloading the machine, whilst taking advantage of the available hardware threads. If you need fine control, you're probably better off managing your own threads.
Divide and Conquer
std::async can be used to easily parallelize simple algorithms. For example, you can write a parallel version of for_each as follows:
#include <future>
#include <cstddef>

template<typename Iterator,typename Func>
void parallel_for_each(Iterator first,Iterator last,Func f)
{
    ptrdiff_t const range_length=last-first;
    if(!range_length)
        return;
    if(range_length==1)
    {
        f(*first);
        return;
    }
    Iterator const mid=first+(range_length/2);
    std::future<void> bgtask=std::async(&parallel_for_each<Iterator,Func>,
                                        first,mid,f);
    try
    {
        parallel_for_each(mid,last,f);
    }
    catch(...)
    {
        bgtask.wait();
        throw;
    }
    bgtask.get();
}
This simple bit of code recursively divides up the range into smaller and smaller pieces. Obviously an empty range doesn't require anything to happen, and a single-element range just requires calling f on the one and only value. For bigger ranges, an asynchronous task is spawned to handle the first half, and then the second half is handled by a recursive call.
The try-catch block just ensures that the asynchronous task is finished before we leave the function even if an exception is thrown, in order to avoid the background task potentially accessing the range after it has been destroyed. Finally, the get() call waits for the background task, and propagates any exception thrown from it. That way, if an exception is thrown during any of the processing then the calling code will see an exception. Of course, if more than one exception is thrown then some will get swallowed, but C++ can only handle one exception at a time, so that's the best that can be done without using a custom composite_exception class to collect them all.
Many algorithms can be readily parallelized this way, though you may want to have more than one element as the minimum range in order to avoid the overhead of spawning the asynchronous tasks.
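One way to do that is to stop subdividing once the range drops below some minimum length and process that chunk serially. Here's a minimal sketch of the idea; this is my illustration, not code from the original post, and the parallel_for_each_chunked name and the chunk size of 256 are assumptions:

#include <cstddef>
#include <future>

template<typename Iterator,typename Func>
void parallel_for_each_chunked(Iterator first,Iterator last,Func f)
{
    std::ptrdiff_t const range_length=last-first;
    std::ptrdiff_t const min_chunk_size=256; // tuning value, assumed
    if(range_length<min_chunk_size)
    {
        for(;first!=last;++first) // small ranges are processed serially
            f(*first);
        return;
    }
    Iterator const mid=first+(range_length/2);
    std::future<void> bgtask=std::async(
        &parallel_for_each_chunked<Iterator,Func>,first,mid,f);
    try
    {
        parallel_for_each_chunked(mid,last,f);
    }
    catch(...)
    {
        bgtask.wait(); // don't let the background task outlive the range
        throw;
    }
    bgtask.get(); // waits and propagates any exception
}

The right minimum chunk size depends on how expensive f is relative to the cost of spawning a task with std::async, so it's worth measuring rather than guessing.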
Promises
An alternative to using std::async to spawn the task and return the future is to manage the threads yourself and use the std::promise class template to provide the future. Promises provide a basic mechanism for transferring values between threads: each std::promise object is associated with a single std::future object. A thread with access to the std::future object can wait for the result to be set, whilst another thread that has access to the corresponding std::promise object can call set_value() to store the value and make the future ready. This works well if the thread has more than one task to do, as information can be made ready to other threads as it becomes available rather than all of them having to wait until the thread doing the work has completed. It also allows for situations where multiple threads could produce the answer: from the point of view of the waiting thread it doesn't matter where the answer came from, just that it is there, so it makes sense to have a single future to represent that availability.
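Before the larger example below, here's a minimal sketch of the mechanics (my illustration, not code from the original post): one thread stores a value through a std::promise whilst another waits on the corresponding std::future.

#include <future>
#include <iostream>
#include <thread>
#include <utility>

void produce(std::promise<int> p)
{
    p.set_value(42); // stores the value and makes the future ready
}

int main()
{
    std::promise<int> p;
    std::future<int> f=p.get_future();
    std::thread t(produce,std::move(p));
    std::cout<<f.get()<<std::endl; // waits for set_value(), prints 42
    t.join();
}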
For example, asynchronous I/O could be modelled on a promise/future basis: when you submit an I/O request then the async I/O handler creates a promise/future pair. The future is returned to the caller, which can then wait on the future when it needs the data, and the promise is stored alongside the details of the request. When the request has been fulfilled then the I/O thread can set the value on the promise to pass the value back to the waiting thread before moving on to process additional requests. The following code shows a sample implementation of this pattern.
#include <future>
#include <thread>
#include <atomic>
#include <iostream>
#include <fstream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

class aio
{
    class io_request
    {
        std::streambuf* is;
        unsigned read_count;
        std::promise<std::vector<char> > p;
    public:
        explicit io_request(std::streambuf& is_,unsigned count_):
            is(&is_),read_count(count_)
        {}
        io_request(io_request&& other):
            is(other.is),
            read_count(other.read_count),
            p(std::move(other.p))
        {}
        io_request():
            is(0),read_count(0)
        {}
        std::future<std::vector<char> > get_future()
        {
            return p.get_future();
        }
        void process()
        {
            if(!is) // default-constructed sentinel used to wake the I/O thread
                return;
            try
            {
                std::vector<char> buffer(read_count);
                unsigned amount_read=0;
                while((amount_read != read_count) &&
                      (is->sgetc()!=std::char_traits<char>::eof()))
                {
                    amount_read+=is->sgetn(&buffer[amount_read],
                                           read_count-amount_read);
                }
                buffer.resize(amount_read);
                p.set_value(std::move(buffer));
            }
            catch(...)
            {
                p.set_exception(std::current_exception());
            }
        }
    };

    thread_safe_queue<io_request> request_queue; // not a standard class: see below
    std::atomic_bool done;

    void io_thread()
    {
        while(!done)
        {
            io_request req=request_queue.pop();
            req.process();
        }
    }

    std::thread iot;
public:
    aio():
        done(false),
        iot(&aio::io_thread,this)
    {}
    std::future<std::vector<char> > queue_read(std::streambuf& is,unsigned count)
    {
        io_request req(is,count);
        std::future<std::vector<char> > f(req.get_future());
        request_queue.push(std::move(req));
        return f;
    }
    ~aio()
    {
        done=true;
        request_queue.push(io_request()); // sentinel to wake the I/O thread
        iot.join();
    }
};

void do_stuff()
{}

void process_data(std::vector<char> v)
{
    for(unsigned i=0;i<v.size();++i)
    {
        std::cout<<v[i];
    }
    std::cout<<std::endl;
}

int main()
{
    aio async_io;
    std::filebuf f;
    f.open("my_file.dat",std::ios::in | std::ios::binary);
    std::future<std::vector<char> > fv=async_io.queue_read(f,1048576);
    do_stuff();
    process_data(fv.get());
    return 0;
}
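The sample relies on a thread_safe_queue class template which isn't shown above and isn't part of the C++0x library; in a real translation unit it would need to be defined before the aio class. Here's a minimal sketch of a suitable queue (an assumed interface providing just the push() and blocking pop() the sample needs):

#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template<typename T>
class thread_safe_queue
{
    std::mutex m;
    std::condition_variable cv;
    std::deque<T> items;
public:
    void push(T value)
    {
        {
            std::lock_guard<std::mutex> lk(m);
            items.push_back(std::move(value));
        }
        cv.notify_one();
    }
    T pop() // blocks until an item is available
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk,[this]{return !items.empty();});
        T value=std::move(items.front());
        items.pop_front();
        return value;
    }
};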
Next Time
The sample code above also demonstrates passing exceptions between threads using the set_exception() member function of std::promise. I'll go into more detail about exceptions in multithreaded code next time.
Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.
Try it out
If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.
Multithreading in C++0x Series
Here are the posts in this series so far:
- Multithreading in C++0x Part 1: Starting Threads
- Multithreading in C++0x Part 2: Starting Threads with Function Objects and Arguments
- Multithreading in C++0x Part 3: Starting Threads with Member Functions and Reference Arguments
- Multithreading in C++0x Part 4: Protecting Shared Data
- Multithreading in C++0x Part 5: Flexible locking with std::unique_lock<>
- Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics
- Multithreading in C++0x part 7: Locking multiple mutexes without deadlock
- Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, future, promise, async
just::thread C++0x Thread Library V1.3 Released
Wednesday, 13 January 2010
I am pleased to announce that version 1.3 of just::thread, our C++0x Thread Library has just been released.
This release is the first to feature support for the new std::async function for starting asynchronous tasks. This provides a higher-level interface for managing threads than is available with std::thread, and allows your code to easily take advantage of the available hardware concurrency without excessive oversubscription.
This is also the first release to support 64-bit Windows.
The Linux port is available for 32-bit and 64-bit Ubuntu Linux, and takes full advantage of the C++0x support available from g++ 4.3 and g++ 4.4. The Windows port is available for Microsoft Visual Studio 2008 for both 32-bit and 64-bit Windows. Purchase your copy and get started NOW.
As usual, existing customers are entitled to a free upgrade to V1.3 from all earlier versions.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Happy New Year 2010
Tuesday, 05 January 2010
It's already five days into 2010, but I'd like to wish you all a Happy New Year!
2009 was a good year for me. Back in January 2009, my implementation of the C++0x thread library went on sale, and sales have been growing steadily since — there's a new version due out any day now, with support for the new std::async functions and 64-bit Windows. I also presented at the ACCU conference for the second year running and completed the first draft of my book.
It's also been a big year for the C++ community. The biggest change is of course that "Concepts" were taken out of the C++0x draft since they were not ready. On the concurrency front, the proposal for the new std::async functions was accepted, std::unique_future was renamed to just std::future, and the destructor of std::thread was changed to call std::terminate rather than detach if the thread has not been joined or detached.
Popular articles
As is my custom, here's a list of the 10 most popular articles and blog entries from the Just Software Solutions website in 2009. The key difference from last year's list is the rise of the C++0x thread library stuff.
- Implementing a Thread-Safe Queue using Condition Variables
  A description of the issues around writing a thread-safe queue, with code.
- Implementing drop-down menus in pure CSS (no JavaScript)
  How to implement drop-down menus in CSS in a cross-browser fashion (with a teensy bit of JavaScript for IE).
- Deadlock Detection with just::thread
  This article describes how to use the special deadlock-detection mode of our just::thread C++0x thread library to locate the cause of deadlocks.
- 10 Years of Programming with POSIX Threads
  A review of "Programming with POSIX Threads" by David Butenhof, 10 years after publication.
- just::thread C++0x Thread Library V1.0 Released
  This is the release announcement for our just::thread C++0x thread library.
- Importing an Existing Windows XP Installation into VirtualBox
  This article describes how I recovered the hard disk of a dead laptop to run as a VM under VirtualBox.
- Thread Interruption in the Boost Thread Library
  A description of the thread interruption feature of the Boost Thread library.
- October 2008 C++ Standards Committee Mailing - New C++0x Working Paper, More Concurrency Papers Approved
  My summary of the October 2008 C++ committee mailing featuring the first feature-complete draft of the C++0x standard.
- Multithreading in C++0x part 1: Starting Threads
  This is the first part of my series on the new C++0x thread library. Links to the remaining parts are at the end of the article.
- Rvalue References and Perfect Forwarding in C++0x
  An introduction to the new rvalue reference feature of C++0x.
What's coming in 2010?
Will 2010 be even better than 2009? I hope so. There's a new version of just::thread coming soon, and there's another ballot on the C++0x working draft due in the spring. I'll also be presenting at ACCU 2010 in April.
What are you looking forward to in 2010?
Posted by Anthony Williams
[/ news /] permanent link
Tags: popular, articles
November 2009 C++ Standards Committee Mailing
Tuesday, 17 November 2009
The November 2009 mailing for the C++ Standards Committee was published last week. This is the post-meeting mailing for the October committee meeting and contains a new working draft incorporating all the proposals accepted at the October meeting.
As those of you who read Herb Sutter's blog or Michael Wong's blog will already know, the big news for concurrency is that a proposal for std::async has been accepted, and this is reflected in the latest working draft.
There are 5 concurrency-related papers (of which my name is on one), which I summarize below:
Concurrency-related papers
- N2985: C and C++ Thread Compatibility
  Lawrence Crowl has been through the concurrency and multithreading support from the C++ working draft, and compared this to that now being added to the C working draft for the next C standard. This paper contains his observations and recommendations about the potential for compatibility between the types and functions provided by the two draft standards. I agree with some of his recommendations, and disagree with others. No action will be taken by the C++ committee unless his recommendations become full proposals.
- N2992: More Collected Issues with Atomics
  This is Lawrence Crowl's paper which rounds up various issues with the C++ atomics. It was accepted by the committee, and has been incorporated into the working draft. The most noticeable change is that the atomic types and functions now live in the <atomic> header. The macro for identifying lock-free atomics has been expanded into a set of macros, one per type, and a guarantee has been added that all instances of a type are lock-free, or all are not lock-free (rather than it being allowed to vary between objects). There is also a clarification that if you use compare-exchange operations on a type with padding bits then the padding bits will be part of the comparison (which will therefore affect atomic<float> or similar), plus a few other clarifications.
- N2996: A Simple Asynchronous Call
  This paper is by Herb Sutter and Lawrence Crowl, and brings together their papers from the pre-meeting mailing (N2970 and N2973) based on feedback from the committee. This is the std::async proposal that got accepted, and which has been incorporated into the working draft. The result is that we have a variadic async function with an optional launch policy as the first argument, which specifies whether the function is to be spawned on a separate thread, deferred until get() or wait() is called on the returned future, or either, at the choice of the implementation.
- N2997: Issues on Futures (Rev. 1)
  This is a revision of N2967 from the pre-meeting mailing, which incorporates feedback from the committee. This is the version that was accepted, and which has been incorporated into the working draft. The key change is that unique_future has been renamed to just future in order to make things easier to read in what is anticipated to be the common case.
- N2999: Background for issue 887: Clocks and Condition Variables (Rev. 1)
  This is a minor revision of Detlef's paper from the pre-meeting mailing (N2969). The changes it proposes have not yet been accepted into the working paper, and would make it implementation-dependent which clocks can be used for condition variable waits.
Other concurrency changes in the working draft
- std::mutex now has a constexpr constructor, so namespace-scope objects are guaranteed to be safe from initialization order issues.
- The return type of the wait_for and wait_until member functions of std::condition_variable and std::condition_variable_any has been changed from bool to a cv_status enum. This makes it clear whether the return is due to a signal (or a spurious wake) or because the operation timed out, as shown in the sketch below.
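As an illustration of the revised interface (a sketch of my own, not code from the papers), a timed wait now reads like this:

#include <condition_variable>
#include <mutex>
#include <chrono>

std::mutex m;
std::condition_variable cv;

bool wait_with_timeout()
{
    std::unique_lock<std::mutex> lk(m);
    // cv_status::timeout means the 100ms elapsed; cv_status::no_timeout
    // means the wait was ended by a notification (or a spurious wake).
    return cv.wait_for(lk,std::chrono::milliseconds(100))
           ==std::cv_status::no_timeout;
}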
Updated implementation
Our just::thread implementation of the C++0x thread library will shortly be updated to incorporate all these changes (including an implementation of std::async). Existing customers will get a free upgrade as usual.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
More Chances to Win a copy of just::thread in our Halloween Contest
Thursday, 29 October 2009
It's been good to see the entries coming in for our Halloween Contest, and with only two days to go we've decided to add more chances of winning. A blog post with your concurrency story now counts as TWO entries. You can get a further entry by commenting on this blog entry, and yet another by tweeting about this contest. If you do everything, that gives you four chances to win!
Not only does just::thread provide a complete implementation of the C++0x standard thread library for Windows and Linux, but it also includes a special deadlock detection mode. If deadlocks are sucking the life out of your multithreaded code then the deadlock detection mode can get straight to the heart of the problem and tell you which synchronization objects are the cause of the problem, which threads are waiting, and where. Crucially, the library will also tell you which thread owns each object, and where it took ownership.
Good luck!
Posted by Anthony Williams
[/ news /] permanent link
Win a copy of just::thread in our Halloween Contest
Friday, 16 October 2009
Yes, that's right — this Halloween we've got 3 copies of the just::thread portability pack worth £100 to give away. The Portability Pack provides implementations of just::thread for both Linux and Windows, so you can use the same code for both platforms right off the bat.
Not only does just::thread provide a complete implementation of the C++0x standard thread library for Windows and Linux, but it also includes a special deadlock detection mode. If deadlocks are sucking the life out of your multithreaded code then the deadlock detection mode can get straight to the heart of the problem and tell you which synchronization objects are the cause of the problem, which threads are waiting, and where. Crucially, the library will also tell you which thread owns each object, and where it took ownership.
Enter this contest now for your chance to win.
How to enter
Entering is really easy — just write a short blog post about your concurrency gremlins that links back to the contest page. It could be that you'd like to add multithreading to your application but don't know where to start, or you're plagued with deadlocks (in which case you could use your new copy of just::thread to help you find the cause!) Maybe you've conquered all your concurrency problems, in which case I'd like to hear about how you did it. Whatever your story, I want to know — even a twitter-style 140-character summary would be fine.
Once you've got your blog post up, either email contest@stdthread.co.uk, or add a comment to this blog entry. Just make sure you do it before the deadline: Halloween, 31st October 2009. For full details and terms and conditions, see the contest page.
Good luck!
Posted by Anthony Williams
[/ news /] permanent link
September 2009 C++ Standards Committee Mailing
Wednesday, 30 September 2009
The September 2009 mailing for the C++ Standards Committee was published today. This is the pre-meeting mailing for the October committee meeting and contains the first new working draft since the "Concepts" feature was voted out of C++0x in July.
There is also quite a large number of papers in this mailing, including 7 concurrency-related papers (of which my name is on 2), which I summarize below:
Concurrency-related papers
- N2955: Comments on the C++ Memory Model Following a Partial Formalization Attempt
  Mark has been working on a formal model for parts of the C++ memory model, and has identified a few cases where the wording could be improved to clarify things.
- N2959: Managing the lifetime of thread_local variables with contexts (Revision 1)
  This is a revision of my earlier paper, N2907, which expands my proposal for a thread_local_context class, along with full proposed wording. I think this provides a workable solution to the general problem of ensuring that destructors for thread_local variables are run at particular points in the execution of a program, as discussed in N2880.
- N2967: Issues on Futures
  This paper, which I co-authored with Detlef Vollmann, provides a complete rewording for the "Futures" section of the standard (30.6). It folds the proposed changes from N2888 in with changes requested by the LWG at July's meeting (including the addition of a new future type — atomic_future — which serializes all operations), and editorial comments on the earlier wording. It doesn't resolve everything — there is still some discussion over whether unique_future::get() should return a value or a reference, for example — but it provides a sound basis for further discussion.
- N2969: Background for issue 887: Clocks and Condition Variables
  Detlef argues the case that std::condition_variable::wait_for and std::condition_variable::wait_until may wake "too late" if the user adjusts the system clock during a wait, and suggests simplifying the overload set to match the minimum requirements of POSIX. LWG has already decided that this is Not A Defect (NAD), and I agree — there are no guarantees about how soon after the specified timeout the wait will return, so this is entirely a quality-of-implementation issue.
- N2970: A simple async() (revision 1)
  This is a revision to Herb Sutter's take on a std::async function. There is a lot of discussion around the various issues, but the proposed wording seems incomplete — for example, if the unique_future associated with an asynchronous async call is moved into a shared_future which is then copied, which call to get() runs the task?
- N2973: An Asynchronous Call for C++
  This is a revision to Lawrence Crowl's take on a std::async function. Whilst better specified than N2970 (a "serial" task is invoked in unique_future::get() or when the unique_future is moved into a shared_future), I'm not entirely happy with the result.
- N2974: An Analysis of Async and Futures
  To aid the committee in deciding on the semantics of std::async, Lawrence has put together this paper, which outlines various options for creating futures from an async function. He then goes on to outline the operations that we may wish to perform on those futures, and how they relate to the various options for async.
As you can see from the 3 papers on the issue, std::async is a hot topic. I hope that a single coherent design can be agreed on at Santa Cruz, and incorporated into the standard. Comment on this blog post if you've got an opinion on the matter, and I'll pass it on.
Other papers
As I said at the beginning of this post, there are quite a few papers in this mailing. Other than the new working draft, there were 3 papers that piqued my interest:
- N2954: Unified Function Syntax, which proposes allowing lambda-style syntax for normal functions,
- N2953: Defining Move Special Member Functions, which covers the implicit generation of move constructors and move assignment operators, and
- N2951: forward, which covers the usage and constraints on std::forward.
Other papers include changes to the allocator proposals, papers to cover specific core issues, a couple of papers on type traits and a couple on pairs and tuples. See the full paper list for further details.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
"The Perils of Data Races" Article Online
Friday, 11 September 2009
My latest article, Avoiding the Perils of C++0x Data Races has been published at devx.com.
Race conditions are one of the biggest causes of problems in multithreaded code. This article takes a brief look at some of the ways in which race conditions can occur, with sample code that demonstrates the perils of data races.
A Data Race is a specific type of race condition, and occurs where multiple threads access the same non-atomic variable without synchronization, and at least one of those threads performs a write. The sample code in the article demonstrates how a data race can cause corrupt data, which can then cause additional problems in other code.
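As a minimal illustration of the kind of code the article warns about (my sketch, not the article's sample code), two threads incrementing a shared non-atomic counter without synchronization constitute a data race:

#include <thread>
#include <iostream>

unsigned long counter=0; // shared, non-atomic, no mutex

void increment_many()
{
    for(unsigned i=0;i<1000000;++i)
        ++counter; // unsynchronized read-modify-write: a data race
}

int main()
{
    std::thread t1(increment_many);
    std::thread t2(increment_many);
    t1.join();
    t2.join();
    // Undefined behaviour: in practice updates are often lost, so this
    // frequently prints less than the expected 2000000.
    std::cout<<counter<<std::endl;
}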
Posted by Anthony Williams
[/ news /] permanent link
Tags: article, multithreading, C++0x, data races, race conditions
Multithreading in C++0x part 7: Locking multiple mutexes without deadlock
Friday, 21 August 2009
This is the seventh in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.
After last time's detour into lazy initialization, our regular programming resumes with the promised article on std::lock().
Acquiring locks on multiple mutexes
In most cases, your code should only ever hold a lock on one mutex at a time. Occasionally, you might nest your locks, e.g. by calling into a subsystem that protects its internal data with a mutex whilst holding a lock on another mutex, but it's generally better to avoid holding locks on multiple mutexes at once if at all possible.
However, sometimes it is necessary to hold a lock on more than one mutex, because you need to perform an operation on two distinct items of data, each of which is protected by its own mutex. In order to perform the operation correctly, you need to hold locks on both the mutexes, otherwise another thread could change one of the values before your operation was complete. For example, consider a bank transfer between two accounts: we want the transfer to be a single action to the outside world, so we need to acquire the lock on the mutex protecting each account. You could naively implement this like so:
class account
{
    std::mutex m;
    currency_value balance;
public:
    friend void transfer(account& from,account& to,
                         currency_value amount)
    {
        std::lock_guard<std::mutex> lock_from(from.m);
        std::lock_guard<std::mutex> lock_to(to.m);
        from.balance -= amount;
        to.balance += amount;
    }
};
Though it looks right at first glance (both locks are acquired before data is accessed), there is a potential for deadlock in this code. Consider two threads calling transfer() at the same time on the same accounts, but one thread is transferring money from account A to account B, whereas the other is transferring money from account B to account A. If the calls to transfer() are running concurrently, then both threads will try and lock the mutex on their from account. Assuming there's nothing else happening in the system, this will succeed, as for thread 1 the from account is account A, whereas for thread 2 the from account is account B. Now each thread tries to acquire the lock on the mutex for the corresponding to account. Thread 1 will try and lock the mutex for account B (thread 1 is transferring from A to B). Unfortunately, it will block because thread 2 holds that lock. Meanwhile thread 2 will try and lock the mutex for account A (thread 2 is transferring from B to A). Thread 1 holds this mutex, so thread 2 must also block — deadlock.
Avoiding deadlock with std::lock()
In order to avoid this problem, you need to somehow tell the system to lock the two mutexes together, so that one of the threads acquires both locks and deadlock is avoided. This is what the std::lock() function is for — you supply a number of mutexes (it's a variadic template) and the thread library ensures that when the function returns they are all locked. Thus we can avoid the deadlock in transfer() by writing it as follows:
void transfer(account& from,account& to,
              currency_value amount)
{
    std::lock(from.m,to.m);
    std::lock_guard<std::mutex> lock_from(from.m,std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m,std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}
Here we use std::lock() to lock both mutexes safely, then adopt the ownership into the std::lock_guard instances to ensure the locks are released safely at the end of the function.
Other mutex types
As mentioned already, std::lock() is a function template rather than a plain function. Not only does this mean it can merrily accept any number of mutex arguments, but it also means that it can accept any type of mutex arguments. The arguments don't even all have to be the same type. You can pass anything which implements lock(), try_lock() and unlock() member functions with appropriate semantics. As you may remember from part 5 of this series, std::unique_lock<> provides these member functions, so you can pass an instance of std::unique_lock<> to std::lock(). Indeed, you could also write transfer() using std::unique_lock<> like this:
void transfer(account& from,account& to,
              currency_value amount)
{
    std::unique_lock<std::mutex> lock_from(from.m,std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m,std::defer_lock);
    std::lock(lock_from,lock_to);
    from.balance -= amount;
    to.balance += amount;
}
In this case, we construct the std::unique_lock<> instances without actually locking the mutexes, and then lock them both together with std::lock() afterwards. You still get all the benefits of the deadlock avoidance, and the same level of exception safety — which approach to use is up to you, and depends on what else is happening in the code.
Exception safety
Since std::lock() has to work with any mutex type you might throw at it, including user-defined ones, it has to be able to cope with exceptions. In particular, it has to provide sensible behaviour if a call to lock() or try_lock() on one of the supplied mutexes throws an exception. The way it does this is quite simple: if a call to lock() or try_lock() on one of the supplied mutexes throws an exception then unlock() is called on each of the mutexes for which this call to std::lock() currently owns a lock. So, if you are locking 4 mutexes and the call has successfully acquired 2 of them, but a call to try_lock() on the third throws an exception, then the locks on the first two will be released by calls to unlock().
The upshot of this is that if std::lock() completes successfully then the calling thread owns the lock on all the supplied mutexes, but if the call exits with an exception then from the point of view of lock ownership it will be as if the call was never made — any additional locks acquired have been released again.
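To see this rollback in action, here's a sketch using a hypothetical user-defined mutex type whose locking functions can throw; the throwing_mutex name and behaviour are my assumptions for illustration, not anything from the standard library:

#include <mutex>
#include <stdexcept>
#include <iostream>

class throwing_mutex // hypothetical mutex type, for illustration only
{
    std::mutex m;
public:
    bool should_throw;
    throwing_mutex():should_throw(false){}
    void lock()
    {
        if(should_throw) throw std::runtime_error("simulated lock failure");
        m.lock();
    }
    bool try_lock()
    {
        if(should_throw) throw std::runtime_error("simulated lock failure");
        return m.try_lock();
    }
    void unlock()
    {
        m.unlock();
    }
};

int main()
{
    throwing_mutex a,b;
    b.should_throw=true;
    try
    {
        std::lock(a,b); // b's lock attempt throws; std::lock releases
                        // anything it had already acquired
    }
    catch(std::runtime_error const&)
    {
        // a is no longer held by std::lock, so this succeeds rather
        // than deadlocking on a mutex we still owned.
        std::lock_guard<throwing_mutex> guard(a);
        std::cout<<"a was released before the exception propagated\n";
    }
}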
No silver bullet
There are many ways to write code that deadlocks: std::lock() only addresses the particular case of acquiring multiple mutexes together. However, if you need to do this then std::lock() ensures that you don't need to worry about lock ordering issues. If your code acquires locks on multiple mutexes, but std::lock() isn't applicable for your case, then you need to take care of lock ordering issues another way. One possibility is to enforce an ordering by using a hierarchical mutex (see the sketch below). If you've got a deadlock in your code, then things like the deadlock detection mode of my just::thread library can help you pinpoint the cause.
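Here's a sketch of what such a hierarchical mutex might look like; this is an illustration of the idea, not a standard facility, and the names and details are my assumptions:

#include <mutex>
#include <stdexcept>
#include <climits>

class hierarchical_mutex
{
    std::mutex internal_mutex;
    unsigned const level;
    unsigned previous_level;
    static thread_local unsigned this_thread_level;

    void check_for_violation()
    {
        // A thread may only lock mutexes with strictly decreasing levels.
        if(this_thread_level<=level)
            throw std::logic_error("mutex hierarchy violated");
    }
    void update_level()
    {
        previous_level=this_thread_level;
        this_thread_level=level;
    }
public:
    explicit hierarchical_mutex(unsigned level_):
        level(level_),previous_level(0)
    {}
    void lock()
    {
        check_for_violation();
        internal_mutex.lock();
        update_level();
    }
    bool try_lock()
    {
        check_for_violation();
        if(!internal_mutex.try_lock())
            return false;
        update_level();
        return true;
    }
    void unlock() // assumes strictly nested lock/unlock pairs
    {
        this_thread_level=previous_level;
        internal_mutex.unlock();
    }
};
// Each thread starts at the highest possible level, so it can lock any mutex.
thread_local unsigned hierarchical_mutex::this_thread_level(UINT_MAX);

Since it provides lock(), try_lock() and unlock(), it can be used with std::lock_guard<hierarchical_mutex> and friends; code that locks mutexes against the hierarchy gets an exception at the point of misuse rather than a latent deadlock.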
Next time
In the next installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high-level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.
Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.
Try it out
If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.
Multithreading in C++0x Series
Here are the posts in this series so far:
- Multithreading in C++0x Part 1: Starting Threads
- Multithreading in C++0x Part 2: Starting Threads with Function Objects and Arguments
- Multithreading in C++0x Part 3: Starting Threads with Member Functions and Reference Arguments
- Multithreading in C++0x Part 4: Protecting Shared Data
- Multithreading in C++0x Part 5: Flexible locking with std::unique_lock<>
- Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics
- Multithreading in C++0x part 7: Locking multiple mutexes without deadlock
- Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, atomics
Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics
Thursday, 13 August 2009
This is the sixth in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.
I had intended to write about the use of the new std::lock() for avoiding deadlock in this article. However, there was a post on comp.std.c++ this morning about lazy initialization, and I thought I'd write about that instead. std::lock() can wait until next time.
Lazy Initialization
The classic lazy-initialization case is where you have an object that is expensive to construct but isn't always needed. In this case you can choose to only initialize it on first use:
class lazy_init
{
    mutable std::unique_ptr<expensive_data> data;
public:
    expensive_data const& get_data() const
    {
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return *data;
    }
};
However, we can't use this idiom in multi-threaded code, since there would be a data race on the accesses to data. Enter std::call_once() — by using an instance of std::once_flag to protect the initialization we can make the data race go away:
class lazy_init
{
    mutable std::once_flag flag;
    mutable std::unique_ptr<expensive_data> data;

    void do_init() const
    {
        data.reset(new expensive_data);
    }
public:
    expensive_data const& get_data() const
    {
        std::call_once(flag,&lazy_init::do_init,this);
        return *data;
    }
};
Concurrent calls to get_data() are now safe: if the data has already been initialized they can just proceed concurrently. If not, then all threads calling concurrently except one will wait until the remaining thread has completed the initialization.
Reinitialization
This is all very well if you only want to initialize the data once. However, what if you need to update the data — perhaps it's a cache of some rarely-changing data that's expensive to come by. std::call_once() doesn't support multiple calls (hence the name). You could of course protect the data with a mutex, as shown below:
class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;
public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::lock_guard<std::mutex> lk(m);
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return data;
    }

    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        data.reset();
    }
};
Note that in this case we return a std::shared_ptr<const expensive_data> rather than a reference, to avoid a race condition on the data itself — this ensures that the copy held by the calling code will be valid (if out of date) even if another thread calls invalidate_cache() before the data can be used.
This "works" in the sense that it avoids data races, but if the
updates are rare and the reads are frequent then this may cause
unnecessary serialization when multiple threads
call get_data()
concurrently. What other options do we
have?
Double-checked locking returns
Much has been written about how double-checked locking is broken when using multiple threads. However, the chief cause of the problem is that the sample code uses plain non-atomic operations to check the flag outside the mutex, so is subject to a data race. You can overcome this by careful use of the C++0x atomics, as shown in the example below:
class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;
public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::shared_ptr<const expensive_data> result=
            std::atomic_load_explicit(&data,std::memory_order_acquire);
        if(!result)
        {
            std::lock_guard<std::mutex> lk(m);
            result=data;
            if(!result)
            {
                result.reset(new expensive_data);
                std::atomic_store_explicit(&data,result,
                                           std::memory_order_release);
            }
        }
        return result;
    }

    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        std::shared_ptr<const expensive_data> dummy;
        std::atomic_store_explicit(&data,dummy,std::memory_order_relaxed);
    }
};
Note that in this case, all writes to data use atomic operations, even those within the mutex lock. This is necessary in order to ensure that the atomic load operation at the start of get_data() actually has a coherent value to read — there's no point doing an atomic load if the stores are not atomic, otherwise you might atomically load some half-written data. Also, the atomic load and store operations ensure that the reference count on the std::shared_ptr object is correctly updated, so that the expensive_data object is correctly destroyed when the last std::shared_ptr object referencing it is destroyed.
If our atomic load actually returned a non-NULL value then we can use that, just as we did before. However, if it returned NULL then we need to lock the mutex and try again. This time we can use a plain read of data, since the mutex is locked. If we still get NULL then we need to do the initialization. However, we can't just call data.reset() like before, since that is not atomic. Instead we must create a local std::shared_ptr instance with the value we want, and store the value with an atomic store operation. We can use result for the local value, since we want the value in that variable anyway.
In invalidate_cache() we must also store the value using std::atomic_store_explicit(), in order to ensure that the NULL value is correctly read in get_data(). Note that we must lock the mutex here too, in order to avoid a data race with the initialization code inside the mutex lock in get_data().
Memory ordering
By using std::atomic_load_explicit() and std::atomic_store_explicit() we can specify the memory ordering requirements of the operations. We could have just used std::atomic_load() and std::atomic_store(), but those would have implied std::memory_order_seq_cst, which is overkill in this scenario. What we need is to ensure that if a non-NULL value is read in get_data() then the actual creation of the associated object happens-before the read. The store in get_data() must therefore use std::memory_order_release, whilst the load uses std::memory_order_acquire.
On the other hand, the store in invalidate_cache() can merrily use std::memory_order_relaxed, since there is no data associated with the store: if the load in get_data() reads NULL then the mutex will be locked, which will handle any necessary synchronization.
Whenever you use atomic operations, you have to make sure that the memory ordering is correct, and that there are no races. Even in a simple case such as this it is not trivial, and I would not recommend it unless profiling has shown that this is really a problem.
Update: As if to highlight my previous point about the trickiness of atomic operations, Dmitriy correctly points out in the comments that the use of std::shared_ptr to access the expensive_data implies a reference count, which is a real performance suck in multithreaded code. Whatever the memory ordering constraints we put on it, every thread doing the reading has to update the reference count. This is thus a source of contention, and can seriously limit scalability even if it doesn't force full serialization. The same issues apply to multiple-reader single-writer mutexes — Joe Duffy has written about them over on his blog. Time it on your platform with just a mutex (i.e. no double-checked locking), and with the atomic operations, and use whichever is faster. Alternatively, use a memory reclamation scheme specially tailored for your usage.
Next time
In the next part of this series I'll cover the use of std::lock() that I was intending to cover in this installment.
Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.
Try it out
If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.
Note: since std::shared_ptr is part of the library supplied with the compiler, just::thread cannot provide the atomic functions for std::shared_ptr used in this article.
Multithreading in C++0x Series
Here are the posts in this series so far:
- Multithreading in C++0x Part 1: Starting Threads
- Multithreading in C++0x Part 2: Starting Threads with Function Objects and Arguments
- Multithreading in C++0x Part 3: Starting Threads with Member Functions and Reference Arguments
- Multithreading in C++0x Part 4: Protecting Shared Data
- Multithreading in C++0x Part 5: Flexible locking with std::unique_lock<>
- Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics
- Multithreading in C++0x part 7: Locking multiple mutexes without deadlock
- Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, atomics