Blog Archive

Multithreading in C++0x part 8: Futures, Promises and Asynchronous Function Calls

Thursday, 11 February 2010

This is the eighth in a series of blog posts introducing the new C++0x thread library. See the end of this article for a full set of links to the rest of the series.

In this installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Futures and asynchronous function calls

The most basic use of a future is to hold the result of a call to the new std::async function for running some code asynchronously:

#include <future>
#include <iostream>

int calculate_the_answer_to_LtUaE();
void do_stuff();

int main()
{
    std::future<int> the_answer=std::async(calculate_the_answer_to_LtUaE);
    do_stuff();
    std::cout<<"The answer to life, the universe and everything is "
             <<the_answer.get()<<std::endl;
}

The call to std::async takes care of creating a thread, and invoking calculate_the_answer_to_LtUaE on that thread. The main thread can then get on with calling do_stuff() whilst the immensely time consuming process of calculating the ultimate answer is done in the background. Finally, the call to the get() member function of the std::future<int> object then waits for the function to complete and ensures that the necessary synchronization is applied to transfer the value over so the main thread can print "42".

Sometimes asynchronous functions aren't really asynchronous

Though I said that std::async takes care of creating a thread, that's not necessarily true. As well as the function being called, std::async takes a launch policy which specifies whether to start a new thread or create a "deferred function" which is only run when you wait for it. The default launch policy for std::async is std::launch::any, which means that the implementation gets to choose for you. If you really want to ensure that your function is run on its own thread then you need to specify the std::launch::async policy:

  std::future<int> the_answer=std::async(std::launch::async,calculate_the_answer_to_LtUaE);

Likewise, if you really want the function to be executed in the get() call then you can specify the std::launch::sync policy:

  std::future<int> the_answer=std::async(std::launch::sync,calculate_the_answer_to_LtUaE);

In most cases it makes sense to let the library choose. That way you'll avoid creating too many threads and overloading the machine, whilst taking advantage of the available hardware threads. If you need fine control, you're probably better off managing your own threads.

Divide and Conquer

std::async can be used to easily parallelize simple algorithms. For example, you can write a parallel version of for_each as follows:

template<typename Iterator,typename Func>
void parallel_for_each(Iterator first,Iterator last,Func f)
{
    ptrdiff_t const range_length=last-first;
    if(!range_length)
        return;
    if(range_length==1)
    {
        f(*first);
        return;
    }

    Iterator const mid=first+(range_length/2);

    std::future<void> bgtask=std::async(&parallel_for_each<Iterator,Func>,
                                        first,mid,f);
    try
    {
        parallel_for_each(mid,last,f);
    }
    catch(...)
    {
        bgtask.wait();
        throw;
    }
    bgtask.get();   
}

This simple bit of code recursively divides up the range into smaller and smaller pieces. Obviously an empty range doesn't require anything to happen, and a single-point range just requires calling f on the one and only value. For bigger ranges then an asynchronous task is spawned to handle the first half, and then the second half is handled by a recursive call.

The try - catch block just ensures that the asynchronous task is finished before we leave the function even if an exception in order to avoid the background tasks potentially accessing the range after it has been destroyed. Finally, the get() call waits for the background task, and propagates any exception thrown from the background task. That way if an exception is thrown during any of the processing then the calling code will see an exception. Of course if more than one exception is thrown then some will get swallowed, but C++ can only handle one exception at a time, so that's the best that can be done without using a custom composite_exception class to collect them all.

Many algorithms can be readily parallelized this way, though you may want to have more than one element as the minimum range in order to avoid the overhead of spawning the asynchronous tasks.

Promises

An alternative to using std::async to spawn the task and return the future is to manage the threads yourself and use the std::promise class template to provide the future. Promises provide a basic mechanism for transferring values between threads: each std::promise object is associated with a single std::future object. A thread with access to the std::future object can use wait for the result to be set, whilst another thread that has access to the corresponding std::promise object can call set_value() to store the value and make the future ready. This works well if the thread has more than one task to do, as information can be made ready to other threads as it becomes available rather than all of them having to wait until the thread doing the work has completed. It also allows for situations where multiple threads could produce the answer: from the point of view of the waiting thread it doesn't matter where the answer came from, just that it is there so it makes sense to have a single future to represent that availability.

For example, asynchronous I/O could be modelled on a promise/future basis: when you submit an I/O request then the async I/O handler creates a promise/future pair. The future is returned to the caller, which can then wait on the future when it needs the data, and the promise is stored alongside the details of the request. When the request has been fulfilled then the I/O thread can set the value on the promise to pass the value back to the waiting thread before moving on to process additional requests. The following code shows a sample implementation of this pattern.

class aio
{
    class io_request
    {
        std::streambuf* is;
        unsigned read_count;
        std::promise<std::vector<char> > p;
    public:
        explicit io_request(std::streambuf& is_,unsigned count_):
            is(&is_),read_count(count_)
        {}
    
        io_request(io_request&& other):
            is(other.is),
            read_count(other.read_count),
            p(std::move(other.p))
        {}

        io_request():
            is(0),read_count(0)
        {}

        std::future<std::vector<char> > get_future()
        {
            return p.get_future();
        }

        void process()
        {
            try
            {
                std::vector<char> buffer(read_count);

                unsigned amount_read=0;
                while((amount_read != read_count) && 
                      (is->sgetc()!=std::char_traits<char>::eof()))
                {
                    amount_read+=is->sgetn(&buffer[amount_read],read_count-amount_read);
                }

                buffer.resize(amount_read);
                
                p.set_value(std::move(buffer));
            }
            catch(...)
            {
                p.set_exception(std::current_exception());
            }
        }
    };

    thread_safe_queue<io_request> request_queue;
    std::atomic_bool done;

    void io_thread()
    {
        while(!done)
        {
            io_request req=request_queue.pop();
            req.process();
        }
    }

    std::thread iot;
    
public:
    aio():
        done(false),
        iot(&aio::io_thread,this)
    {}

    std::future<std::vector<char> > queue_read(std::streambuf& is,unsigned count)
    {
        io_request req(is,count);
        std::future<std::vector<char> > f(req.get_future());
        request_queue.push(std::move(req));
        return f;
    }
    
    ~aio()
    {
        done=true;
        request_queue.push(io_request());
        iot.join();
    }
};

void do_stuff()
{}

void process_data(std::vector<char> v)
{
    for(unsigned i=0;i<v.size();++i)
    {
        std::cout<<v[i];
    }
    std::cout<<std::endl;
} 

int main()
{
    aio async_io;

    std::filebuf f;
    f.open("my_file.dat",std::ios::in | std::ios::binary);

    std::future<std::vector<char> > fv=async_io.queue_read(f,1048576);
    
    do_stuff();
    process_data(fv.get());
    
    return 0;
}

Next Time

The sample code above also demonstrates passing exceptions between threads using the set_exception() member function of std::promise. I'll go into more detail about exceptions in multithreaded next time.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, future, promise, async
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

just::thread C++0x Thread Library V1.3 Released

Wednesday, 13 January 2010

I am pleased to announce that version 1.3 of just::thread, our C++0x Thread Library has just been released.

This release is the first to feature support for the new std::async function for starting asynchronous tasks. This provides a higher-level interface for managing threads than is available with std::thread, and allows your code to easily take advantage of the available hardware concurrency without excessive oversubscription.

This is also the first release to support 64-bit Windows.

The linux port is available for 32-bit and 64-bit Ubuntu linux, and takes full advantage of the C++0x support available from g++ 4.3 and g++ 4.4. The Windows port is available for Microsoft Visual Studio 2008 for both 32-bit and 64-bit Windows. Purchase your copy and get started NOW.

As usual, existing customers are entitled to a free upgrade to V1.3 from all earlier versions.

Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Happy New Year 2010

Tuesday, 05 January 2010

It's already five days into 2010, but I'd like to wish you all a Happy New Year!

2009 was a good year for me. Back in January 2009, my implementation of the C++0x thread library went on sale, and sales have been growing steadily since — there's a new version due out any day now, with support for the new std::async functions and 64-bit Windows. I also presented at the ACCU conference for the second year running and completed the first draft of my book.

It's also been a big year for the C++ community. The biggest change is of course that "Concepts" were taken out of the C++0x draft since they were not ready. On the concurrency front, the proposal for the new std::async functions was accepted, std::unique_future was renamed to just std::future and the destructor of std::thread was changed to call std::terminate rather than detach if the thread has not been joined or detached.

What's coming in 2010?

Will 2010 be even better than 2009? I hope so. There's a new version of just::thread coming soon, and there's another ballot on the C++0x working draft due in the spring. I'll also be presenting at ACCU 2010 in April.

What are you looking forward to in 2010?

Posted by Anthony Williams
[/ news /] permanent link
Tags: popular, articles
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

November 2009 C++ Standards Committee Mailing

Tuesday, 17 November 2009

The November 2009 mailing for the C++ Standards Committee was published last week. This is the post-meeting mailing for the October committee meeting and contains a new working draft incorporating all the proposals accepted at the October meeting.

As those of you who read Herb Sutter's blog or Michael Wong's blog will already know, the big news for concurrency is that a proposal for std::async has been accepted, and this is reflected in the latest working draft.

There are 5 concurrency-related papers (of which my name is on one), which I summarize below:

Concurrency-related papers

N2985: C and C++ Thread Compatibility

Lawrence Crowl has been through the concurrency and multithreading support from the C++ working draft, and compared this to that now being added to the C working draft for the next C standard. This paper contains his observations and recommendations about the potential for compatibility between the types and functions provided by the two draft standards. I agree with some of his recommendations, and disagree with others. No action will be taken by the C++ committee unless his recommendations become full proposals.

N2992: More Collected Issues with Atomics

This is Lawrence Crowl's paper which rounds up various issues with the C++ atomics. It was accepted by the committee, and has been incorporated into the working draft. The most noticeable change is that the atomic types and functions now live in the <atomic> header. The macro for identifying lock-free atomics has been expanded into a set of macros, one per type, and a guarantee has been added that all instances of a type are lock-free, or all are not lock-free (rather than it being allowed to vary between objects). There is also a clarification that if you use compare-exchange operations on a type with padding bits then the padding bits will be part of the comparison (which will therefore affect atomic<float> or similar), plus a few other clarifications.

N2996: A Simple Asynchronous Call

This paper is by Herb Sutter and Lawrence Crowl, and brings together their papers from the pre-meeting mailing (N2970 and N2973) based on feedback from the committee. This is the std::async proposal that got accepted, and which has been incorporated into the working draft.

The result is that we have a variadic async function with an optional launch policy as the first argument, which specifies whether the function is to be spawned on a separate thread, deferred until get() or wait() is called on the returned future, or either, at the choice of the implementation.

N2997: Issues on Futures (Rev. 1)

This is a revision of N2967 from the pre-meeting mailing, which incorporates feedback from the committee. This is the version that was accepted, and which has been incorporated into the working draft. The key change is that unique_future has been renamed to just future in order to make things easier to read in what is anticipated to be the common case.

N2999: Background for issue 887: Clocks and Condition Variables (Rev. 1)

This is a minor revision of Detlef's paper from the pre-meeting mailing (N2969). The changes it proposes have not yet been accepted into the working paper, and will make it implementation-dependent which clocks can be used for condition variable waits.

Other concurrency changes in the working draft

std::mutex now has a constexpr constructor, so namespace-scope objects are guaranteed to be safe from initialization order issues.
The return type of the wait_for and wait_until member functions of std::condition_variable and std::condition_variable_any has been changed from a bool to a cv_status enum. This makes it clear whether the return is due to a signal (or spurious wake) or because the operation timed out.

Updated implementation

Our just::thread implementation of the C++0x thread library will shortly be updated to incorporate all these changes (including an implementation of std::async). Existing customers will get a free upgrade as usual.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

More Chances to Win a copy of just::thread in our Halloween Contest

Thursday, 29 October 2009

It's been good to see the entries coming in for our Halloween Contest, and with only two days to go we've decided to add more chances of winning. A blog post with your concurrency story now counts as TWO entries. You can get a further entry by commenting on this blog entry, and yet another by tweeting about this contest. If you do everything, that gives you four chances to win!

Not only does just::thread provide a complete implementation of the C++0x standard thread library for Windows and Linux, but it also includes a special deadlock detection mode. If deadlocks are sucking the life out of your multithreaded code then the deadlock detection mode can get straight to the heart of the problem and tell you which synchronization objects are the cause of the problem, which threads are waiting and where. Crucially, the library will also tell you which thread owns each object, and where it took ownership.

Good luck!

Posted by Anthony Williams
[/ news /] permanent link
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Win a copy of just::thread in our Halloween Contest

Friday, 16 October 2009

Yes, that's right — this Halloween we've got 3 copies of the just::thread portability pack worth £100 to give away. The Portability Pack provides implementations of just::thread for both Linux and Windows, so you can use the same code for both platforms right off the bat.

Enter this contest now for your chance to win.

How to enter

Entering is really easy — just write a short blog post about your concurrency gremlins that links back to the contest page. It could be that you'd like to add multithreading to your application but don't know where to start, or you're plagued with deadlocks (in which case you could use your new copy of just::thread to help you find the cause!) Maybe you've conquered all your concurrency problems, in which case I'd like to hear about how you did it. Whatever your story, I want to know — even a twitter-style 140-character summary would be fine.

Once you've got your blog post up, either email contest@stdthread.co.uk, or add a comment to this blog entry. Just make sure you do it before the deadline: Halloween, 31st October 2009. For full details and terms and conditions, see the contest page.

Good luck!

Posted by Anthony Williams
[/ news /] permanent link
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

September 2009 C++ Standards Committee Mailing

Wednesday, 30 September 2009

The September 2009 mailing for the C++ Standards Committee was published today. This is the pre-meeting mailing for the October committee meeting and contains the first new working draft since the "Concepts" feature was voted out of C++0x in July.

There is also quite a large number of papers in this mailing, including 7 concurrency-related papers (of which my name is on 2), which I summarize below:

Concurrency-related papers

N2955: Comments on the C++ Memory Model Following a Partial Formalization Attempt: Mark has been working on a formal model for parts of the C++ memory model, and has identified a few cases where the wording could be improved to clarify things.
N2959: Managing the lifetime of thread_local variables with contexts (Revision 1): This is a revision of my earlier paper, N2907 which expands my proposal for a thread_local_context class, along with full proposed wording. I think this provides a workable solution to the general problem of ensuring that destructors for thread_local variables are run at particular points in the execution of a program as discussed in N2880.
N2967: Issues on Futures: This paper that I co-authored with Detlef Vollmann provides a complete rewording for the "Futures" section of the standard (30.6). It folds the proposed changes from N2888 in with changes requested by the LWG at July's meeting (including the addition of a new future type — atomic_future which serializes all operations), and editorial comments on the earlier wording. It doesn't resolve everything — there is still some discussion over whether unique_future::get() should return a value or a reference, for example — but it provides a sound basis for further discussion.
N2969: Background for issue 887: Clocks and Condition Variables: Detlef argues the case that std::condition_variable::wait_for and std::condition_variable::wait_until may wake "too late" if the user adjusts the system clock during a wait, and suggests simplifying the overload set to match the minimum requirements of POSIX. LWG has already decided that this is Not A Defect (NAD), and I agree — there are no guarantees about how soon after the specified timeout the wait will return, and this is thus entirely a quality of implementation issue.
N2970: A simple async() (revision 1): This is a revision to Herb Sutter's take on a std::async function. There is a lot of discussion around the various issues, but the proposed wording seems incomplete — for example, if the unique_future associated with a synchronous async call is moved into a shared_future which is then copied, which call to get() runs the task?
N2973: An Asynchronous Call for C++: This is a revision to Lawrence Crowl's take on a std::async function. Whilst better specified than N2970 (a "serial" task is invoked in unique_future::get() or when the unique_future is moved into a shared_future), I'm not entirely happy with the result.
N2974: An Analysis of Async and Futures: To aid the committee with deciding on the semantics of std::async, Lawrence has put together this paper which outlines various options for creating futures from an async function. He then goes on to outline the operations that we may wish to perform on those futures, and how they relate to the various options for async.

As you can see by the 3 papers on the issue, std::async is a hot topic. I hope that a single coherent design can be agreed on at Santa Cruz, and incorporated into the standard. Comment on this blog post if you've got an opinion on the matter, and I'll pass it on.

Other papers

As I said at the beginning of this post, there are quite a few papers in this mailing. Other than the new working draft, there were 3 papers that piqued my interest:

N2954: Unified Function Syntax which proposes allowing lambda-style syntax for normal functions,
N2953: Defining Move Special Member Functions which covers the implicit generation of move constructors and move assignment operators, and
N2951: forward which covers the usage and constraints on std::forward

Other papers include changes to the allocator proposals, papers to cover specific core issues, a couple of papers on type traits and a couple on pairs and tuples. See the full paper list for further details.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

"The Perils of Data Races" Article Online

Friday, 11 September 2009

My latest article, Avoiding the Perils of C++0x Data Races has been published at devx.com.

Race conditions are one of the biggest causes of problems in multithreaded code. This article takes a brief look at some of the ways in which race conditions can occur, with sample code that demonstrates the perils of data races.

A Data Race is a specific type of race condition, and occurs where multiple threads access the same non-atomic variable without synchronization, and at least one of those threads performs a write. The sample code in the article demonstrates how a data race can cause corrupt data, which can then cause additional problems in other code.

Posted by Anthony Williams
[/ news /] permanent link
Tags: article, multithreading, C++0x, data races, race conditions
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Multithreading in C++0x part 7: Locking multiple mutexes without deadlock

Friday, 21 August 2009

This is the seventh in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

After last time's detour into lazy initialization, our regular programming resumes with the promised article on std::lock().

Acquiring locks on multiple mutexes

In most cases, your code should only ever hold a lock on one mutex at a time. Occasionally, you might nest your locks, e.g. by calling into a subsystem that protects its internal data with a mutex whilst holding a lock on another mutex, but it's generally better to avoid holding locks on multiple mutexes at once if at all possible.

However, sometimes it is necessary to hold a lock on more than one mutex, because you need to perform an operation on two distinct items of data, each of which is protected by its own mutex. In order to perform the operation correctly, you need to hold locks on both the mutexes, otherwise another thread could change one of the values before your operation was complete. For example, consider a bank transfer between two accounts: we want the transfer to be a single action to the outside world, so we need to acquire the lock on the mutex protecting each account. You could naively implement this like so:

class account
{
    std::mutex m;
    currency_value balance;
public:

    friend void transfer(account& from,account& to,
                         currency_value amount)
    {
        std::lock_guard<std::mutex> lock_from(from.m);
        std::lock_guard<std::mutex> lock_to(to.m);
        from.balance -= amount;
        to.balance += amount;
    }
};

Though it looks right at first glance (both locks are acquired before data is accessed), there is a potential for deadlock in this code. Consider two threads calling transfer() at the same time on the same accounts, but one thread is transferring money from account A to account B, whereas the other is transferring money from account B to account A. If the calls to transfer() are running concurrently, then both threads will try and lock the mutex on their from account. Assuming there's nothing else happening in the system, this will succeed, as for thread 1 the from account is account A, whereas for thread 2 the from account is account B. Now each thread tries to acquire the lock on the mutex for the corresponding to. Thread 1 will try and lock the mutex for account B (thread 1 is transferring from A to B). Unfortunately, it will block because thread 2 holds that lock. Meanwhile thread B will try and lock the mutex for account A (thread 2 is transferring from B to A). Thread 1 holds this mutex, so thread 2 must also block — deadlock.

Avoiding deadlock with `std::lock()`

In order to avoid this problem, you need to somehow tell the system to lock the two mutexes together, so that one of the threads acquires both locks and deadlock is avoided. This is what the std::lock() function is for — you supply a number of mutexes (it's a variadic template) and the thread library ensures that when the function returns they are all locked. Thus we can avoid the race condition in transfer() by writing it as follows:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::lock(from.m,to.m);
    std::lock_guard<std::mutex> lock_from(from.m,std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m,std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

Here we use std::lock() to lock both mutexes safely, then adopt the ownership into the std::lock_guard instances to ensure the locks are released safely at the end of the function.

Other mutex types

As mentioned already, std::lock() is a function template rather than a plain function. Not only does this mean it can merrily accept any number of mutex arguments, but it also means that it can accept any type of mutex arguments. The arguments don't even all have to be the same type. You can pass anything which implements lock(), try_lock() and unlock() member functions with appropriate semantics. As you may remember from part 5 of this series, std::unique_lock<> provides these member functions, so you can pass an instance of std::unique_lock<> to std::lock(). Indeed, you could also write transfer() using std::unique_lock<> like this:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::unique_lock<std::mutex> lock_from(from.m,std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m,std::defer_lock);
    std::lock(lock_from,lock_to);
    from.balance -= amount;
    to.balance += amount;
}

In this case, we construct the std::unique_lock<> instances without actually locking the mutexes, and then lock them both together with std::lock() afterwards. You still get all the benefits of the deadlock avoidance, and the same level of exception safety — which approach to use is up to you, and depends on what else is happening in the code.

Exception safety

Since std::lock() has to work with any mutex type you might throw at it, including user-defined ones, it has to be able to cope with exceptions. In particular, it has to provide sensible behaviour if a call to lock() or try_lock() on one of the supplied mutexes throws an exception. The way it does this is quite simple: if a call to lock() or try_lock() on one of the supplied mutexes throws an exception then unlock() is called on each of the mutexes for which this call to std::lock() currently owns a lock. So, if you are locking 4 mutexes and the call has successfully acquired 2 of them, but a call to try_lock() on the third throws an exception then the locks on the first two will be released by calls to unlock().

The upshot of this is that if std::lock() completes successfully then the calling thread owns the lock on all the supplied mutexes, but if the call exits with an exception then from the point of view of lock ownership it will be as-if the call was never made — any additional locks acquired have been released again.

No silver bullet

There are many ways to write code that deadlocks: std::lock() only addresses the particular case of acquiring multiple mutexes together. However, if you need to do this then std::lock() ensures that you don't need to worry about lock ordering issues.

If your code acquires locks on multiple mutexes, but std::lock() isn't applicable for your case, then you need to take care of lock ordering issues another way. One possibility is to enforce an ordering by using a hierarchical mutex. If you've got a deadlock in your code, then things like the deadlock detection mode of my just::thread library can help you pinpoint the cause.

Next time

In the next installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, atomics
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics

Thursday, 13 August 2009

This is the sixth in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

I had intended to write about the use of the new std::lock() for avoiding deadlock in this article. However, there was a post on comp.std.c++ this morning about lazy initialization, and I thought I'd write about that instead. std::lock() can wait until next time.

Lazy Initialization

The classic lazy-initialization case is where you have an object that is expensive to construct but isn't always needed. In this case you can choose to only initialize it on first use:

class lazy_init
{
    mutable std::unique_ptr<expensive_data> data;
public:
    expensive_data const& get_data() const
    {
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return *data;
    }
};

However, we can't use this idiom in multi-threaded code, since there would be a data race on the accesses to data. Enter std::call_once() — by using an instance of std::once_flag to protect the initialization we can make the data race go away:

class lazy_init
{
    mutable std::once_flag flag;
    mutable std::unique_ptr<expensive_data> data;

    void do_init() const
    {
        data.reset(new expensive_data);
    }
public:
    expensive_data const& get_data() const
    {
        std::call_once(flag,&lazy_init::do_init,this);
        return *data;
    }
};

Concurrent calls to get_data() are now safe: if the data has already been initialized they can just proceed concurrently. If not, then all threads calling concurrently except one will wait until the remaining thread has completed the initialization.

Reinitialization

This is all very well if you only want to initialize the data once. However, what if you need to update the data — perhaps it's a cache of some rarely-changing data that's expensive to come by. std::call_once() doesn't support multiple calls (hence the name). You could of course protect the data with a mutex, as shown below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::lock_guard<std::mutex> lk(m);
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return data;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        data.reset();
    }
};

Note that in this case we return a std::shared_ptr<const expensive_data> rather than a reference to avoid a race condition on the data itself — this ensures that the copy held by the calling code will be valid (if out of date) even if another thread calls invalidate_cache() before the data can be used.

This "works" in the sense that it avoids data races, but if the updates are rare and the reads are frequent then this may cause unnecessary serialization when multiple threads call get_data() concurrently. What other options do we have?

Double-checked locking returns

Much has been written about how double-checked locking is broken when using multiple threads. However, the chief cause of the problem is that the sample code uses plain non-atomic operations to check the flag outside the mutex, so is subject to a data race. You can overcome this by careful use of the C++0x atomics, as shown in the example below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::shared_ptr<const expensive_data> result=
            std::atomic_load_explicit(&data,std::memory_order_acquire);
        if(!result)
        {
            std::lock_guard<std::mutex> lk(m);
            result=data;
            if(!result)
            {
                result.reset(new expensive_data);
                std::atomic_store_explicit(&data,result,std::memory_order_release);
            }
        }
        return result;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        std::shared_ptr<const expensive_data> dummy;
        std::atomic_store_explicit(&data,dummy,std::memory_order_relaxed);
    }
};

Note that in this case, all writes to data use atomic operations, even those within the mutex lock. This is necessary in order to ensure that the atomic load operation at the start of get_data() actually has a coherent value to read — there's no point doing an atomic load if the stores are not atomic, otherwise you might atomically load some half-written data. Also, the atomic load and store operations ensure that the reference count on the std::shared_ptr object is correctly updated, so that the expensive_data object is correctly destroyed when the last std::shared_ptr object referencing it is destroyed.

If our atomic load actually returned a non-NULL value then we can use that, just as we did before. However, if it returned NULL then we need to lock the mutex and try again. This time we can use a plain read of data, since the mutex is locked. If we still get NULL then we need to do the initialization. However, we can't just call data.reset() like before, since that is not atomic. Instead we must create a local std::shared_ptr instance with the value we want, and store the value with an atomic store operation. We can use result for the local value, since we want the value in that variable anyway.

In invalidate_cache() we must also store the value using std::atomic_store_explicit(), in order to ensure that the NULL value is correctly read in get_data(). Note also that we must also lock the mutex here, in order to avoid a data race with the initialization code inside the mutex lock in get_data().

Memory ordering

By using std::atomic_load_explicit() and std::atomic_store_explicit() we can specify the memory ordering requirements of the operations. We could have just used std::atomic_load() and std::atomic_store(), but those would have implied std::memory_order_seq_cst, which is overkill in this scenario. What we need is to ensure that if a non-NULL value is read in get_data() then the actual creation of the associated object happens-before the read. The store in get_data() must therefore use std::memory_order_release, whilst the load uses std::memory_order_acquire.

On the other hand, the store in invalidate_cache() can merrily use std::memory_order_relaxed, since there is no data associated with the store: if the load in get_data() reads NULL then the mutex will be locked, which will handle any necessary synchronization.

Whenever you use atomic operations, you have to make sure that the memory ordering is correct, and that there are no races. Even in such a simple case such as this it is not trivial, and I would not recommend it unless profiling has shown that this is really a problem.

Update: As if to highlight my previous point about the trickiness of atomic operations, Dmitriy correctly points out in the comments that the use of std::shared_ptr to access the expensive_data implies a reference count, which is a real performance suck in multithreaded code. Whatever the memory ordering constraints we put on it, every thread doing the reading has to update the reference count. This is thus a source of contention, and can seriously limit scalability even if it doesn't force full serialization. The same issues apply to multiple-reader single-writer mutexes — Joe Duffy has written about them over on his blog. Time it on your platform with just a mutex (i.e. no double-checked locking), and with the atomic operations, and use whichever is faster. Alternatively, use a memory reclamation scheme specially tailored for your usage.

Next time

In the next part of this series I'll cover the use of std::lock() that I was intending to cover in this installment.

Subscribe to the RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

Note: since std::shared_ptr is part of the library supplied with the compiler, just::thread cannot provide the atomic functions for std::shared_ptr used in this article.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++0x, thread, atomics
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

More recent entries Older entries

About Us

Technical Writings

Subscribe to Blog

Blog Archives

Blog Archive

Thursday, 11 February 2010

Futures and asynchronous function calls

Sometimes asynchronous functions aren't really asynchronous

Divide and Conquer

Promises

Next Time

Try it out

Wednesday, 13 January 2010

Tuesday, 05 January 2010

Popular articles

What's coming in 2010?

Tuesday, 17 November 2009

Concurrency-related papers

Other concurrency changes in the working draft

Updated implementation

Thursday, 29 October 2009

Friday, 16 October 2009

How to enter

Wednesday, 30 September 2009

Concurrency-related papers

Other papers

Friday, 11 September 2009

Friday, 21 August 2009

Acquiring locks on multiple mutexes

Avoiding deadlock with std::lock()

Other mutex types

Exception safety

No silver bullet

Next time

Try it out

Thursday, 13 August 2009

Lazy Initialization

Reinitialization

Double-checked locking returns

Memory ordering

Next time

Try it out

Avoiding deadlock with `std::lock()`