Just Software Solutions

Blog Archive for / 2009 / 08 /

Multithreading in C++0x part 7: Locking multiple mutexes without deadlock

Friday, 21 August 2009

This is the seventh in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

After last time's detour into lazy initialization, our regular programming resumes with the promised article on std::lock().

Acquiring locks on multiple mutexes

In most cases, your code should only ever hold a lock on one mutex at a time. Occasionally, you might nest your locks, e.g. by calling into a subsystem that protects its internal data with a mutex whilst holding a lock on another mutex, but it's generally better to avoid holding locks on multiple mutexes at once if at all possible.

However, sometimes it is necessary to hold a lock on more than one mutex, because you need to perform an operation on two distinct items of data, each of which is protected by its own mutex. In order to perform the operation correctly, you need to hold locks on both the mutexes, otherwise another thread could change one of the values before your operation was complete. For example, consider a bank transfer between two accounts: we want the transfer to be a single action to the outside world, so we need to acquire the lock on the mutex protecting each account. You could naively implement this like so:

class account
{
    std::mutex m;
    currency_value balance;
public:

    friend void transfer(account& from,account& to,
                         currency_value amount)
    {
        std::lock_guard<std::mutex> lock_from(from.m);
        std::lock_guard<std::mutex> lock_to(to.m);
        from.balance -= amount;
        to.balance += amount;
    }
};

Though it looks right at first glance (both locks are acquired before data is accessed), there is a potential for deadlock in this code. Consider two threads calling transfer() at the same time on the same accounts, but one thread is transferring money from account A to account B, whereas the other is transferring money from account B to account A. If the calls to transfer() are running concurrently, then both threads will try and lock the mutex on their from account. Assuming there's nothing else happening in the system, this will succeed, as for thread 1 the from account is account A, whereas for thread 2 the from account is account B. Now each thread tries to acquire the lock on the mutex for the corresponding to. Thread 1 will try and lock the mutex for account B (thread 1 is transferring from A to B). Unfortunately, it will block because thread 2 holds that lock. Meanwhile thread B will try and lock the mutex for account A (thread 2 is transferring from B to A). Thread 1 holds this mutex, so thread 2 must also block — deadlock.

Avoiding deadlock with std::lock()

In order to avoid this problem, you need to somehow tell the system to lock the two mutexes together, so that one of the threads acquires both locks and deadlock is avoided. This is what the std::lock() function is for — you supply a number of mutexes (it's a variadic template) and the thread library ensures that when the function returns they are all locked. Thus we can avoid the race condition in transfer() by writing it as follows:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::lock(from.m,to.m);
    std::lock_guard<std::mutex> lock_from(from.m,std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m,std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

Here we use std::lock() to lock both mutexes safely, then adopt the ownership into the std::lock_guard instances to ensure the locks are released safely at the end of the function.

Other mutex types

As mentioned already, std::lock() is a function template rather than a plain function. Not only does this mean it can merrily accept any number of mutex arguments, but it also means that it can accept any type of mutex arguments. The arguments don't even all have to be the same type. You can pass anything which implements lock(), try_lock() and unlock() member functions with appropriate semantics. As you may remember from part 5 of this series, std::unique_lock<> provides these member functions, so you can pass an instance of std::unique_lock<> to std::lock(). Indeed, you could also write transfer() using std::unique_lock<> like this:

void transfer(account& from,account& to,
              currency_value amount)
{
    std::unique_lock<std::mutex> lock_from(from.m,std::defer_lock);
    std::unique_lock<std::mutex> lock_to(to.m,std::defer_lock);
    std::lock(lock_from,lock_to);
    from.balance -= amount;
    to.balance += amount;
}

In this case, we construct the std::unique_lock<> instances without actually locking the mutexes, and then lock them both together with std::lock() afterwards. You still get all the benefits of the deadlock avoidance, and the same level of exception safety — which approach to use is up to you, and depends on what else is happening in the code.

Exception safety

Since std::lock() has to work with any mutex type you might throw at it, including user-defined ones, it has to be able to cope with exceptions. In particular, it has to provide sensible behaviour if a call to lock() or try_lock() on one of the supplied mutexes throws an exception. The way it does this is quite simple: if a call to lock() or try_lock() on one of the supplied mutexes throws an exception then unlock() is called on each of the mutexes for which this call to std::lock() currently owns a lock. So, if you are locking 4 mutexes and the call has successfully acquired 2 of them, but a call to try_lock() on the third throws an exception then the locks on the first two will be released by calls to unlock().

The upshot of this is that if std::lock() completes successfully then the calling thread owns the lock on all the supplied mutexes, but if the call exits with an exception then from the point of view of lock ownership it will be as-if the call was never made — any additional locks acquired have been released again.

No silver bullet

There are many ways to write code that deadlocks: std::lock() only addresses the particular case of acquiring multiple mutexes together. However, if you need to do this then std::lock() ensures that you don't need to worry about lock ordering issues.

If your code acquires locks on multiple mutexes, but std::lock() isn't applicable for your case, then you need to take care of lock ordering issues another way. One possibility is to enforce an ordering by using a hierarchical mutex. If you've got a deadlock in your code, then things like the deadlock detection mode of my just::thread library can help you pinpoint the cause.

Next time

In the next installment we'll take a look at the "futures" mechanism from C++0x. Futures are a high level mechanism for passing a value between threads, and allow a thread to wait for a result to be available without having to manage the locks directly.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Multithreading in C++0x part 6: Lazy initialization and double-checked locking with atomics

Thursday, 13 August 2009

This is the sixth in a series of blog posts introducing the new C++0x thread library. So far we've looked at the various ways of starting threads in C++0x and protecting shared data with mutexes. See the end of this article for a full set of links to the rest of the series.

I had intended to write about the use of the new std::lock() for avoiding deadlock in this article. However, there was a post on comp.std.c++ this morning about lazy initialization, and I thought I'd write about that instead. std::lock() can wait until next time.

Lazy Initialization

The classic lazy-initialization case is where you have an object that is expensive to construct but isn't always needed. In this case you can choose to only initialize it on first use:

class lazy_init
{
    mutable std::unique_ptr<expensive_data> data;
public:
    expensive_data const& get_data() const
    {
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return *data;
    }
};

However, we can't use this idiom in multi-threaded code, since there would be a data race on the accesses to data. Enter std::call_once() — by using an instance of std::once_flag to protect the initialization we can make the data race go away:

class lazy_init
{
    mutable std::once_flag flag;
    mutable std::unique_ptr<expensive_data> data;

    void do_init() const
    {
        data.reset(new expensive_data);
    }
public:
    expensive_data const& get_data() const
    {
        std::call_once(flag,&lazy_init::do_init,this);
        return *data;
    }
};

Concurrent calls to get_data() are now safe: if the data has already been initialized they can just proceed concurrently. If not, then all threads calling concurrently except one will wait until the remaining thread has completed the initialization.

Reinitialization

This is all very well if you only want to initialize the data once. However, what if you need to update the data — perhaps it's a cache of some rarely-changing data that's expensive to come by. std::call_once() doesn't support multiple calls (hence the name). You could of course protect the data with a mutex, as shown below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::lock_guard<std::mutex> lk(m);
        if(!data)
        {
            data.reset(new expensive_data);
        }
        return data;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        data.reset();
    }
};

Note that in this case we return a std::shared_ptr<const expensive_data> rather than a reference to avoid a race condition on the data itself — this ensures that the copy held by the calling code will be valid (if out of date) even if another thread calls invalidate_cache() before the data can be used.

This "works" in the sense that it avoids data races, but if the updates are rare and the reads are frequent then this may cause unnecessary serialization when multiple threads call get_data() concurrently. What other options do we have?

Double-checked locking returns

Much has been written about how double-checked locking is broken when using multiple threads. However, the chief cause of the problem is that the sample code uses plain non-atomic operations to check the flag outside the mutex, so is subject to a data race. You can overcome this by careful use of the C++0x atomics, as shown in the example below:

class lazy_init_with_cache
{
    mutable std::mutex m;
    mutable std::shared_ptr<const expensive_data> data;

public:
    std::shared_ptr<const expensive_data> get_data() const
    {
        std::shared_ptr<const expensive_data> result=
            std::atomic_load_explicit(&data,std::memory_order_acquire);
        if(!result)
        {
            std::lock_guard<std::mutex> lk(m);
            result=data;
            if(!result)
            {
                result.reset(new expensive_data);
                std::atomic_store_explicit(&data,result,std::memory_order_release);
            }
        }
        return result;
    }
    void invalidate_cache()
    {
        std::lock_guard<std::mutex> lk(m);
        std::shared_ptr<const expensive_data> dummy;
        std::atomic_store_explicit(&data,dummy,std::memory_order_relaxed);
    }
};

Note that in this case, all writes to data use atomic operations, even those within the mutex lock. This is necessary in order to ensure that the atomic load operation at the start of get_data() actually has a coherent value to read — there's no point doing an atomic load if the stores are not atomic, otherwise you might atomically load some half-written data. Also, the atomic load and store operations ensure that the reference count on the std::shared_ptr object is correctly updated, so that the expensive_data object is correctly destroyed when the last std::shared_ptr object referencing it is destroyed.

If our atomic load actually returned a non-NULL value then we can use that, just as we did before. However, if it returned NULL then we need to lock the mutex and try again. This time we can use a plain read of data, since the mutex is locked. If we still get NULL then we need to do the initialization. However, we can't just call data.reset() like before, since that is not atomic. Instead we must create a local std::shared_ptr instance with the value we want, and store the value with an atomic store operation. We can use result for the local value, since we want the value in that variable anyway.

In invalidate_cache() we must also store the value using std::atomic_store_explicit(), in order to ensure that the NULL value is correctly read in get_data(). Note also that we must also lock the mutex here, in order to avoid a data race with the initialization code inside the mutex lock in get_data().

Memory ordering

By using std::atomic_load_explicit() and std::atomic_store_explicit() we can specify the memory ordering requirements of the operations. We could have just used std::atomic_load() and std::atomic_store(), but those would have implied std::memory_order_seq_cst, which is overkill in this scenario. What we need is to ensure that if a non-NULL value is read in get_data() then the actual creation of the associated object happens-before the read. The store in get_data() must therefore use std::memory_order_release, whilst the load uses std::memory_order_acquire.

On the other hand, the store in invalidate_cache() can merrily use std::memory_order_relaxed, since there is no data associated with the store: if the load in get_data() reads NULL then the mutex will be locked, which will handle any necessary synchronization.

Whenever you use atomic operations, you have to make sure that the memory ordering is correct, and that there are no races. Even in such a simple case such as this it is not trivial, and I would not recommend it unless profiling has shown that this is really a problem.

Update: As if to highlight my previous point about the trickiness of atomic operations, Dmitriy correctly points out in the comments that the use of std::shared_ptr to access the expensive_data implies a reference count, which is a real performance suck in multithreaded code. Whatever the memory ordering constraints we put on it, every thread doing the reading has to update the reference count. This is thus a source of contention, and can seriously limit scalability even if it doesn't force full serialization. The same issues apply to multiple-reader single-writer mutexes — Joe Duffy has written about them over on his blog. Time it on your platform with just a mutex (i.e. no double-checked locking), and with the atomic operations, and use whichever is faster. Alternatively, use a memory reclamation scheme specially tailored for your usage.

Next time

In the next part of this series I'll cover the use of std::lock() that I was intending to cover in this installment.

Subscribe to the RSS feed RSS feed or email newsletter for this blog to be sure you don't miss the rest of the series.

Try it out

If you're using Microsoft Visual Studio 2008 or g++ 4.3 or 4.4 on Ubuntu Linux you can try out the examples from this series using our just::thread implementation of the new C++0x thread library. Get your copy today.

Note: since std::shared_ptr is part of the library supplied with the compiler, just::thread cannot provide the atomic functions for std::shared_ptr used in this article.

Multithreading in C++0x Series

Here are the posts in this series so far:

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

August 2009 C++ Standards Committee Mailing

Thursday, 06 August 2009

The August 2009 mailing for the C++ Standards Committee was published yestoday. This is the post-meeting mailing for the July committee meeting. There isn't a new working draft this time, for one very simple reason: the "Concepts" feature was voted out of C++0x, so huge sections of the draft will have to be changed. Thankfully, the concurrency-related parts had not yet had concepts applied, so will not be affected directly by this.

C++0x timetable

What does this mean for the timetable? Well, the committee currently intends to issue a new Committee Draft for voting and comments by National Bodies in March 2010. How long it takes to get from that draft to a new Standard will depend on the comments that come back, and the length of the committee's issue lists. This is not intended to be a Final Committee Draft, implying there would be at least one more round of voting and comments. As I understand it, the earliest we could get a new Standard is thus early 2011, though if that were to be the case then the details would be finalised late 2010.

Concurrency-related papers

This time, there's only one concurrency-related paper in the mailing:

N2925: More Collected Issues with Atomics

This paper gathers together the proposed wording for a number of outstanding issues with respect to the atomic data types. This tightens up the wording regards the memory ordering guarantees of fences, and adjusts the overload set for both free functions and member functions to ensure atomics behave sensibly in more circumstances. Crucially, the header for atomic operations is also changed — rather than the <cstdatomic> / <stdatomic.h> pairing from the previous draft there is a single header: <atomic>.

Future developments

Though futures were discussed, no change was made to the working paper in this regard. Detlef and I have been asked to write a further paper on the topic to incorporate the issues raised at the meeting, which will therefore be in the next committee mailing.

Likewise, the proposals for dealing with the lifetime of thread_local variables and for launching an asynchronous task with a return value are still under discussion, and further papers on those can be expected.

Your input

How do you feel about the removal of concepts? Do you think the slip in schedule will have an impact on you? Let me know in the comments.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

just::thread C++0x Thread Library Linux Port Released

Monday, 03 August 2009

I am pleased to announce that version 1.0 of just::thread, our C++0x Thread Library is now available for Linux as well as Windows.

The just::thread library is a complete implementation of the new C++0x thread library as per the current C++0x working paper. Features include:

  • std::thread for launching threads.
  • Mutexes and condition variables.
  • std::promise, std::packaged_task, std::unique_future and std::shared_future for transferring data between threads.
  • Support for the new std::chrono time interface for sleeping and timeouts on locks and waits.
  • Atomic operations with std::atomic.
  • Support for std::exception_ptr for transferring exceptions between threads.
  • Special deadlock-detection mode for tracking down the call-stack leading to deadlocks, the bane of multithreaded programming.

The linux port is available for 32-bit and 64-bit Ubuntu linux, and takes full advantage of the C++0x support available from g++ 4.3. Purchase your copy and get started NOW.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Previous Entries Later Entries

Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy