Multithreading and Concurrency
The Future of Concurrency in C++: Slides from ACCU 2008
Monday, 07 April 2008
My presentation on The Future of Concurrency in C++ at ACCU 2008 last Thursday went off without a hitch. I was pleased to find that my talk was well attended, and the audience had lots of worthwhile questions — hopefully I answered them to everybody's satisfaction.
For those who didn't attend, or those who did but would like a reminder of what I said, here are the slides from my presentation.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, multithreading, C++, ACCU
Futures and Tasks in C++0x
Thursday, 27 March 2008
I had resigned myself to Thread Pools and Futures being punted to TR2 rather than C++0x, but it seems there is potential for some movement on this issue. At the meeting of WG21 in Kona, Hawaii in October 2007 it was agreed to include asynchronous future values in C++0x, whilst excluding thread pools and task launching.
Detlef Vollman has rekindled the effort, and drafted N2561: An Asynchronous Future Value with myself and Howard Hinnant, based on a discussion including other members of the Standards Committee. This paper proposes four templates: unique_future and shared_future, which are the asynchronous values themselves, and packaged_task and promise, which provide ways of setting the asynchronous values.
Asynchronous future values
unique_future is very much like unique_ptr: it represents exclusive ownership of the value. Ownership of a (future) value can be moved between unique_future instances, but no two unique_future instances can refer to the same asynchronous value. Once the value is ready for retrieval, it is moved out of the internal storage buffer: this allows for use with move-only types such as std::ifstream.
Similarly, shared_future is very much like shared_ptr: multiple instances can refer to the same (future) value, and shared_future instances can be copied around. In order to reduce surprises with this usage (with one thread moving the value through one instance at the same time as another tries to move it through another instance), the stored value can only be accessed via const reference, so must be copied out, or accessed in place.
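As a hedged sketch of this usage (proposal-era names; it also assumes a shared_future can be constructed from the unique_future returned by get_future(), a detail the paper leaves open):

    std::promise<int> p;
    std::shared_future<int> sf(p.get_future()); // assumed conversion from unique_future
    std::shared_future<int> sf2(sf);            // copying is allowed

    // once some thread calls p.set_value(42), both sf and sf2 can
    // read the same stored value via const reference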
Storing the future values as the return value from a function
The simplest way to calculate a future value is with a packaged_task<T>. Much like std::function<T()>, this encapsulates a callable object or function, for invoking at a later time. However, whereas std::function returns the result directly to the caller, packaged_task stores the result in a future.
    extern int some_function();

    std::packaged_task<int> task(some_function);
    std::unique_future<int> result=task.get_future();

    // later on, some thread does
    task();
    // and "result" is now ready
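To make the sequence concrete, here's a minimal sketch of the "later on" part running on another thread; it assumes a C++0x-style std::thread that accepts movable callables and a get() member that waits for and retrieves the value, none of which is final:

    std::packaged_task<int> task(some_function);
    std::unique_future<int> result=task.get_future();

    std::thread worker(std::move(task)); // invokes task() on the new thread
    int answer=result.get();             // blocks until the result is stored
    worker.join();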
Making a promise to provide a future value
The other way to store a value to be picked up with a unique_future or shared_future is to use a promise, and then explicitly set the value by calling the set_value() member function.
    std::promise<int> my_promise;
    std::unique_future<int> result=my_promise.get_future();

    // later on, some thread does
    my_promise.set_value(42);
    // and "result" is now ready.
Exceptional returns
Futures also support storing exceptions: when you try to retrieve the value, if there is a stored exception, that exception is thrown rather than the value being retrieved. With a packaged_task, an exception gets stored if the wrapped function throws an exception when it is invoked, and with a promise, you can explicitly store an exception with the set_exception() member function.
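For illustration, here's a sketch of the exception path through a promise; it assumes an exception_ptr-style set_exception() interface, which is only one possible form, and do_calculation() is a hypothetical function:

    extern int do_calculation(); // hypothetical; may throw

    std::promise<int> p;
    std::unique_future<int> f=p.get_future();

    // in the producing thread:
    try
    {
        p.set_value(do_calculation()); // normal path
    }
    catch(...)
    {
        p.set_exception(std::current_exception()); // store the exception instead
    }

    // in the consuming thread:
    try
    {
        int value=f.get(); // rethrows the stored exception
    }
    catch(std::exception const& e)
    {
        // handle the error from the producing thread
    }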
Feedback
As the paper says, this is not a finished proposal: it is a basis for further discussion. Let me know if you have any comments.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: threading, futures, asynchronous values, C++0x, wg21
Thread Interruption in the Boost Thread Library
Tuesday, 11 March 2008
One of the new features introduced in the upcoming 1.35.0 release of the Boost Thread library is support for interruption of a running thread. Similar to the Java and .NET interruption support, this allows one thread to request another thread to stop at the next interruption point. This is the only way to explicitly request a thread to terminate that is directly supported by the Boost Thread library, though users can manually implement cooperative interruption if required.
Interrupting a thread in this way is much less dangerous than brute-force tactics such as TerminateThread(), as such tactics can leave broken invariants and leak resources. If a thread is killed using a brute-force method and it was holding any locks, this can also potentially lead to deadlock when another thread tries to acquire those locks at some future point. Interruption is also easier and more reliable than rolling your own cooperative termination scheme using mutexes, flags, condition variables, or some other synchronization mechanism, since it is part of the library.
Interrupting a Thread
A running thread can be interrupted by calling the interrupt() member function on the corresponding boost::thread object. If the thread doesn't have a boost::thread object (e.g. the initial thread of the application), then it cannot be interrupted.

Calling interrupt() just sets a flag in the thread management structure for that thread and returns: it doesn't wait for the thread to actually be interrupted. This is important, because a thread can only be interrupted at one of the predefined interruption points, and it might be that a thread never executes an interruption point, so never sees the request. Currently, the interruption points are:
boost::thread::join()
boost::thread::timed_join()
boost::condition_variable::wait()
boost::condition_variable::timed_wait()
boost::condition_variable_any::wait()
boost::condition_variable_any::timed_wait()
boost::this_thread::sleep()
boost::this_thread::interruption_point()
When a thread reaches one of these interruption points, if interruption is enabled for that thread then it checks its interruption flag. If the flag is set, then it is cleared, and a boost::thread_interrupted exception is thrown. If the thread is already blocked on a call to one of the interruption points with interruption enabled when interrupt() is called, then the thread will wake in order to throw the boost::thread_interrupted exception.
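Here's a minimal sketch of that wake-up behaviour (the thread function and the one-hour sleep are my invention, purely for illustration):

    #include <boost/thread.hpp>
    #include <iostream>

    void worker_func()
    {
        try
        {
            // sleep() is an interruption point, so the thread can be
            // woken early by an interruption request
            boost::this_thread::sleep(boost::posix_time::hours(1));
        }
        catch(boost::thread_interrupted const&)
        {
            std::cout<<"worker interrupted"<<std::endl;
        }
    }

    int main()
    {
        boost::thread worker(worker_func);
        worker.interrupt(); // just sets the flag and returns
        worker.join();      // the worker wakes and throws, then exits
    }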
Catching an Interruption
boost::thread_interrupted is just a normal exception, so it can be caught, just like any other exception. This is why the "interrupted" flag is cleared when the exception is thrown — if a thread catches and handles the interruption, it is perfectly acceptable to interrupt it again. Consider, for example, a worker thread that is processing a series of independent tasks: if the current task is interrupted, the worker can handle the interruption, discard the task, and move on to the next task, which can then in turn be interrupted. It also allows the thread to catch the exception and terminate itself by other means, such as returning error codes, or to translate the exception so that it can pass through module boundaries.
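A sketch of that worker-loop pattern might look like the following; more_tasks(), get_next_task() and process() are hypothetical application functions, not part of Boost:

    void worker_loop()
    {
        while(more_tasks())
        {
            try
            {
                process(get_next_task()); // may pass through interruption points
            }
            catch(boost::thread_interrupted const&)
            {
                // the flag was cleared when the exception was thrown, so
                // the next task can be interrupted in turn; just discard
                // the current task and carry on
            }
        }
    }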
Disabling Interruptions
Sometimes it is necessary to avoid being interrupted for a particular section of code, such as in a destructor, where an exception has the potential to cause immediate process termination. This is done by constructing an instance of boost::this_thread::disable_interruption. Objects of this class disable interruption for the thread that created them on construction, and restore the interruption state to whatever it was before on destruction:
    void f()
    {
        // interruption enabled here
        {
            boost::this_thread::disable_interruption di;
            // interruption disabled
            {
                boost::this_thread::disable_interruption di2;
                // interruption still disabled
            } // di2 destroyed, interruption state restored
            // interruption still disabled
        } // di destroyed, interruption state restored
        // interruption now enabled
    }
The effects of an instance of boost::this_thread::disable_interruption can be temporarily reversed by constructing an instance of boost::this_thread::restore_interruption, passing in the boost::this_thread::disable_interruption object in question. This will restore the interruption state to what it was when the boost::this_thread::disable_interruption object was constructed, and then disable interruption again when the boost::this_thread::restore_interruption object is destroyed:
    void g()
    {
        // interruption enabled here
        {
            boost::this_thread::disable_interruption di;
            // interruption disabled
            {
                boost::this_thread::restore_interruption ri(di);
                // interruption now enabled
            } // ri destroyed, interruption disabled again
            {
                boost::this_thread::disable_interruption di2;
                // interruption disabled
                {
                    boost::this_thread::restore_interruption ri2(di2);
                    // interruption still disabled,
                    // as it was disabled when di2 constructed
                } // ri2 destroyed, interruption still disabled
            } // di2 destroyed, interruption still disabled
        } // di destroyed, interruption state restored
        // interruption now enabled
    }
boost::this_thread::disable_interruption and boost::this_thread::restore_interruption cannot be moved or copied, and they are the only way of enabling and disabling interruption. This ensures that the interruption state is correctly restored when the scope is exited (whether normally, or by an exception), and that you cannot enable interruptions in the middle of an interruption-disabled block unless you're in full control of the code, and have access to the boost::this_thread::disable_interruption instance.

At any point, the interruption state for the current thread can be queried by calling boost::this_thread::interruption_enabled().
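For example, a quick sketch of how the state changes around a disable_interruption block (assuming interruption was enabled on entry):

    #include <boost/thread.hpp>
    #include <cassert>

    void h()
    {
        assert(boost::this_thread::interruption_enabled());
        {
            boost::this_thread::disable_interruption di;
            assert(!boost::this_thread::interruption_enabled());
        }
        assert(boost::this_thread::interruption_enabled()); // state restored
    }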
Cooperative Interruption
As well as the interruption points on blocking operations such as sleep() and join(), there is one interruption point explicitly designed to allow interruption at a user-designated point in the code. boost::this_thread::interruption_point() does nothing except check for an interruption, and can therefore be used in long-running code that doesn't execute any other interruption points, in order to allow for cooperative interruption. Just like the other interruption points, interruption_point() respects the interruption enabled state, and does nothing if interruption is disabled for the current thread.
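A sketch of that usage, where do_chunk_of_work() is a hypothetical function representing one slice of a long computation:

    void long_computation()
    {
        for(unsigned i=0;i<1000000;++i)
        {
            do_chunk_of_work(i); // contains no interruption points
            // offer a chance to be interrupted once per iteration; throws
            // boost::thread_interrupted if requested (and enabled)
            boost::this_thread::interruption_point();
        }
    }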
Interruption is Not Cancellation
On POSIX platforms, threads can be cancelled rather than killed, by calling pthread_cancel(). This is similar to interruption, but is a separate mechanism, with different behaviour. In particular, cancellation cannot be stopped once it is started: whereas interruption just throws an exception, once a cancellation request has been acknowledged the thread is effectively dead. pthread_cancel() does not always execute destructors either (though it does on some platforms), as it is primarily a C interface — if you want to clean up your resources when a thread is cancelled, you need to use pthread_cleanup_push() to register a cleanup handler. The advantage here is that pthread_cleanup_push() works in C stack frames, whereas exceptions don't play nicely with C: on some platforms it will crash your program for an exception to propagate into a C stack frame.
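For reference, here's a minimal sketch of the pthread_cleanup_push() idiom mentioned above (plain POSIX, nothing Boost-specific):

    #include <pthread.h>

    void unlock_mutex(void* m)
    {
        pthread_mutex_unlock(static_cast<pthread_mutex_t*>(m));
    }

    void* thread_func(void* arg)
    {
        pthread_mutex_t* m=static_cast<pthread_mutex_t*>(arg);
        pthread_mutex_lock(m);
        pthread_cleanup_push(unlock_mutex,m); // runs if cancelled below
        // ... code containing cancellation points ...
        pthread_cleanup_pop(1); // pop the handler and run it
        return 0;
    }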
For portable code, I recommend interruption over cancellation. It's supported on all platforms that can use the Boost Thread library, and it works well with C++ code — it's just another exception, so all your destructors and catch blocks work just fine.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: thread, boost, interruption, concurrency, cancellation, multi-threading
Acquiring Multiple Locks Without Deadlock
Monday, 03 March 2008
In a software system with lots of fine-grained mutexes, it can sometimes be necessary to acquire locks on more than one mutex together in order to perform some operation. If this is not done with care, then there is the possibility of deadlock, as multiple threads may lock the same mutexes in a different order. It is for this reason that the thread library coming with C++0x will include a lock() function for locking multiple mutexes together: this article describes the implementation details behind such a function.
Choose the lock order by role
The easiest way to deal with this is to always lock the mutexes in the same order. This is especially easy if the order can be hard-coded, and some uses naturally lend themselves towards this choice. For example, if the mutexes protect objects with different roles, it is relatively easy to always lock the mutex protecting one set of data before locking the other one. In such a situation, lock hierarchies can be used to enforce the ordering — with a lock hierarchy, a thread cannot acquire a lock on a mutex with a higher hierarchy level than any mutexes currently locked by that thread.
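A minimal sketch of such a hierarchical mutex follows. This is my illustration rather than a Boost facility; it assumes strictly nested lock/unlock and uses C++11 thread_local for brevity:

    #include <mutex>
    #include <climits>
    #include <stdexcept>

    class hierarchical_mutex
    {
        std::mutex internal_mutex;
        unsigned const level;
        unsigned previous_level;
        static thread_local unsigned this_thread_level;
    public:
        explicit hierarchical_mutex(unsigned level_):
            level(level_),previous_level(0)
        {}
        void lock()
        {
            if(this_thread_level<=level) // may only lock "downwards"
                throw std::logic_error("mutex hierarchy violated");
            internal_mutex.lock();
            previous_level=this_thread_level;
            this_thread_level=level;
        }
        void unlock() // assumes locks are released in reverse order
        {
            this_thread_level=previous_level;
            internal_mutex.unlock();
        }
    };
    thread_local unsigned hierarchical_mutex::this_thread_level(UINT_MAX);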
If it is not possible to decide a priori which mutex to lock first, such as when the mutexes are associated with the same sort of data, then a more complicated policy must be applied.
Choose the lock order by address
The simplest technique in these cases is to always lock the mutexes in ascending order of address (examples use the types and functions from the upcoming 1.35 release of Boost), like this:
    void lock(boost::mutex& m1,boost::mutex& m2)
    {
        if(&m1<&m2)
        {
            m1.lock();
            m2.lock();
        }
        else
        {
            m2.lock();
            m1.lock();
        }
    }
This works for small numbers of mutexes, provided this policy is maintained throughout the application, but if several mutexes must be locked together, then calculating the ordering can get complicated, and potentially inefficient. It also requires that the mutexes are all of the same type. Since there are many possible mutex and lock types that an application might choose to use, this is a notable disadvantage, as the function must be written afresh for each possible combination.
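One way to generalise the address-ordering idea to an arbitrary number of mutexes of the same type is to sort by address first. This sketch carries the same caveat as the two-mutex version — comparing addresses of unrelated objects is strictly unspecified, though it works on common platforms — and it assumes no mutex appears twice in the array:

    #include <boost/thread/mutex.hpp>
    #include <algorithm>

    void lock_all(boost::mutex* mutexes[],unsigned count)
    {
        std::sort(mutexes,mutexes+count); // order by address
        for(unsigned i=0;i<count;++i)
        {
            mutexes[i]->lock(); // every thread locks in the same order
        }
    }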
Order mutexes "naturally", with try-and-back-off
If the mutexes cannot be ordered by address (for whatever reason), then an alternative scheme must be found. One such scheme is to use a try-and-back-off algorithm: try and lock each mutex in turn; if any cannot be locked, unlock the others and start again. The simplest implementation for 3 mutexes looks like this:
    void lock(boost::mutex& m1,boost::mutex& m2,boost::mutex& m3)
    {
        do
        {
            m1.lock();
            if(m2.try_lock())
            {
                if(m3.try_lock())
                {
                    return;
                }
                m2.unlock();
            }
            m1.unlock();
        }
        while(true);
    }
Wait for the failed mutex
The big problem with this scheme is that it always locks the mutexes in the same order. If m1 and m2 are currently free, but m3 is locked by another thread, then this thread will repeatedly lock m1 and m2, fail to lock m3, and unlock m1 and m2. This just wastes CPU cycles for no gain. Instead, what we want to do is block waiting for m3, and try to acquire the others only when m3 has been successfully locked by this thread. For three mutexes, a first attempt looks like this:
    void lock(boost::mutex& m1,boost::mutex& m2,boost::mutex& m3)
    {
        unsigned lock_first=0;
        while(true)
        {
            switch(lock_first)
            {
            case 0:
                m1.lock();
                if(m2.try_lock())
                {
                    if(m3.try_lock())
                        return;
                    lock_first=2;
                    m2.unlock();
                }
                else
                {
                    lock_first=1;
                }
                m1.unlock();
                break;
            case 1:
                m2.lock();
                if(m3.try_lock())
                {
                    if(m1.try_lock())
                        return;
                    lock_first=0;
                    m3.unlock();
                }
                else
                {
                    lock_first=2;
                }
                m2.unlock();
                break;
            case 2:
                m3.lock();
                if(m1.try_lock())
                {
                    if(m2.try_lock())
                        return;
                    lock_first=1;
                    m1.unlock();
                }
                else
                {
                    lock_first=0;
                }
                m3.unlock();
                break;
            }
        }
    }
Simplicity and Robustness
This code is very long-winded, with all the duplication between the case blocks. Also, it assumes that the mutexes are all boost::mutex, which is overly restrictive. Finally, it assumes that the try_lock calls don't throw exceptions. Whilst this is true for the Boost mutexes, it is not required to be true in general, so a more robust implementation that allows the mutex type to be supplied as a template parameter will ensure that any exceptions thrown will leave all the mutexes unlocked: the unique_lock template will help with that by providing RAII locking. Taking all this into account leaves us with the following:
    template<typename MutexType1,typename MutexType2,typename MutexType3>
    unsigned lock_helper(MutexType1& m1,MutexType2& m2,MutexType3& m3)
    {
        boost::unique_lock<MutexType1> l1(m1);
        boost::unique_lock<MutexType2> l2(m2,boost::try_to_lock);
        if(!l2)
        {
            return 1;
        }
        if(!m3.try_lock())
        {
            return 2;
        }
        l2.release();
        l1.release();
        return 0;
    }

    template<typename MutexType1,typename MutexType2,typename MutexType3>
    void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
    {
        unsigned lock_first=0;
        while(true)
        {
            switch(lock_first)
            {
            case 0:
                lock_first=lock_helper(m1,m2,m3);
                if(!lock_first)
                    return;
                break;
            case 1:
                lock_first=lock_helper(m2,m3,m1);
                if(!lock_first)
                    return;
                lock_first=(lock_first+1)%3;
                break;
            case 2:
                lock_first=lock_helper(m3,m1,m2);
                if(!lock_first)
                    return;
                lock_first=(lock_first+2)%3;
                break;
            }
        }
    }
This code is simultaneously shorter, simpler and more general than the previous implementation, and is robust in the face of exceptions. The lock_helper function locks the first mutex, and then tries to lock the other two in turn. If either of the try_locks fails, then all currently-locked mutexes are unlocked, and it returns the index of the mutex that couldn't be locked. On success, the release members of the unique_lock instances are called to release ownership of the locks, and thus stop them automatically unlocking the mutexes during destruction, and 0 is returned. The outer lock function is just a simple wrapper around lock_helper that chooses the order of the mutexes so that the one that failed to lock last time is tried first.
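As a usage sketch, the locked mutexes can then be adopted by RAII guards so that they are released on scope exit; this assumes the adopt_lock constructor for boost::lock_guard in the upcoming 1.35 release:

    boost::mutex a,b,c;

    void process_all()
    {
        lock(a,b,c); // the deadlock-avoiding lock() from above
        boost::lock_guard<boost::mutex> guard_a(a,boost::adopt_lock);
        boost::lock_guard<boost::mutex> guard_b(b,boost::adopt_lock);
        boost::lock_guard<boost::mutex> guard_c(c,boost::adopt_lock);
        // ... operate on the data protected by all three mutexes ...
    } // all three unlocked here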
Extending to more mutexes
This scheme can also be easily extended to handle more mutexes, though the code gets unavoidably longer, since there are more cases to handle — this is where the C++0x variadic templates will really come into their own. Here's the code for locking 5 mutexes together:
    template<typename MutexType1,typename MutexType2,typename MutexType3,
             typename MutexType4,typename MutexType5>
    unsigned lock_helper(MutexType1& m1,MutexType2& m2,MutexType3& m3,
                         MutexType4& m4,MutexType5& m5)
    {
        boost::unique_lock<MutexType1> l1(m1);
        boost::unique_lock<MutexType2> l2(m2,boost::try_to_lock);
        if(!l2)
        {
            return 1;
        }
        boost::unique_lock<MutexType3> l3(m3,boost::try_to_lock);
        if(!l3)
        {
            return 2;
        }
        boost::unique_lock<MutexType4> l4(m4,boost::try_to_lock);
        if(!l4)
        {
            return 3;
        }
        if(!m5.try_lock())
        {
            return 4;
        }
        l4.release();
        l3.release();
        l2.release();
        l1.release();
        return 0;
    }

    template<typename MutexType1,typename MutexType2,typename MutexType3,
             typename MutexType4,typename MutexType5>
    void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3,
              MutexType4& m4,MutexType5& m5)
    {
        unsigned const lock_count=5;
        unsigned lock_first=0;
        while(true)
        {
            switch(lock_first)
            {
            case 0:
                lock_first=lock_helper(m1,m2,m3,m4,m5);
                if(!lock_first)
                    return;
                break;
            case 1:
                lock_first=lock_helper(m2,m3,m4,m5,m1);
                if(!lock_first)
                    return;
                lock_first=(lock_first+1)%lock_count;
                break;
            case 2:
                lock_first=lock_helper(m3,m4,m5,m1,m2);
                if(!lock_first)
                    return;
                lock_first=(lock_first+2)%lock_count;
                break;
            case 3:
                lock_first=lock_helper(m4,m5,m1,m2,m3);
                if(!lock_first)
                    return;
                lock_first=(lock_first+3)%lock_count;
                break;
            case 4:
                lock_first=lock_helper(m5,m1,m2,m3,m4);
                if(!lock_first)
                    return;
                lock_first=(lock_first+4)%lock_count;
                break;
            }
        }
    }
Final Code
The final code for acquiring multiple locks provides try_lock and lock functions for 2 to 5 mutexes. Though the try_lock functions are relatively straightforward, their existence makes the lock_helper functions slightly simpler, as they can just defer to the appropriate overload of try_lock to cover all the mutexes beyond the first one.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: threading, concurrency, mutexes, locks
Thread Library Now in C++0x Working Draft
Monday, 11 February 2008
The latest proposal for the C++ standard thread library has finally made it into the C++0x working draft.
Woohoo!
There will undoubtedly be minor changes as feedback comes in to the committee, but this is the first real look at what C++0x thread support will entail, as approved by the whole committee. The working draft also includes the new C++0x memory model, and atomic types and operations. This means that for the first time, C++ programs will legitimately be able to spawn threads without immediately straying into undefined behaviour. Not only that, but the memory model has been very carefully thought out, so it should be possible to write even low-level stuff such as lock-free containers in Standard C++.
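To give a flavour, a minimal program under the working draft might look like this (names could still change before the standard ships):

    #include <thread>
    #include <iostream>

    void hello()
    {
        std::cout<<"Hello from a standard C++ thread!"<<std::endl;
    }

    int main()
    {
        std::thread t(hello);
        t.join(); // wait for the thread to finish
    }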
Posted by Anthony Williams
[/ threading /] permanent link
Tags: threading, C++, C++0x, news
Intel and AMD Define Memory Ordering
Monday, 17 September 2007
For a long time, the ordering of memory accesses between processors in a multi-core or multi-processor system based on the Intel x86 architecture has been underspecified. Many newsgroup posts have discussed the interpretation of the Intel and AMD software developer manuals, and how that translates into actual guarantees, but there has been nothing authoritative, despite comments from Intel engineers. This has now changed! Both Intel and AMD have now released documentation of their memory ordering guarantees — Intel has published a new white paper (Intel 64 Architecture Memory Ordering White Paper) devoted to the issue, whereas AMD have updated their programmer's manual (Section 7.2 of AMD64 Architecture Programmer's Manual Volume 2: System Programming Rev 3.13).
In particular, several things are now made explicitly clear by this documentation:
- Stores from a single processor cannot be reordered, and
- Memory accesses obey causal consistency, so
- An aligned load is an acquire operation, and
- An aligned store is a release operation, and
- A locked instruction (such as xchg or lock cmpxchg) is both an acquire and a release operation.
This has implications for the implementation of threading primitives such as mutexes for IA-32 and Intel 64 architectures — in some cases the code can be simplified, where it has been written to take a pessimistic interpretation of the specifications.
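As one example of such a simplification, releasing a spinlock on x86 can now safely be a plain aligned store (a release operation under the new rules) rather than a locked read-modify-write instruction; acquiring still needs one. This sketch uses the MSVC _InterlockedExchange intrinsic and is illustrative only; real code should use a proper atomics library once one is available:

    #include <intrin.h>

    class spinlock
    {
        long volatile flag; // assumed suitably aligned
    public:
        spinlock():flag(0){}
        void lock()
        {
            // locked xchg: an acquire (and release) operation
            while(_InterlockedExchange(&flag,1))
            {}
        }
        void unlock()
        {
            flag=0; // plain aligned store: a release operation on x86
        }
    };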
Posted by Anthony Williams
[/ threading /] permanent link