Blog Archive

Coming Soon: Just::Thread Pro

Friday, 29 October 2010

Multithreaded code doesn't have to be complicated.

That's the idea behind the Just::Thread Pro library. By providing a set of high level facilities in the library, your application code can be simplified — rather than spending your time on the complexities of multithreading and concurrency you can instead focus on what it is your application is trying to achieve.

Building on the Just::Thread C++0x thread library, Just::Thread Pro will provide facilities to:

Encapsulate communication between threads to avoid deadlocks and race conditions
Easily scale your application to make use of multi-core processors
Parallelize existing single-threaded code without a major rewrite

Just::Thread Pro will be available for all platforms supported by Just::Thread.

Head over to the Just::Thread Pro website and sign up to receive further news about the library and notification when it is released.

Posted by Anthony Williams
[/ news /] permanent link
Tags: concurrency, cplusplus, multithreading
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

October 2010 C++ Standards Committee Mailing

Thursday, 21 October 2010

The October 2010 mailing for the C++ Standards Committee was published earlier this week. This is the pre-meeting mailing for the November 2010 committee meeting.

As well as the usual core and library issues lists, this mailing also includes a Summary of the status of the FCD comments, along with a whole host of papers attempting to address some the remaining FCD comments.

To move or not to move

The big issue of the upcoming meeting is looking to be whether or not the compiler should implicitly generate move constructors and move assignment operators akin to the copy constructors and copy assignment operators that are currently auto generated. The wording in the FCD requires this, but people are concerned that this will break existing code when people start using their code with a C++0x compiler and library. There are two papers on the subject in the mailing: N3153: Implicit Move Must Go by Dave Abrahams, and N3174: To move or not to move by Bjarne Stroustrup.

There seems to be consensus among committee members that the FCD requires compilers to generate the move constructor and move assignment operator in cases that will break existing code. The key question is whether the breakage can be limited by restricting the cases in which the move members are implicitly generated, or whether implicit generation should be abandoned altogether. The various options are explained very clearly in the papers.

Exceptions and Destructors

N3166: Destructors default to noexcept is another potentially controversial issue. It is generally acknowledged that throwing exceptions from destructors is a bad idea, not least because this leads to termination if the destructor is invoked whilst the stack is being unwound due to another exception. Herb Sutter wrote about this way back in 1998 when the original C++ standard was hot off the presses, in GotW #47: Uncaught Exceptions.

The proposal in the paper comes from a Finnish comment on the FCD, and is quite simple: by default all destructors are assumed to be marked noexcept(true) (which is the new way of saying they cannot throw an exception, similar to an exception specification of throw()), unless they explicitly have a non-empty exception specification or are marked noexcept(false).

Since it is generally good practice not to throw from a destructor, you'd think this would be uncontroversial. Unfortunately it is not the case — there are currently situations where throwing from a destructor has defined behaviour, and even does exactly what people want. The example most frequently cited is the SOCI project for accessing databases from C++. This library provides an easy syntax for constructing SQL queries using the << operator. The operator builds a temporary object which executes the SQL in the destructor. If the SQL is invalid, or executing it causes an exception for any other reason then the destructor throws. Changing destructors to be noexcept(true) by default will make such code terminate on a database error unless the destructor is updated to declare that it can throw exceptions. Working code with defined behaviour is thus broken when recompiled with a C++0x compiler.

Concurrency-related papers

There are 3 concurrency-related papers in this mailing, which I've summarised below.

N3152: Progress guarantees for C++0x (US 3 and US 186)

The FCD does not make any progress guarantees when multiple threads are used. In particular, writes made by one thread do not ever have to become visible to other threads, and threads aren't guaranteed ever to actually run at all. This paper looks at the issues and provides wording for minimal guarantees.

N3164: Adjusting C++ Atomics for C Compatibility

This is an update to N3137 from the last mailing, which provides detailed wording updates for the required changes to regain compatibility with C1X atomics.

N3170: Clarifying C++ Futures

There were a few FCD comments from the US about the use of futures; this paper outlines all the issues and potential solutions. The proposed changes are actually fairly minor though:

future gains a share() member function for easy conversion to the corresponding shared_future type;
Accessing a shared_future for which valid() is false is now required to throw an exception rather than be undefined behaviour;
atomic_future is to be removed;

A few minor changes have also been made to the wording to make things clearer.

If you have any opinions on any of the papers listed here, or the resolution of any NB comments, please add them to the comments for this post.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

just::thread C++0x Thread Library V1.4.2 Released

Friday, 15 October 2010

I am pleased to announce that version 1.4.2 of just::thread, our C++0x Thread Library has just been released.

The big change with this release is the new support for gcc 4.5 on Ubuntu Linux. If you're running Ubuntu Lucid then you can get the .DEB files for gcc 4.5 from yesterday's blog post. For Ubuntu Maverick, gcc 4.5 is in the repositories.

Other changes:

Overflow in ratio arithmetic will now cause a compilation failure
Ratio arithmetic operations derive from the resulting std::ratio instantiation as well as providing the ::type member to better emulate the C++0x working draft
On Windows, just::thread can now be used in MFC DLLs

Purchase your copy and get started with the C++0x thread library now.

As usual, existing customers are entitled to a free upgrade to V1.4.2 from all earlier versions.

Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

gcc 4.5 Packages for Ubuntu Lucid

Thursday, 14 October 2010

Ubuntu Maverick was released earlier this week. Amongst other things, gcc 4.5 is available in the repositories, whereas for previous versions you had to build it yourself from source.

In order to save you the pain of compiling gcc 4.5 for yourself (which can take a while, and overheated my laptop when I tried), I've built it for Ubuntu Lucid, and uploaded the .deb files to my website. The .debs are built from the Maverick source packages for gcc 4.5.1, binutils 2.20.51, cloog-ppl and mpclib, and I've built them for both i386 and amd64 architectures.

Enjoy!

Posted by Anthony Williams
[/ news /] permanent link
Tags: gcc, lucid, ubuntu
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Concept Checking Without C++0x Concepts

Wednesday, 06 October 2010

My latest article, Concept Checking Without Concepts in C++ was published on the Dr Dobb's website a couple of weeks ago.

One of the important features of the now-defunct C++0x Concepts proposal was the ability to overload functions based on whether or not their arguments met certain concepts. This article describes a way to allow that for concepts based on the presence of particular member functions.

The basic idea is that you can write traits classes that detect particular sets of member functions. Function overloads that require these concepts can then be enabled or disabled by using std::enable_if with these traits.

The example I use is checking for a Lockable type which has lock(), unlock() and try_lock() member functions, but the same technique could easily be used for other concepts that required other member functions.

Read the article for the full details.

Posted by Anthony Williams
[/ news /] permanent link
Tags: concepts, cplusplus
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

August 2010 C++ Standards Committee Mailing

Wednesday, 08 September 2010

The August 2010 mailing for the C++ Standards Committee was published recently. This is the post-meeting mailing for the August 2010 committee meeting, and contains a new C++0x Working Draft. At the meeting in August, the committee discussed many of the National Body comments on the FCD, and this draft incorporates those changes that the committee approved of. As you can see from the FCD Comment Status document in this mailing, there were 301 technical comments and a further 215 editorial comments. Of these, 98 technical comments have been accepted as-is, 8 have been accepted with changes, and 63 have been rejected, leaving 132 technical comments that have still not been addressed one way or the other.

No significant changes have been accepted to the concurrency-related parts of the working draft, though there are quite a few editorial comments. However, there are several papers in this mailing that address the National Body comments in this area. These papers have by and large been drafted to represent the consensus of those members of the concurreny group in the LWG who were present at the meeting. I have summarised these papers below.

Concurrency-related papers

N3113: Async Launch Policies (CH 36)

This paper provides a clearer basis for implementors to supply additional launch policies for std::async, or for the committee to do so in a later revision of the C++ standard, by making the std::launch enum a bitmask type. It also drops the std::launch::any enumeration value, and renames std::launch::sync to std::launch::deferred, as this better describes what it means.

The use of a bitmask allows new values to be added which are either distinct values, or combinations of the others. The default policy for std::async is thus std::launch::async|std::launch::deferred.

N3125: Omnibus Memory Model and Atomics Paper

This paper addresses several National Body comments by updating the wording in the draft standard to better reflect the intent of the committee.

N3128: C++ Timeout Specification

There are several functions in the threading portion of the library that allow timeouts, such as the try_lock_for and try_lock_until member functions of the timed mutex types, and the wait_for and wait_until member functions of the future types. This paper clarifies what it means to wait for a specified duration (with the xxx_for functions), and what it means to wait until a specified time point (with the xxx_until functions). In particular, it clarifies what can be expected of the implementation if the clock is changed during a wait.

This paper also proposes replacing the old std::chrono::monotonic_clock with a new std::chrono::steady_clock. Whereas the only constraint on the monotonic clock was that it never went backwards, the steady clock cannot be adjusted, and always ticks at a uniform rate. This fulfils the original intent of the monotonic clock, but provides a clearer specification and name. It is also tied into the new wait specifications, since waiting for a duration requires a steady clock for use as a basis.

N3129: Managing C++ Associated Asynchronous State

This paper tidies up the wording of the functions and classes related to the future types, and clarifies the management of the associated asynchronous state which is used to communicate e.g. between a std::promise and a std::future that will receive the result.

N3130: Lockable requirements for C++0x

This paper splits out the requirements for general lockable types away from the specific requirements on the standard mutex types. This allows the lockable concepts to be used to specify the requirements on a type to be used the the std::lock_guard and std::unique_lock class templates, as well as for the various overloads of the wait functions on std::condition_variable_any, without imposing the precise behaviour of std::mutex on user-defined mutex types.

N3132: Mathematizing C++ Concurrency: The Post-Rapperswil Model

This paper provides a mathematical description for the C++0x memory model. A similar description was used to highlight some of the areas that are clarified by the omnibus memory model paper (N3125) described above.

N3136: Coherence Requirements Detailed

This paper introduces some simple coherence requirements to the memory model wording to make it clear that the sequence of values read for a given variable must be consistent across threads. The existence of a single modification order for each variable is a key component of the memory model, and the wording introduced in this paper makes it clear that this is a core requirement.

N3137: C and C++ Liaison: Compatibility for Atomics

The structure of the atomic types and operations in the FCD was carefully worked out in conjunction with the C standards committee to ensure that the C++0x atomic types were compatible with those being introduced in the upcoming C1x standard. Unfortunately, the C committee introduced a new incompatible syntax for atomic types into the C1x draft earlier this year because they believed it was a better match for the C language.

This paper attempts to address this new incompatibility by removing the atomic_xxx types that were originally added for C compatibility, leaving just the std::atomic<T> class template. Also, a new _Atomic(T) macro is introduced for compatibility with the new C1x _Atomic keyword.

Other papers

As already mentioned, this mailing contains a new C++0x Working Draft, along with the usual post-meeting stuff — editors notes for the changes in the new draft, new issues lists, minutes of the meeting, etc. It also contains a complete list of the National Body Comments on the FCD, and a few other papers addressing National Body comments.

If you have any opinions on the resolution of any NB comments not yet formally accepted or rejected, please add them to the comments for this post.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards, concurrency
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Definitions of Non-blocking, Lock-free and Wait-free

Tuesday, 07 September 2010

There have repeatedly been posts on comp.programming.threads asking for a definition of these terms. To write good multithreaded code you really need to understand what these mean, and how they affect the behaviour and performance of algorithms with these properties. I thought it would be therefore be useful to provide some definitions.

Definition of Blocking

A function is said to be blocking if it calls an operating system function that waits for an event to occur or a time period to elapse. Whilst a blocking call is waiting the operating system can often remove that thread from the scheduler, so it takes no CPU time until the event has occurred or the time has elapsed. Once the event has occurred then the thread is placed back in the scheduler and can run when allocated a time slice. A thread that is running a blocking call is said to be blocked.

Mutex lock functions such as std::mutex::lock(), and EnterCriticalSection() are blocking, as are wait functions such as std::future::wait() and std::condition_variable::wait(). However, blocking functions are not limited to synchronization facilities: the most common blocking functions are I/O facilities such as fread() or WriteFile(). Timing facilities such as Sleep(), or std::this_thread::sleep_until() are also often blocking if the delay period is long enough.

Definition of Non-blocking

Non-blocking functions are just those that aren't blocking. Non-blocking data structures are those on which all operations are non-blocking. All lock-free data structures are inherently non-blocking.

Spin-locks are an example of non-blocking synchronization: if one thread has a lock then waiting threads are not suspended, but must instead loop until the thread that holds the lock has released it. Spin locks and other algorithms with busy-wait loops are not lock-free, because if one thread (the one holding the lock) is suspended then no thread can make progress.

Defintion of lock-free

A lock-free data structure is one that doesn't use any mutex locks. The implication is that multiple threads can access the data structure concurrently without race conditions or data corruption, even though there are no locks — people would give you funny looks if you suggested that std::list was a lock-free data structure, even though it is unlikely that there are any locks used in the implementation.

Just because more than one thread can safely access a lock-free data structure concurrently doesn't mean that there are no restrictions on such accesses. For example, a lock-free queue might allow one thread to add values to the back whilst another removes them from the front, whereas multiple threads adding new values concurrently would potentially corrupt the data structure. The data structure description will identify which combinations of operations can safely be called concurrently.

For a data structure to qualify as lock-free, if any thread performing an operation on the data structure is suspended at any point during that operation then the other threads accessing the data structure must still be able to complete their tasks. This is the fundamental restriction which distinguishes it from non-blocking data structures that use spin-locks or other busy-wait mechanisms.

Just because a data structure is lock-free it doesn't mean that threads don't have to wait for each other. If an operation takes more than one step then a thread may be pre-empted by the OS part-way through an operation. When it resumes the state may have changed, and the thread may have to restart the operation.

In some cases, a the partially-completed operation would prevent other threads performing their desired operations on the data structure until the operation is complete. In order for the algorithm to be lock-free, these threads must then either abort or complete the partially-completed operation of the suspended thread. When the suspended thread is woken by the scheduler it can then either retry or accept the completion of its operation as appropriate. In lock-free algorithms, a thread may find that it has to retry its operation an unbounded number of times when there is high contention.

If you use a lock-free data structure where multiple threads modify the same pieces of data and thus cause each other to retry then high rates of access from multiple threads can seriously cripple the performance, as the threads hinder each other's progress. This is why wait-free data structures are so important: they don't suffer from the same set-backs.

Definition of wait-free

A wait-free data structure is a lock-free data structure with the additional property that every thread accessing the data structure can make complete its operation within a bounded number of steps, regardless of the behaviour of other threads. Algorithms that can involve an unbounded number of retries due to clashes with other threads are thus not wait-free.

This property means that high-priority threads accessing the data structure never have to wait for low-priority threads to complete their operations on the data structure, and every thread will always be able to make progress when it is scheduled to run by the OS. For real-time or semi-real-time systems this can be an essential property, as the indefinite wait-periods of blocking or non-wait-free lock-free data structures do not allow their use within time-limited operations.

The downside of wait-free data structures is that they are more complex than their non-wait-free counterparts. This imposes an overhead on each operation, potentially making the average time taken to perform an operation considerably longer than the same operation on an equivalent non-wait-free data structure.

Choices

When choosing a data structure for a given task you need to think about the costs and benefits of each of the options.

A lock-based data structure is probably the easiest to use, reason about and write, but has the potential for limited concurrency. They may also be the fastest in low-load scenarios.

A lock-free (but not wait-free) data structure has the potential to allow more concurrent accesses, but with the possibility of busy-waits under high loads. Lock-free data structures are considerably harder to write, and the additional concurrency can make reasoning about the program behaviour harder. They may be faster than lock-based data structures, but not necessarily.

Finally, a wait-free data structure has the maximum potential for true concurrent access, without the possibility of busy waits. However, these are very much harder to write than other lock-free data structures, and typically impose an additional performance cost on every access.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, threading, multithreading, lock-free, wait-free
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

just::thread C++0x Thread Library V1.4.1 Released

Monday, 09 August 2010

I am pleased to announce that version 1.4.1 of just::thread, our C++0x Thread Library has just been released.

Thisis an improvement over V1.4.0 in a number of areas:

Both /Zc:wchar_t and /Zc:wchar_t- are supported with MSVC
std::chrono::high_resolution_clock typedef added
Added support for shared libraries on Linux
Faster mutex locking and unlocking on contended mutexes on Linux
Faster blocking/unblocking for condition variables on Linux
Support for tracking clock changes when waiting on a std::chrono::system_clock time with std::condition_variable on Linux with kernels >= 2.6.31
Support for floating-point durations
Faster time retrieval with std::chrono::monotonic_clock::now() on Windows
Added support for Microsoft Visual Studio 2005

Purchase your copy and get started with the C++0x thread library now.

As usual, existing customers are entitled to a free upgrade to V1.4.1 from all earlier versions.

Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, concurrency, C++0x
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Implementing Dekker's algorithm with Fences

Tuesday, 27 July 2010

Dekker's algorithm is one of the most basic algorithms for mutual exclusion, alongside Peterson's algorithm and Lamport's bakery algorithm. It has the nice property that it only requires load and store operations rather than exchange or test-and-set, but it still requires some level of ordering between the operations. On a weakly-ordered architecture such as PowerPC or SPARC, a correct implementation of Dekker's algorithm thus requires the use of fences or memory barriers in order to ensure correct operation.

The code

For those of you who just want the code: here it is — Dekker's algorithm in C++, with explicit fences.

std::atomic<bool> flag0(false),flag1(false);
std::atomic<int> turn(0);

void p0()
{
    flag0.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag1.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 0)
        {
            flag0.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 0)
            {
            }
            flag0.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(1,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag0.store(false,std::memory_order_relaxed);
}

void p1()
{
    flag1.store(true,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);

    while (flag0.load(std::memory_order_relaxed))
    {
        if (turn.load(std::memory_order_relaxed) != 1)
        {
            flag1.store(false,std::memory_order_relaxed);
            while (turn.load(std::memory_order_relaxed) != 1)
            {
            }
            flag1.store(true,std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        }
    }
    std::atomic_thread_fence(std::memory_order_acquire);
 
    // critical section


    turn.store(0,std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    flag1.store(false,std::memory_order_relaxed);
}

The analysis

If you're like me then you'll be interested in why stuff works, rather than just taking the code. Here is my analysis of the required orderings, and how the fences guarantee those orderings.

Suppose thread 0 and thread 1 enter p0 and p1 respectively at the same time. They both set their respective flags to true, execute the fence and then read the other flag at the start of the while loop.

If both threads read false then both will enter the critical section, and the algorithm doesn't work. It is the job of the fences to ensure that this doesn't happen.

The fences are marked with memory_order_seq_cst, so either the fence in p0 is before the fence in p1 in the global ordering of memory_order_seq_cst operations, or vice-versa. Without loss of generality, we can assume that the fence in p0 comes before the fence in p1, since the code is symmetric. The store to flag0 is sequenced before the fence in p0, and the fence in p1 is sequenced before the read from flag0. Therefore the read from flag0 must see the value stored (true), so p1 will enter the while loop.

On the other side, there is no such guarantee for the read from flag1 in p0, so p0 may or may not enter the while loop. If p0 reads the value of false for flag1, it will not enter the while loop, and will instead enter the critical section, but that is OK since p1 has entered the while loop.

Though flag0 is not set to false until p0 has finished the critical section, we need to ensure that p1 does not see this until the values modified in the critical section are also visible to p1, and this is the purpose of the release fence prior to the store to flag0 and the acquire fence after the while loop. When p1 reads the value false from flag0 in order to exit the while loop, it must be reading the value store by p0 after the release fence at the end of the critical section. The acquire fence after the load guarantees that all the values written before the release fence prior to the store are visible, which is exactly what we need here.

If p0 reads true for flag1, it will enter the while loop rather than the critical section. Both threads are now looping, so we need a way to ensure that exactly one of them breaks out. This is the purpose of the turn variable. Initially, turn is 0, so p1 will enter the if and set flag1 to false, whilst p1 will not enter the if. Because p1 set flag1 to false, eventually p0 will read flag1 as false and exit the outer while loop to enter the critical section. On the other hand, p1 is now stuck in the inner while loop because turn is 0. When p0 exits the critical section it sets turn to 1. This will eventually be seen by p1, allowing it to exit the inner while loop. When the store to flag0 becomes visible p1 can then exit the outer while loop too.

If turn had been 1 initially (because p0 was the last thread to enter the critical section) then the inverse logic would apply, and p0 would enter the inner loop, allowing p1 to enter the critical section first.

Second time around

If p0 is called a second time whilst p1 is still in the inner loop then we have a similar situation to the start of the function — p1 may exit the inner loop and store true in flag1 whilst p0 stores true in flag0. We therefore need the second memory_order_seq_cst fence after the store to the flag in the inner loop. This guarantees that at least one of the threads will see the flag from the other thread set when it executes the check in the outer loop. Without this fence then both threads can read false, and both can enter the critical section.

Alternatives

You could put the ordering constraints on the loads and stores themselves rather than using fences. Indeed, the default memory ordering for atomic operations in C++ is memory_order_seq_cst, so the algorithm would "just work" with plain loads and stores to atomic variables. However, by using memory_order_relaxed on the loads and stores we can add fences to ensure we have exactly the ordering constraints required.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: concurrency, synchronization, Dekker, fences, cplusplus
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Reference Wrappers Explained

Wednesday, 14 July 2010

The upcoming C++0x standard includes reference wrappers in the form of the std::reference_wrapper<T> class template, and the helper function templates std::ref() and std::cref(). As I mentioned in my blog post on Starting Threads with Member Functions and Reference Arguments, these wrappers can be used to pass references to objects across interfaces that normally require copyable (or at least movable) objects — in that blog post, std::ref was used for passing references to objects over to the new thread, rather than copying the objects. I was recently asked what the difference was between std::ref and std::cref, and how they worked, so I thought I'd elaborate.

Deducing the Referenced Type

std::ref is a function template, so automatically deduces the type of the wrapped reference from the type of the supplied argument. This type deduction includes the const-ness of the supplied object:

int x=3;
const int y=4;
std::reference_wrapper<int> rx=std::ref(x);
// std::reference_wrapper<int> ry=std::ref(y); // error
std::reference_wrapper<const int> rcy=std::ref(y);

On the other hand, though std::cref also deduces the type of the wrapped reference from the supplied argument, it always wraps a const reference:

int x=3;
const int y=4;
// std::reference_wrapper<int> rx=std::cref(x); // error
std::reference_wrapper<const int> rcx=std::cref(x);
// std::reference_wrapper<int> ry=std::cref(y); // error
std::reference_wrapper<const int> rcy=std::cref(y);

Since a no-const-reference can always be bound to a const reference, you can thus use std::ref in pretty much every case where you would use std::cref, and your code would work the same. Which begs the question: why would you ever choose to use std::cref?

Using `std::cref` to prevent modification

The primary reason for choosing std::cref is because you want to guarantee that the source object is not modified through that reference. This can be important when writing multithreaded code — if a thread should not be modifying some data then it can be worth enforcing this by passing a const reference rather than a mutable reference.

void foo(int&); // mutable reference

int x=42; // Should not be modified by thread
std::thread t(foo,std::cref(x)); // will fail to compile

This can be important where there are overloads of a function such that one takes a const reference, and the other a non-const reference: if we don't want the object modified then it is important that the overload taking a const reference is chosen.

struct Foo
{
    void operator()(int&) const;
    void operator()(int const&) const;
};

int x=42;
std::thread(Foo(),std::cref(x)); // force const int& overload

References to temporaries

std::cref has another property missing from std::ref — it can bind to temporaries, since temporary objects can bind to const references. I'm not sure this is a good thing, especially when dealing with multiple threads, as the referenced temporary is likely to have been destroyed before the thread has even started. This is therefore something to watch out for:

void bar(int const&);

std::thread t(bar,std::cref(42)); // oops, ref to temporary

Documentation

Finally, std::cref serves a documentation purpose, even where std::ref would suffice — it declares in big bold letters that this reference cannot be used to modify the referenced object, which thus makes it easier to reason about the code.

Recommendation

I would recommend that you use std::cref in preference to std::ref whenever you can — the benefits as documentation of intent, and avoiding accidental modification through the reference make it a clear winner in my opinion. Of course, if you do want to modify the referenced object, then you need to use std::ref, but such usage now stands out, and makes it clear that this is the intent.

You do still need to be careful to ensure that you don't try and wrap references to temporaries, particularly when applying std::cref to the result of a function call, but such uses should stand out — I expect most uses to be wrapping a reference to a named variable rather than wrapping a function result.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: reference wrappers, ref, cref, cplusplus
Stumble It! | Submit to Reddit | Submit to DZone

Comment on this post

If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

More recent entries Older entries

About Us

Technical Writings

Subscribe to Blog

Blog Archives

Blog Archive

Friday, 29 October 2010

Thursday, 21 October 2010

To move or not to move

Exceptions and Destructors

Concurrency-related papers

Friday, 15 October 2010

Thursday, 14 October 2010

Wednesday, 06 October 2010

Wednesday, 08 September 2010

Concurrency-related papers

Other papers

Tuesday, 07 September 2010

Definition of Blocking

Definition of Non-blocking

Defintion of lock-free

Definition of wait-free

Choices

Monday, 09 August 2010

Tuesday, 27 July 2010

The code

The analysis

Second time around

Alternatives

Wednesday, 14 July 2010

Deducing the Referenced Type

Using std::cref to prevent modification

References to temporaries

Documentation

Recommendation

Using `std::cref` to prevent modification