Does const mean thread-safe?
Tuesday, 14 March 2017
There was a discussion recently on the cpplang slack about whether const meant thread-safe.
As with everything in life, it depends. In some ways, yes it does. In others, it does not. Read on for the details.
What do we mean by "thread-safe"?
If someone says something is "thread-safe", then it is important to define what that means. Here is an incomplete list of some things people might mean when they say something is "thread safe".
- Calling foo(x) on one thread and foo(y) on a second thread concurrently is OK.
- Calling foo(x_i) on any number of threads concurrently is OK, provided each x_i is different.
- Calling foo(x) on a specific number of threads concurrently is OK.
- Calling foo(x) on any number of threads concurrently is OK.
- Calling foo(x) on one thread and bar(x) on another thread concurrently is OK.
- Calling foo(x) on one thread and bar(x) on any number of threads concurrently is OK.
Which one we mean in a given circumstance is important. For example, a concurrent queue might be a Single-Producer, Single-Consumer queue (SPSC), in which case it is safe for one thread to push a value while another pops a value, but if two threads try to push values at the same time, things go wrong. Alternatively, it might be a Multi-Producer, Single-Consumer queue (MPSC), which allows multiple threads to push values, but only one to pop values. Both of these are "thread safe", but the meaning is subtly different.
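To make the SPSC case concrete, here is a minimal sketch of a bounded single-producer, single-consumer ring buffer. The spsc_queue class, its try_push/try_pop interface and the spsc_sum driver are hypothetical, written for illustration rather than taken from any library. It is "thread safe" only in the SPSC sense: one thread may call try_push while another calls try_pop, but two threads calling try_push concurrently would both write to tail_ and race.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical bounded SPSC ring buffer. Safe for exactly one producer thread
// calling try_push and one consumer thread calling try_pop at the same time.
template <typename T, std::size_t Capacity>
class spsc_queue {
    std::array<T, Capacity + 1> buffer_;  // one slot kept empty to tell "full" from "empty"
    std::atomic<std::size_t> head_{0};    // next slot to read; only the consumer writes it
    std::atomic<std::size_t> tail_{0};    // next slot to write; only the producer writes it

public:
    bool try_push(T const& value) {
        std::size_t const tail = tail_.load(std::memory_order_relaxed);
        std::size_t const next = (tail + 1) % (Capacity + 1);
        if (next == head_.load(std::memory_order_acquire))
            return false;                               // queue is full
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release);   // publish the element to the consumer
        return true;
    }

    std::optional<T> try_pop() {
        std::size_t const head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;                        // queue is empty
        T value = buffer_[head];
        head_.store((head + 1) % (Capacity + 1), std::memory_order_release);  // hand the slot back
        return value;
    }
};

// One producer (the calling thread) and one consumer thread: the only
// concurrent usage this queue supports. Returns the sum of 1..count.
long long spsc_sum(int count) {
    spsc_queue<int, 64> q;
    long long sum = 0;
    std::thread consumer([&] {
        for (int received = 0; received < count;) {
            if (auto v = q.try_pop()) {
                sum += *v;
                ++received;
            }
        }
    });
    for (int i = 1; i <= count;) {
        if (q.try_push(i))
            ++i;
    }
    consumer.join();  // join() makes the consumer's writes to sum visible here
    return sum;
}
```

The acquire/release pairs on head_ and tail_ are what make each element written by the producer visible to the consumer, and each freed slot visible back to the producer. An MPSC queue would need a different design, because multiple producers would have to coordinate their updates to tail_.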
Before we look at what sort of thread safety we're after, let's just define what it means to be "OK".
Data races
At the basic level, when we say an operation is "OK" from a thread-safety point of view, we mean it has defined behaviour, and there are no data races, since data races lead to undefined behaviour.
From the C++ Standard perspective, a data race occurs when two operations access the same memory location, neither happens-before the other, at least one of them modifies that memory location, and at least one of them is not an atomic operation.
An operation is thus "thread safe" with respect to the set of threads we wish to perform the operation concurrently if:
- none of the threads modify a memory location accessed by any of the other threads, or
- all accesses performed by the threads to memory locations which are modified by one or more of the threads are atomic, or
- the threads use suitable synchronization to ensure that, for every modification and every other access to the same memory location, one happens-before the other.
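A minimal sketch of the second condition, assuming two threads that each increment a shared counter: with a plain int, the unsynchronized increments would be a data race and the behaviour undefined; with std::atomic<int>, every access is atomic, so there is no race and no increment is lost. The atomic_increments function is a hypothetical name chosen for illustration.

```cpp
#include <atomic>
#include <thread>

// Two threads each increment a shared counter `iterations` times.
// If `counter` were a plain int, the concurrent ++counter operations would be
// a data race (undefined behaviour). With std::atomic<int>, each ++ is an
// atomic read-modify-write, so the final count is exact.
int atomic_increments(int iterations) {
    std::atomic<int> counter{0};
    auto work = [&] {
        for (int i = 0; i < iterations; ++i)
            ++counter;
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter.load();
}
```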
So: what sort of thread-safety are we looking for from const objects, and why?
Do as ints do
A good rule of thumb for choosing behaviour for a class in C++ is "do as ints do".
With regard to thread safety, ints are simple:
- Any number of threads may read a given int concurrently.
- If any thread modifies a given int, no other threads may access that int for reading or writing concurrently.
This follows naturally from the definition of a data race, since ints cannot do anything special to provide synchronization or atomic operations.
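A minimal sketch of the first rule, assuming several threads that only read the same int: since nothing modifies it while they run, there is no data race and no synchronization is needed. The concurrent_reads function is hypothetical.

```cpp
#include <thread>
#include <vector>

// Any number of threads may read the same int concurrently, provided none of
// them modifies it. Each thread writes only to its own element of `results`,
// so those writes do not conflict either.
int concurrent_reads(int thread_count) {
    int shared = 42;  // set before the threads start, not modified while they run
    std::vector<int> results(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&shared, &results, i] { results[i] = shared * 2; });
    for (auto& t : threads)
        t.join();
    int total = 0;
    for (int r : results)
        total += r;
    return total;
}
```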
If you have an int that more than one thread wants to access, and any of those threads wants to modify it, then you need external synchronization. Typically you'll use a mutex for the external synchronization, but other mechanisms can work too.
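A sketch of external synchronization, assuming a std::mutex is the chosen mechanism: several threads modify the same plain int, so every access to it is made while holding the same lock. The guarded_total function is hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// A plain int modified by several threads. The int provides no synchronization
// of its own, so every access goes through the same mutex.
int guarded_total(int thread_count, int increments_per_thread) {
    int value = 0;
    std::mutex value_mutex;  // external synchronization for `value`
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(value_mutex);
                ++value;
            }
        });
    }
    for (auto& th : threads)
        th.join();
    // join() establishes happens-before, so this read needs no lock.
    return value;
}
```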
If your int is const (or you have a const int& that references it, or a const int* that points to it), then you can't modify it through that name.
What does that mean for your class? In a well-designed class, the const member functions do not modify the state of the object. It is "logically" immutable when accessed exclusively through const member functions. On the other hand, if you use a non-const member function then you are potentially modifying the state of the object. So far, so good: this is what ints do with regard to reading and modifying.
To do what ints do with respect to thread safety, we need to ensure that it is OK to call any const member functions concurrently on any number of threads. For many classes this is trivially achieved: if you don't modify the internal representation of the object in any way, you're home and dry.
Consider an employee class that stores basic information about an employee, such as their name, employee ID and so forth. The natural implementation of const member functions will just read the members, perform some simple manipulation on the values, and return. Nothing is modified.
#include <string>

class employee{
    std::string first_name;
    std::string last_name;
    // other data
public:
    std::string get_full_name() const{
        // only reads the members; nothing is modified
        return last_name + ", " + first_name;
    }
    // other member functions
};
Provided that reading from a const std::string and appending it to another string is OK, employee::get_full_name can be called from any number of threads concurrently.
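As a self-contained sketch, the following calls get_full_name on a single employee from several threads through a const reference. The constructor and the full_names_from_threads helper are hypothetical additions, since the original class elides its other members.

```cpp
#include <string>
#include <thread>
#include <utility>
#include <vector>

class employee {
    std::string first_name;
    std::string last_name;
public:
    // Hypothetical constructor; the original class elides its other members.
    employee(std::string first, std::string last)
        : first_name(std::move(first)), last_name(std::move(last)) {}

    std::string get_full_name() const {
        return last_name + ", " + first_name;  // reads members only
    }
};

// Calls get_full_name concurrently through a const reference. No thread
// modifies the employee, so there is no data race; each thread writes only
// its own result slot.
std::vector<std::string> full_names_from_threads(employee const& e, int thread_count) {
    std::vector<std::string> results(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&e, &results, i] { results[i] = e.get_full_name(); });
    for (auto& t : threads)
        t.join();
    return results;
}
```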
You only have to do something special to "do as ints do" if you modify the internal state in your const member function, e.g. to keep a tally of calls, to cache calculated values, or for similar purposes that modify the internal state without modifying the externally-visible state. Of course, you would also need to add some synchronization if you were modifying externally-visible state in your const member function, but we've already decided that's not a good plan.
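As a sketch of what that synchronization might look like, assume a hypothetical report class whose const member function summary both caches a computed value and keeps a tally of calls. Both pieces of state are declared mutable so that a const function may change them; the cache is guarded by a mutable std::mutex, and the tally is a std::atomic<int>. This is one reasonable approach, not the only one.

```cpp
#include <atomic>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class: summary() is logically const, but it lazily fills a
// cache and counts calls. That internal state is mutable and protected, so
// concurrent calls to summary() on the same const object are race-free.
class report {
    std::string body;
    mutable std::mutex cache_mutex;                     // guards cached_summary
    mutable std::optional<std::string> cached_summary;  // filled on first call
    mutable std::atomic<int> call_count{0};             // atomic tally, no lock needed

public:
    explicit report(std::string text) : body(std::move(text)) {}

    std::string summary() const {
        ++call_count;
        std::lock_guard<std::mutex> lock(cache_mutex);
        if (!cached_summary)
            cached_summary = body.substr(0, 10) + "...";  // stand-in for an expensive computation
        return *cached_summary;
    }

    int calls() const { return call_count.load(); }
};

// Calls summary() from several threads on the same const object and checks
// that every thread saw the same cached value.
bool concurrent_summaries_match(report const& r, int thread_count) {
    std::vector<std::string> results(thread_count);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&r, &results, i] { results[i] = r.summary(); });
    for (auto& t : threads)
        t.join();
    for (auto const& s : results)
        if (s != results.front())
            return false;
    return true;
}
```

Without the mutex and the atomic, two threads calling summary on the same const report would both write cached_summary and call_count, which is exactly the data race that "do as ints do" rules out.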
In employee::get_full_name, we're relying on the thread-safety of std::string to get us through. Is that OK? Can we rely on that?
Thread-safety in the C++ Standard Library
The C++ Standard Library itself sticks to the "do as ints do" rule. This is spelled out in the section on Data race avoidance (res.on.data.races). Specifically,
A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this.
and
Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.
This means that if you have a const std::string& referencing an object, then any calls to member functions on it must not modify the object, and any shared internal state must be protected against data races. The same applies if it is passed to any other function in the C++ Standard Library.
However, if you have a std::string& referencing the same object, then you must ensure that all accesses to that object are synchronized externally, otherwise you may trigger a data race.
Our employee::get_full_name function is thus as thread-safe as an int: concurrent reads are OK, but any modifications will require external synchronization for all concurrent accesses.
There are two little words in the first paragraph quoted above which have a surprising consequence: "or indirectly".
Indirect Accesses
If you have two const std::vector<X>s, vx and vy, then calling standard library functions on those objects must not modify any objects accessible by other threads, otherwise we've violated the requirements from the "data race avoidance" section quoted above, since those objects would be "indirectly" modified by the function.
This means that any operations on the X objects within those containers that are performed by the operations we do on the vectors must also refrain from modifying any objects accessible by other threads. For example, the expression vx==vy compares each of the elements in turn. These comparisons must thus not modify any objects accessible by other threads. Likewise, std::for_each(vx.begin(),vx.end(),foo) must not modify any objects accessible by other threads.
This pretty much boils down to a requirement that if you use your class with the C++ Standard Library, then const operations on your class must be safe if called from multiple threads. There is no such requirement for non-const operations, or combinations of const and non-const operations.
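As a sketch of what that requirement means for an element type, assume a hypothetical X whose const operator== keeps a tally of comparisons. Because vector's operator== calls X's operator== on the elements, two threads evaluating vx==vy on the same const vectors both update those tallies. With a plain mutable int the tally would be a data race; making it atomic keeps the const operation safe to call concurrently.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical element type whose const operator== updates internal state.
struct X {
    int value;
    // A plain `mutable int` here would make concurrent vx==vy a data race.
    mutable std::atomic<int> compare_count{0};

    explicit X(int v) : value(v) {}
    // std::atomic is not copyable, so copy the value and start a fresh tally.
    X(X const& other) : value(other.value) {}

    bool operator==(X const& other) const {
        ++compare_count;
        return value == other.value;
    }
};

// Two threads compare the same pair of const vectors concurrently.
// Returns the total number of element comparisons, or -1 if a comparison failed.
int concurrent_vector_compares(int size) {
    std::vector<X> vx, vy;
    for (int i = 0; i < size; ++i) {
        vx.emplace_back(i);
        vy.emplace_back(i);
    }
    std::vector<X> const& cvx = vx;
    std::vector<X> const& cvy = vy;
    bool r1 = false, r2 = false;
    std::thread t1([&] { r1 = (cvx == cvy); });
    std::thread t2([&] { r2 = (cvx == cvy); });
    t1.join();
    t2.join();
    if (!r1 || !r2)
        return -1;
    // Each comparison increments exactly one element's tally, so summing over
    // both vectors counts every comparison regardless of argument order.
    int total = 0;
    for (X const& x : cvx) total += x.compare_count.load();
    for (X const& y : cvy) total += y.compare_count.load();
    return total;
}
```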
You may of course decide that your class is going to allow concurrent modifications (even if that is by using a mutex to restrict accesses internally), but that is up to you. A class designed for concurrent access, such as a concurrent queue, will need to have the internal synchronization; a value class such as our employee is unlikely to need it.
Summary
Do as ints do: concurrent calls to const member functions on your class must be OK, but you are not required to ensure thread-safety if there are also concurrent calls to non-const member functions.
The C++ Standard Library sticks to this rule itself, and requires it of your code. In most cases, it's trivial to ensure; it's only classes with complex const member functions that modify internal state that may need synchronization added.
Posted by Anthony Williams
Tags: cplusplus, const, concurrency
Design and Content Copyright © 2005-2024 Just Software Solutions Ltd. All rights reserved. | Privacy Policy
3 Comments
We didn't actually intend to require that user classes used in STL types make const operations thread-safe; it's just the best wording we could come up with. The goal with the "or indirectly" wording was to cover the fact that vector<T> directly contains a couple pointers, but we wanted to require that the array those pointers *indirectly* refer to is also used in a do-as-int-does way.
It may not have been intended, but I'm not the first person to make this interpretation .... Herb's been making the case since 2012 in his "You don't know const and mutable" talk (https://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/)
Yeah, your interpretation of the words is pretty clearly what they say, even though we didn't intend it. If you have any idea of how to say it better, I'd love to get an LWG issue about it.
This interpretation isn't awful, but it's also stricter than any actual implementation: you can use a vector<ConstOperationsMutateWithoutLocking> fine within a single thread, and it'd be nice if the standard actually said that.