Blog Archive for / cplusplus /
Using atomics for thread synchronization in C++
Monday, 19 April 2021
In my previous blog post I wrote about spin locks, and how compilers must not move the locking loop above a prior unlock.
After thinking about this done more, I realised that is not something specific to locks — the same issue arises with any two step synchronization between threads.
Consider the following code
std::atomic<bool> ready1{false};
std::atomic<bool> ready2{false};
void thread1(){
ready1.store(true, std::memory_order_release);
while(!ready2.load(std::memory_order_acquire)){}
}
void thread2() {
while(!ready1.load(std::memory_order_acquire)) {}
ready2.store(true, std::memory_order_release);
}
thread1
sets ready1
to true
, then waits for thread2
to set ready2
to
true
. Meanwhile, thread2
waits for ready1
to be true
, then sets
ready2
to true
.
This is almost identical to the unlock/lock case from the previous
blog post, except the waiting thread is just using plain load
rather
than exchange
.
If the compiler moves the wait loop in thread1
above the store
then
both threads will hang forever. However it cannot do this for the same
reason the spinlocks can't deadlock in the previous post: the store
has to be visible to the other thread in a finite period if time, so
must be issued before the wait loop. https://eel.is/c++draft/intro.multithread#intro.progress-18
An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
If the optimizer moved the store
across the loop in thread1
, then
it could not guarantee that the value became visible to the other
thread in a finite period of time. Therefore such an optimization is
forbidden.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, atomics, multithreading, synchronization
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Can non-overlapping spinlocks deadlock in C++?
Thursday, 15 April 2021
There has been discussion on Twitter recently about whether or not the C++
memory model allows spinlocks to deadlock if they just use
memory_order_acquire
in lock
and memory_order_release
in unlock
, due to
compiler optimizations. The case in question is where a thread locks one mutex,
unlocks it, and locks a second: can the compiler reorder the second lock above
the first unlock? If it does, and another thread does the same in the reverse
order, with the same optimization, then sequential locks could deadlock.
Here is the code in question, with all the lock/unlock code inlined.
std::atomic<bool> mutex1{false};
std::atomic<bool> mutex2{false};
int x=0;
int y=0;
void thread1(){
while(mutex1.exchange(true,std::memory_order_acquire)){} // #1
x=1;
mutex1.store(false,std::memory_order_release); // #2
while(mutex2.exchange(true,std::memory_order_acquire)){} // #3
y=1;
mutex2.store(false,std::memory_order_release); // #4
}
void thread2(){
while(mutex2.exchange(true,std::memory_order_acquire)){} // #5
x=2;
mutex2.store(false,std::memory_order_release); // #6
while(mutex1.exchange(true,std::memory_order_acquire)){} // #7
y=2;
mutex1.store(false,std::memory_order_release); // #8
}
For there to even be the possibility of deadlock, thread1
must successfully
execute line #1 before thread2
successfully executes line #7, and thread2
must
successfully execute line #5 before thread1
successfully executes line #3.
Because these are RMW operations, the threads must agree on the ordering.
The modification order of mutex1
must thus be #1(success), #2, #7(success), #8.
Similarly, the modification order of mutex2
must be #5(success), #6, #3(success), #4.
All threads must agree on these modification orders. https://eel.is/c++draft/intro.multithread#intro.races-4
From the point of view of thread1
, everything must run in program order:
compilers can only optimize things as long as they run "as if" in program order.
The store to mutex1
at #2 is guaranteed to be visible to thread2
in "a finite
period of time". https://eel.is/c++draft/intro.multithread#intro.progress-18
Consequently, thread2
must eventually see that store at line #7, even if it
executes line #7 a large number of times first.
Therefore, the compiler cannot move line #3 completely above line #2, since
doing so would not guarantee the visibility of #2 to other threads in a finite
period of time. It can move an arbitrary number of executions of line #3 above
line #2 (all of which will see that mutex2
is still true
), but not all the
executions of line #3.
Given that thread2
eventually sees the store from #2 at line #7, the exchange at
line #7 will eventually succeed, and thread2
will eventually complete.
Likewise, the store at #6 must become visible to thread1
in a finite period of
time. Therefore the exchange at line #3 will eventually see the value stored by
6, the exchange will succeed, and thread1
will complete, and the compiler is
not allowed to move all the executions of line #7 above #6.
No amount of compiler optimization is allowed to break this, so no: spinlocks cannot deadlock if they don't overlap.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, atomics, multithreading, spinlocks
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Ticket Maps
Saturday, 20 March 2021
It has been an increasingly common scenario that I've encountered where you have some ID that's monotonically increasing, such as a subscription or connection index, or user ID, and you need your C++ program to hold some data that's associated with that ID value. The program can then pass round the ID, and use that ID to access the associated data at a later point.
Over time, IDs can become invalidated, so the data associated with
that value is removed, and you end up with a sparse set of
currently-active IDs. You would therefore naturally lean towards using
a map (whether a std::map
, std::unordered_map
or some other custom
map) to associate the data with the ID.
Often such maps are implemented as node-based containers, which means that the nodes can be allocated at disparate memory addresses, which is bad for cache performance. Adding and removing nodes also always requires memory allocation/deallocation.
In his "Better Code: Relationships" presentation, Sean Parent describes an alternative implementation, which he calls the "Russian Coat-Check Algorithm". In this algorithm, the map is implemented as a vector of pairs of key/optional data. Because the keys come from a monotonically increasing index, the vector is always sorted, and inserts are always at the end. Entries can be removed by clearing the data, and if there are too many empty entries then the vector can be compacted. Lookups are always fast, because the vector is always sorted, so a simple binary search will find the right element.
Inspired by watching Sean's presentation at ACCU 2021 last week, I implemented what I call a Ticket Map (it maps a Ticket to a Value). This is an implementation of the algorithm Sean described, fleshed out to a full container. When you insert a value, it is assigned the next available ticket value. You can later access or erase the value using that ticket.
#include <string>
#include <iostream>
#include "ticket_map.hpp"
int main(){
jss::ticket_map<int,std::string> map;
auto ticket1=map.insert("hello");
auto ticket2=map.insert("world");
std::cout<<map[ticket1]<<" "<<map[ticket2]<<std::endl;
map.erase(ticket1);
}
You can of course iterate through the container: in this case the iterators are
Input Iterators, where the value_type
is a std::pair<Ticket const&,Value&>
. This allows you to access both the tickets and the raw
elements, but also allows the iterator to provide a nice view over the data
without exposing the std::optional
implementation detail.
#include <string>
#include <iostream>
#include "ticket_map.hpp"
int main(){
jss::ticket_map<int,std::string> map;
auto ticket1=map.insert("hello");
auto ticket2=map.insert("world");
auto ticket3=map.insert("goodbye");
for(auto& [ticket,value]: map){
std::cout<<ticket<<": "<<value<<std::endl;
}
}
The code is available on GitHub under the Boost Software License.
Enjoy!
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, maps, containers
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Invariants and Preconditions
Thursday, 05 March 2020
I tend to think about invariants and preconditions a lot. Pretty much every class has invariants, and most functions have preconditions. I don't think they are complicated concepts, but somehow they seem to confuse people anyway, so I decided it was time to write down some of my thoughts.
Invariants
A class invariant is something that is always true for every instance of that class. Users of objects of that class never see an object for which the invariant is not true; it is true for all states of all objects of that class at all times. This is what makes it invariant.
In order to perform the required operations, class member functions may temporarily break invariants, and then restore them afterwards. Unless you allow concurrent invocations of member functions on the same object, or deliberately pass the object to another function from inside a member function, this is OK, as the outside world will not be able to interact with the object when it is in such a state. It does mean you need to be especially careful when calling other functions from inside a member function, to ensure that the invariants still hold.
Invariants are important. If you writing a function that operates on an object, the only thing that is guaranteed to be true about it is that the class invariants hold, unless you choose to impose additional preconditions.
Preconditions
Preconditions are those that must be true when a function is called. Every
operation on a class has the implicit precondition that the class invariants
hold, but operations can add additional preconditions. For example, you always
call v.empty()
to check if a vector is empty, but you only call v.front()
to
retrieve the first element if the vector is not empty. v[i]
and v.at(i)
differ only in their preconditions; v[i]
requires the vector to have more than
i
elements, whereas v.at(i)
is specified to work for any vector, throwing if
i
is out of range.
Some people like to write classes with two-stage construction — first you
default-construct the object, then you call x.init()
or similar to "finish
off" the initialization, and you can't call any other function on that object
until after the initialization is complete. Though I don't like this pattern it
is valid code; every member function has a precondition that x.init()
has been
called.
Some people like to say that such a class can have "invariants" that do not hold until after initialization is complete. I think this is a misuse of the term; invariants must always hold for every object of a class, outside the internals of its member functions. As I just stated above, what such a class really has is a precondition on every function, not an invariant.
If I write a function
void do_stuff(X& x){
// my code here
}
then I can rely on the invariants of X
holding, but cannot rely on anything
else unless I add preconditions to do_stuff
.
Likewise, it should not be possible to do anything to an object that breaks its invariants. If you can then either you gave a bug, or they are not really invariants, just preconditions for most operations.
Some people like to write classes that have operations to transfer the internals from one object to another, leaving the source in a special "emptier than empty" state. This can be good for efficiency, especially when doing otherwise would require allocating resources to return the source to the plain "empty" state, and it is likely that the source will be destroyed immediately anyway.
Again, this is a perfectly reasonable thing to do, particularly when the resources are expensive to allocate and performance of the code is important. The problem comes when people then want to say that the class invariants don't hold for this state. Just as with the pre-initialization state above, I think this is a misuse of the term; they are not invariants, it is just that most operations have a precondition that the object is not "emptier than empty".
Move semantics
Since C++11, C++ has had the concept of "moving" objects, transferring resources
from the source to the destination. This is an important facility for
expressiveness of code and efficiency. Objects that own resources such as
std::unique_ptr
, std::thread
or std::fstream
can now be moved,
transferring ownership of the resource from the source to the destination. This
allows such objects to be put in containers, and transferred between scopes,
which can greatly simplify code. Likewise, containers like std::vector
can be
moved, transferring the entire set of contained objects from one vector to
another without reallocation. These facilities are of great importance for
writing clean, efficient code.
One thing all these operations have in common is that after a move, the class invariants still hold for the source object. This should go without saying: the source is still an object of its type, so of course the invariants hold.
The issue is with things like std::list
, where some implementations allocate a
sentinel node for the head/tail of the list in the default constructor, which is
thus transferred during move operations, so the move constructor has to allocate
a new sentinel node for the source, in order to ensure the class invariants
still hold. A similar thing occurs with anything that uses the pimpl idiom: if
the implementation pointer is transferred then either the class invariants must
allow for there to be a null implementation pointer, or a new implementation
object must be allocated.
Some people therefore argue that move operations should allow the source to be left with broken invariants, as a hollow shell of an object, in an "emptier than empty" state. I think this is misuse of the term "invariants". By all means, steal the internals and leave the source with a null pointer to the internals, or whatever. However, you now need to change the invariants of your class to allow this, or they are not invariants, because they would no longer apply to all objects of that class. There is no need to update any of the functions on your class to handle this new state, but if you do not do so, then any functions that don't work for objects in this "emptier than empty" state now have an additional precondition that the object is not in that state.
This is all perfectly fine, objects with such states are plentiful, and have
legitimate reasons for existing, but it is important to accept that their
functions now have preconditions. My do_stuff
function above can only call
member functions that are safe to use in such a state unless it too has a
precondition that the object x
is not in such a state.
From a user perspective, it would be nice to be able to query such a state, so I
could know what operations are permitted. For example, std::future
provides a
member function valid
so you can call f.valid()
before trying to do anything
with it, and std::thread
provides the joinable
member function, which can be
used to see if the object holds a thread handle. My do_stuff
function can then
call x.is_emptier_than_empty()
in order to check for the special state and
take appropriate action. That said, this is a "nice to have": the absence of
such a function doesn't mean that the state can't exist, just that it's
potentially harder to deal with.
Interaction with other code
If you pass an object to a function or class, then you need to know what that
function requires of your object and ensure that those expectations are met. If
the function expects your object to be a container with at least one element and
you pass an empty container, then you've broken the expectations and your code
has a bug. Likewise, if the function expects to be able to call x.foo()
on
your object, but foo cannot be called on an "emptier than empty" object, and you
passed such an object to the function, your code has a bug.
The difficulty comes when such a state arises as a consequence of other
actions. If x.foo()
can leave x
in an "emptier than empty" state, then
do_stuff
had better be prepared to handle the scenario, otherwise there is a
bug somewhere. Where the bug is depends on the documentation: if do_stuff
requires that you can always perform certain actions on the object it is passed
as a parameter, and those actions require that the object is not "emptier than
empty", then the bug is in the caller, or maybe the implementation of class
X
. If there is no such documentation, the bug is in do_stuff
.
Note that requiring that x.foo()
can only be called with objects that are not
"emptier than empty" is a precondition. It should be perfectly acceptable for
do_stuff
to call any functions on x
that do not have preconditions, even if
x
ends up in an "emptier than empty" state.
This is the exactly the same scenario we get with other code. For example if you
pass a vector to a function that requires that the vector isn't empty, the
function can erase the last element without a problem. If it wishes to erase the
last element again, thus removing two elements in total, then it needs to
check for and handle the case that the first erasure left the vector
empty. Calling v.clear()
or v.empty()
or v.push_back(y)
would be fine
without checking, as those functions have no preconditions.
If x.foo()
is instead spelled y=std::move(x)
, then nothing changes: it is
perfectly fine for x
to end up in an "emptier than empty" state if do_stuff
knows how to handle such a state, or doesn't care, because it doesn't touch x
again.
One option is for do_stuff
to say that after a move-assignment like this, it
can't rely on x
having any particular value, but it is still an instance of
class X
, so its invariants must hold, and therefore any operation without a
precondition can be used. It can therefore call x.is_emptier_than_empty()
and
behave appropriately.
The other option is for do_stuff
to place stronger requirements on x
, such
as requiring that it does not end up "emptier than empty" after a move, or even
requiring that it has a specific state.
Valid but unspecified states
The standard library has chosen option 1: after move, objects must be valid - i.e. still actually be objects of the expected type, and not destroyed - but they have an unspecified state, so the function cannot rely on any specific value, and can therefore only perform operations without preconditions, until it has verified that the preconditions for other operations hold.
This holds both for standard library types such as std::string
or
std::vector<int>
, but also for user defined types that you use with the
standard library. If you write
std::string s="hello";
std::string s2=std::move(s);
then s
was the source of a move operation, and thus is in a valid, but
unspecified state. For implementations that use the Small String Optimization,
such that short strings are stored entirely within the string object itself,
without requiring dynamic allocation, this might mean that s
still has the
value hello
after the move, because then it doesn't have to store an empty
string value in s
as well as copying the contents to s2
. It is also possible
that the implementation might clear s
, for consistency with longer
strings. Both are valid choices, as in either case s
is still a std::string
object, and the exact details of the state are unspecified.
Likewise, if you write
std::vector<MyWidget> v=create_some_widgets();
v.insert(v.begin(),MyWidget());
then that call to v.insert
is going to have to move the widgets around in
order to make room at the beginning for the extra one. The library requires that
moving widgets leaves them in a valid, but unspecified, state. In this case,
that means that having moved a widget from one position to another, it is OK to
move another widget on top of the first one, as that is the only operation the
vector will perform anyway. If you pass your object to a standard library
function that does something other than move things around (such as std::sort
or std::remove_if
), then you need to check that the other operations the
function might do can still be done on a moved-from object. By calling the
library function you are stating that your objects meet (and will continue to
meet) any preconditions you have imposed on the operations that the library
function specification says it might perform.
Invariants and Concurrency
Right back at the beginning of this article I stated that "Users of objects of that class never see an object for which the invariant is not true", but also that "class member functions may temporarily break invariants, and then restore them afterwards". These two things don't work together very well if an object can be accessed concurrently from multiple threads, and this is one aspect that makes writing concurrent code hard, especially if you try to use fine-grained locks or atomic operations.
Consider a simple class Names
which holds two vectors, one for first names and
one for last names:
class Names{
std::vector<std::string> firstNames,lastNames;
};
I want the invariant of this class to be that the number of elements in
firstNames
and lastNames
is always the same, so that I can add operations to
this class knowing that is always true. There is a strong argument that this
ought to be a std::vector<std::pair<std::string,std::string>>
rather than two
vectors, but assume that the class author has a legitimate reason for the
separate vectors.
At first glance, a member function to add an entry is quite simple:
void Names::addEntry(std::string const& first,std::string const& last){
firstNames.push_back(first);
lastNames.push_back(last);
}
However, even for the single-threaded case, this isn't good: if
lastNames.push_back(last)
throws an exception, then our invariant is broken,
as we successfully added an element to firstNames
but not to lastNames
. We
therefore need to handle that case:
void Names::addEntry(std::string const& first,std::string const& last){
firstNames.push_back(first);
try{
lastNames.push_back(last);
} catch(...){
firstNames.resize(lastNames.size());
throw;
}
}
Now, if lastNames.push_back(last)
throws, then std::vector
guarantees that
lastNames
is unchanged, so we can ensure our invariant holds again by
shrinking firstNames
back to the same size, and the single-threaded case is
now sorted. If our Names
object is only ever accessed by a single thread, then
the invariant holds at all times outside the internals of a member function.
What about the concurrent case? If we call Names::addEntry
from multiple
threads, then everything is broken: std::vector
is not safe for concurrent
access from multiple threads, so we have data races and undefined
behaviour. Using a ThreadSafeVector
class instead which provides the
operations we need but is safe for concurrent access removes these data races
and the undefined behaviour, but doesn't fix the overall problem. Thread
safety is not composable. In this case, the invariants are still broken
during the call to Names::addEntry
, so a concurrent call will see an object
with broken invariants: the thread safety of the two vectors doesn't matter if
the second thread can see firstNames
and lastNames
with different sizes.
We can fix the problem by using a mutex: the mutex lock prevents the second
thread from accessing the internals of the object until the first thread has
finished, so the second thread cannot see the state with a broken invariant. It
also allows us to revert back to std::vector
rather than ThreadSafeVector
since the individual vectors are only ever accessed by one thread at a time:
class Names{
std::mutex mutex;
std::vector<std::string> firstNames,lastNames;
public:
void addEntry(std::string const& first,std::string const& last){
std::lock_guard guard(mutex);
firstNames.push_back(first);
try{
lastNames.push_back(last);
} catch(...){
firstNames.resize(lastNames.size());
throw;
}
}
};
This is one of the big reasons why lock-free data structures are so hard to write: we cannot just stick a mutex lock on the outside to ensure other threads never see broken invariants. Instead, we must ensure that the invariants hold at all times, even during the operation of member functions. For this reason, lock-free data structures are often designed so that as much as possible is done off to the side, without modifying the core data structures, and then either the entire change is committed with a single atomic operation, or care is taken to ensure that the invariants still hold after each incremental step.
End note
Invariants are important. They are vital to working with objects, and without them it is much harder to write correct code. However, you have to choose your invariants carefully: invariants that make the rest of your code easier to reason about can have a cost, so you have to be aware of the trade-offs. Like everything in programming it is up to you what trade-off you choose: simplicitly vs performance is a common choice, but it can also be a choice of which operations are fast and which are slow. Whatever you choose, there will be cases where the other choice was more appropriate.
Whatever you choose, you must be honest with yourself, and the users of your class. Don't claim something is an invariant just because it holds most of the time; it must hold all of the time. Obviously, you can make it so a precondition of a function that objects it operates on are only in certain states, and then write your program logic to ensure that the conditions are met, but don't confuse the two.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, invariants, preconditions
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
The Power of Hidden Friends in C++
Tuesday, 25 June 2019
"Friendship" in C++ is commonly thought of as a means of allowing non-member functions and other classes to access the private data of a class. This might be done to allow symmetric conversions on non-member comparison operators, or allow a factory class exclusive access to the constructor of a class, or any number of things.
However, this is not the only use of friendship in C++, as there is an additional property to declaring a function or function template a friend: the friend function is now available to be found via Argument-Dependent Lookup (ADL). This is what makes operator overloading work with classes in different namespaces.
Argument Dependent Lookup at Work
Consider the following code snippet:
namespace A{
class X{
public:
X(int i):data(i){}
private:
int data;
friend bool operator==(X const& lhs,X const& rhs){
return lhs.data==rhs.data;
}
};
}
int main(){
A::X a(42),b(43);
if(a==b) do_stuff();
}
This code snippet works as you might expect: the compiler looks for an
implementation of operator==
that works for A::X
objects, and there isn't
one in the global namespace, so it also looks in the namespace where X
came
from (A
), and finds the operator defined as a friend of class X
. Everything
is fine. This is ADL at work: the argument to the operator is an A::X
object, so the namespace that it comes from (A
) is searched as well as the
namespace where the usage is.
Note, however, that the comparison operator is not declared anywhere other
than the friend
declaration. This means that it is only considered for name
lookup when one of the arguments is an X
object (and thus is "hidden" from
normal name lookup). To demonstrate this, let's define an additional class in
namespace A
, which is convertible to 'X':
namespace A{
class Y{
public:
operator X() const{
return X(data);
}
Y(int i):data(i){}
private:
int data;
};
}
A::Y y(99);
A::X converted=y; // OK
Our Y
class has a conversion operator defined, so we can convert it to an X
object at will, and it is also in namespace A
. You might think that we can compare Y
objects, because our comparison operator takes an X
, and Y
is convertible to
X
. If you did, you'd be wrong: the comparison operator is only visible to name
lookup if one of the arguments is an X
object.
int main(){
A::Y a(1),b(2);
if(a==b) // ERROR: no available comparison operator
do_stuff();
}
If we convert one of the arguments to an X
then it works, because the
comparison operator is now visible, and the other argument is converted to an
X
to match the function signature:
int main(){
A::Y a(1),b(2);
if(A::X(a)==b) // OK
do_stuff();
}
Similarly, if we declare the comparison operator at namespace scope, everything works too:
namespace A{
bool operator==(X const& lhs,X const& rhs);
}
int main(){
A::Y a(1),b(2);
if(a==b) // OK now
do_stuff();
}
In this case, the arguments are of type Y
, so namespace A
is searched, which
now includes the declaration of the comparison operator, so it is found, and the
arguments are converted to X
objects to do the comparison.
If we omit this namespace scope definition, as in the original example, then this function is a hidden friend.
This isn't just limited to operators: normal functions can be defined in
friend
declarations too, and just as with the comparison operator above, if
they are not also declared at namespace scope then they are hidden from normal
name lookup. For example:
struct X{
X(int){}
friend void foo(X){};
};
int main(){
X x(42);
foo(x); // OK, calls foo defined in friend declaration
foo(99); // Error: foo not found, as int is not X
::foo(x); // Error: foo not found as ADL not triggered
}
Benefits of Hidden Friends
The first benefit of hidden friends is that it avoids accidental implicit
conversions. In our example above, comparing Y
objects doesn't implicitly
convert them to X
objects to use the X
comparison unless you explicitly do
something to trigger that behaviour. This can avoid accidental uses of the wrong
function too: if I have a function wibble
that takes an X
and wobble
that
takes a Y
, then a typo in the function name won't trigger the implicit
conversion to X
:
class X{
friend void wibble(X const&){}
};
class Y{
friend void wobble(Y const&){}
public:
operator X() const;
};
int main(){
Y y;
wibble(y); // Error no function wibble(Y)
}
This also helps spot errors where the typo was on the definition: we meant to
define wibble(Y)
but misspelled it. With "normal" declarations, the call to
wibble(y)
would silently call wibble(X(y))
instead, leading to unexpected
behaviour. Hopefully this would be caught by tests, but it might make it harder
to identify the problem as you'd be checking the definition of wobble
,
wondering why it didn't work.
Another consequence is that it makes it easier for the compiler: the hidden
friends are only checked when there is a relevant argument provided. This means
that there are fewer functions to consider for overload resolution, which makes
compilation quicker. This is especially important for operators: if you have a
large codebase, you might have thousands of classes with operator==
defined. If they are declared at namespace scope, then every use of ==
might
have to check a large number of them and perform overload resolution. If they
are hidden friends, then they are ignored unless one of the expressions being
compared is already of the right type.
In order to truly understand the benefits and use them correctly, we need to know when hidden friends are visible.
Rules for Visibility of Hidden Friends
Firstly, hidden friends must be functions or function templates; callable objects don't count.
Secondly, the call site must use an unqualified name — if you use a qualified name, then that checks only the specified scope, and disregards ADL (which we need to find hidden friends).
Thirdly, normal unqualified lookup must not find anything that isn't a
function or function template. If you have a local variable int foo;
, and try
to call foo(my_object)
from the same scope, then the compiler will rightly
complain that this is invalid, even if the type of my_object
has a hidden
friend named foo
.
Finally, one of the arguments to the function call must be of a user-defined type, or a pointer or reference to that type.
We now have the circumstances for calling a hidden friend if there is one:
my_object x;
my_object* px=&x;
foo(x);
foo(px);
Both calls to foo
in this code will trigger ADL, and search for hidden
friends.
ADL searches a set of namespaces that depend on the type of my_object
, but
that doesn't really matter for now, as you could get to normal definitions of
foo
in those namespaces by using appropriate qualification. Consider this code:
std::string x,y;
swap(x,y);
ADL will find std::swap
, since std::string
is in the std
namespace, but we
could just as well have spelled out std::swap
in the first place. Though this
is certainly useful, it isn't what we're looking at right now.
The hidden friend part of ADL is that for every argument to the function call, the compiler builds a set of classes to search for hidden friend declarations. This lookup list is built as follows from a source type list, which is initially the types of the arguments supplied to the function call.
Our lookup list starts empty. For each type in the source type list:
- If the type being considered is a pointer or reference, add the pointed-to or referenced type to the source type list
- Otherwise, if the type being considered is a built-in type, do nothing
- Otherwise, if the type is a class type then add it to the lookup list, and
check the following:
- If the type has any direct or indirect base classes, add them to the lookup list
- If the type is a member of a class, add the containing class to the lookup list
- If the type is a specialization of a class template, then:
- add the types of any template type arguments (not non-type arguments or template template arguments) to the source type list
- if any of the template parameters are template template parameters, and the supplied arguments are member templates, then add the classes of which those templates are members to the lookup list
- Otherwise, if the type is an enumerated type that is a member of a class, add that class to the lookup list
- Otherwise, if the type is a function type, add the types of the function return value and function parameters to the source type list
- Otherwise, if the type is a pointer to a member of some class
X
, add the classX
and the type of the member to the source type list
This gets us a final lookup list which may be empty (e.g. in foo(42)
), or
may contain a number of classes. All the classes in that lookup list are now
searched for hidden friends. Normal overload resolution is used to determine
which function call is the best match amongst all the found hidden friends, and
all the "normal" namespace-scope functions.
This means that you can add free functions and operators that work on a user-defined type by adding normal namespace-scope functions, or by adding hidden friends to any of the classes in the lookup list for that type.
Adding hidden friends via base classes
In a recent blog post, I mentioned
my
strong_typedef
implementation. The
initial design for that used an enum class
to specify the permitted
operations, but this was rather restrictive, so after talking with some others
(notably Peter Sommerlad) about alternative implementation strategies, I
switched it to a mixin-based implementation. In this case, the Properties
argument is now a variadic parameter pack, which specifies types that provide
mixin classes for the typedef. jss::strong_typedef<Tag,Underlying,Prop>
then derives from
Prop::mixin<jss::strong_typedef<Tag,Underlying,Prop>,Underlying>
. This means
that the class template Prop::mixin
can provide hidden friends that operate on
the typedef type, but are not considered for "normal" lookup. Consider, for
example, the implementation of
jss::strong_typedef_properties::post_incrementable
:
struct post_incrementable {
template <typename Derived, typename ValueType> struct mixin {
friend Derived operator++(Derived &self, int) noexcept(
noexcept(std::declval<ValueType &>()++)) {
return Derived{self.underlying_value()++};
}
};
};
This provides an implementation of operator++
which operates on the strong
typedef type Derived
, but is only visible as a hidden friend, so if you do
x++
, and x
is not a strong typedef that specifies it is post_incrementable
then this operator is not considered, and you don't get accidental conversions.
This makes the strong typedef system easily extensible: you can add new property types that define mixin templates to provide both member functions and free functions that operate on the typedef, without making these functions generally visible at namespace scope.
Hidden Friends and Enumerations
I had forgotten that enumerated types declared inside a class also triggered searching that class for hidden friends until I was trying to solve a problem for a client recently. We had some enumerated types that were being used for a particular purpose, which we therefore wanted to enable operations on that wouldn't be enabled for "normal" enumerated types.
One option was to specialize a global template as I described in my article
on
Using Enum Classes as Bitfields,
but this makes it inconvenient to deal with enumerated types that are members of
a class (especially if they are private
members), and impossible to deal with
enumerated types that are declared at local scope. We also wanted to be able to
declare these enums with a macro, which would mean we couldn't use the
specialization as you can only declare specializations in the namespace in which
the original template is declared, and the macro wouldn't know how to switch
namespaces, and wouldn't be usable at class scope.
This is where hidden friends came to the rescue. You can define a class anywhere you can define an enumerated type, and hidden friends declared in the enclosing class of an enumerated type are considered when calling functions that take the enumerated as a parameter. We could therefore declare our enumerated types with a wrapper class, like so:
struct my_enum_wrapper{
enum class my_enum{
// enumerations
};
};
using my_enum=my_enum_wrapper::my_enum;
The using
declaration means that other code can just use my_enum
directly
without having to know or care about my_enum_wrapper
.
Now we can add our special functions, starting with a function to verify this is one of our special enums:
namespace xyz{
constexpr bool is_special_enum(void*) noexcept{
return false;
}
template<typename T>
constexpr bool is_special_enum() noexcept{
return is_special_enum((T*)nullptr);
}
}
Now we can say xyz::is_special_enum<T>()
to check if something is one of our
special enumerated types. By default this will call the void*
overload, and
thus return false
. However, the internal call passes a pointer-to-T
as the
argument, which invokes ADL, and searches hidden friends. We can therefore add a
friend
declaration to our wrapper class which will be found by ADL:
struct my_enum_wrapper{
enum class my_enum{
// enumerations
};
constexpr bool is_special_enum(my_enum*) noexcept
{
return true;
}
};
using my_enum=my_enum_wrapper::my_enum;
Now, xyz::is_special_enum<my_enum>()
will return true
. Since this is a
constexpr
function, it can be used in a constant expression, so can be used
with std::enable_if
to permit operations only for our special enumerated
types, or as a template parameter to specialize a template just for our
enumerated types. Of course, some additional operations can also be added as
hidden friends in the wrapper class.
Our wrapper macro now looks like this:
#define DECLARE_SPECIAL_ENUM(enum_name,underlying_type,...)\
struct enum_name##_wrapper{\
enum class enum_name: underlying_type{\
__VA_ARGS__\
};\
constexpr bool is_special_enum(enum_name*) noexcept\
{\
return true;\
}\
};\
using enum_name=enum_name##_wrapper::enum_name;
so you can declare a special enum as
DECLARE_SPECIAL_ENUM(my_enum,int,a,b,c=42,d)
. This works at namespace scope,
as a class member, and at local scope, all due to the hidden friend.
Summary
Hidden Friends are a great way to add operations to a specific type without permitting accidental implicit conversions, or slowing down the compiler by introducing overloads that it has to consider in other contexts. They also allow declaring operations on types in contexts that otherwise you wouldn't be able to do so. Every C++ programmer should know how to use them, so they can be used where appropriate.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, friends
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
strong_typedef - Create distinct types for distinct purposes
Wednesday, 29 May 2019
One common problem in C++ code is the use of simple types for many things: a
std::string
might be a filename, a person's name, a SQL query string or a
piece of JSON; an int
could be a count, an index, an ID number, or even a file
handle. In his 1999 book "Refactoring" (which has a
second edition as of January 2019), Martin Fowler called this phenomenon
"Primitive Obsession", and recommended that we use dedicated classes for each
purpose rather than built-in or library types.
The difficulty with doing so is that built-in types and library types have predefined sets of operations that can be done with them from simple operations like incrementing/decrementing and comparing, to more complex ones such as replacing substrings. Creating a new class each time means that we have to write implementations for all these functions every time. This duplication of effort raises the barrier to doing this, and means that we often decide that it isn't worthwhile.
However, by sticking to the built-in and library types, we can end up in a scenario where a function takes multiple parameters of the same type, with distinct meanings, and no clear reason for any specific ordering. In such a scenario, it is easy to get the parameters in the wrong order and not notice until something breaks. By wrapping the primitive type in a unique type for each usage we can eliminate this class of problem.
My strong_typedef
class
template aims to make this easier. It wraps an existing type, and associates it
with a tag type to define the purpose, and which can therefore be used to make
it unique. Crucially, it then allows you to specify which sets of operations you
want to enable: it might not make sense to add ID numbers, but it might make
perfect sense to add counters, even if both are represented by integers. You
might therefore using jss::strong_typedef<struct IdTag,unsigned,jss::strong_typedef_properties::equality_comparable>
for an ID
number, but jss::strong_typedef<struct IndexTag,unsigned,jss::strong_typedef_properties::comparable | jss::strong_typedef_properties::incrementable | jss::strong_typedef_properties::decrementable>
for an index type.
I've implemented something similar to this class for various clients over the years, so I decided it was about time to make it publicly available. The implementation on github condenses all of the solutions to this problem that I've written over the years to provide a generic implementation.
Basic Usage
jss::strong_typedef
takes three template parameters: Tag
, ValueType
and Properties
.
The first (Tag
) is a tag type. This is not used for anything other than to
make the type unique, and can be incomplete. Most commonly, this is a class or
struct declared directly in the template parameter, and nowhere else, as in the
examples struct IdTag
and struct IndexTag
above.
The second (ValueType
) is the underlying type of the strong typedef. This
is the basic type that you would otherwise be using.
The third (Properties
) is an optional parameter that specifies the operations
you wish the strong typedef to support. By default it is
jss::strong_typedef_properties::none
— no operations are supported. See
below for a full list.
Declaring Types
You create a typedef by specifying these parameters:
using type1=jss::strong_typedef<struct type1_tag,int>;
using type2=jss::strong_typedef<struct type2_tag,int>;
using type3=jss::strong_typedef<struct type3_tag,std::string,
jss::strong_typedef_properties::comparable>;
type1
, type2
and type3
are now separate types. They cannot be implicitly converted
to or from each other or anything else.
Creating Values
If the underlying type is default-constructible, then so is the new type. You can also construct the objects from an object of the wrapped type:
type1 t1;
type2 t2(42);
// type2 e2(t1); // error, type1 cannot be converted to type2
Accessing the Value
strong_typedef
can wrap built-in or class type, but that's only useful if you
can access the value. There are two ways to access the value:
- Cast to the stored type:
static_cast<unsigned>(my_channel_index)
- Use the
underlying_value
member function:my_channel_index.underlying_value()
Using the underlying_value
member function returns a reference to the stored
value, which can thus be used to modify non-const
values, or to call member
functions on the stored value without taking a copy. This makes it particularly
useful for class types such as std::string
.
using transaction_id=jss::strong_typedef<struct transaction_tag,std::string>;
bool is_a_foo(transaction_id id){
auto& s=id.underlying_value();
return s.find("foo")!=s.end();
}
Other Operations
Depending on the properties you've assigned to your type you may
be able to do other operations on that type, such as compare a == b
or
a < b
, increment with ++a
, or add two values with a + b
. You might also be
able to hash the values with std::hash<my_typedef>
, or write them to a
std::ostream
with os << a
. Only the behaviours enabled by the Properties
template parameter will be available on any given type. For anything else, you
need to extract the wrapped value and use that.
Examples
IDs
An ID of some description might essentially be a number, but it makes no sense
to perform much in the way of operations on it. You probably want to be able to
compare IDs, possibly with an ordering so you can use them as keys in a
std::map
, or with hashing so you can use them as keys in std::unordered_map
,
and maybe you want to be able to write them to a stream. Such an ID type might
be declared as follows:
using widget_id=jss::strong_typedef<struct widget_id_tag,unsigned long long,
jss::strong_typedef_properties::comparable |
jss::strong_typedef_properties::hashable |
jss::strong_typedef_properties::streamable>;
using froob_id=jss::strong_typedef<struct froob_id_tag,unsigned long long,
jss::strong_typedef_properties::comparable |
jss::strong_typedef_properties::hashable |
jss::strong_typedef_properties::streamable>;
Note that froob_id
and widget_id
are now different types due to the
different tags used, even though they are both based on unsigned long long
. Therefore any attempt to use a widget_id
as a froob_id
or vice-versa
will lead to a compiler error. It also means you can overload on them:
void do_stuff(widget_id my_widget);
void do_stuff(froob_id my_froob);
widget_id some_widget(421982);
do_stuff(some_widget);
Alternatively, an ID might be a string, such as a purchase order number of transaction ID:
using transaction_id=jss::strong_typedef<struct transaction_id_tag,std::string,
jss::strong_typedef_properties::comparable |
jss::strong_typedef_properties::hashable |
jss::strong_typedef_properties::streamable>;
transaction_id some_transaction("GBA283-HT9X");
That works too, since strong_typedef
can wrap any built-in or class type.
Indexes
Suppose you have a device that supports a number of channels, so you want to be
able to retrieve the data for a given channel. Each channel yields a number of
data items, so you also want to access the data items by index. You could use
strong_typedef
to wrap the channel index and the data item index, so they
can't be confused. You can also make the index types incrementable
and
decrementable
so they can be used in a for
loop:
using channel_index=jss::strong_typedef<struct channel_index_tag,unsigned,
jss::strong_typedef_properties::comparable |
jss::strong_typedef_properties::incrementable |
jss::strong_typedef_properties::decrementable>;
using data_index=jss::strong_typedef<struct data_index_tag,unsigned,
jss::strong_typedef_properties::comparable |
jss::strong_typedef_properties::incrementable |
jss::strong_typedef_properties::decrementable>;
Data get_data_item(channel_index channel,data_index item);
data_index get_num_items(channel_index channel);
void process_data(Data data);
void foo(){
channel_index const num_channels(99);
for(channel_index channel(0);channel<num_channels;++channel){
data_index const num_data_items(get_num_items(channel));
for(data_index item(0);item<num_data_items;++item){
process_data(get_data_item(channel,item));
}
}
}
The compiler will complain if you pass the wrong parameters, or compare the
channel
against the item
.
Behaviour Properties
The Properties
parameter specifies behavioural properties for the new type. It
must be one of the values of jss::strong_typedef_properties
, or a value obtained by or-ing them
together (e.g. jss::strong_typedef_properties::hashable | jss::strong_typedef_properties::streamable | jss::strong_typedef_properties::comparable
). Each
property adds some behaviour. The available properties are:
equality_comparable
=> Can be compared for equality (st==st2
) and inequality (st!=st2
)pre_incrementable
=> Supports preincrement (++st
)post_incrementable
=> Supports postincrement (st++
)pre_decrementable
=> Supports predecrement (--st
)post_decrementable
=> Supports postdecrement (st--
)addable
=> Supports addition (st+value
,value+st
,st+st2
) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.subtractable
=> Supports subtraction (st-value
,value-st
,st-st2
) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.ordered
=> Supports ordering comparisons (st<st2
,st>st2
,st<=st2
,st>=st2
)mixed_ordered
=> Supports ordering comparisons where only one of the values is a strong typedefhashable
=> Supports hashing withstd::hash
streamable
=> Can be written to astd::ostream
withoperator<<
incrementable
=>pre_incrementable | post_incrementable
decrementable
=>pre_decrementable | post_decrementable
comparable
=>ordered | equality_comparable
Guideline and Implementation
I strongly recommend using strong_typedef
or an equivalent implementation
anywhere you would otherwise reach for a built-in or library type such as int
or std::string
when designing an interface.
My strong_typedef
implementation is available on github under the Boost
Software License.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, memory, safety
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Get the element index when iterating with an indexed_view
Monday, 25 March 2019
One crucial difference between using an index-based for
loop and a range-based for
loop is that
the former allows you to use the index for something other than just identifying the element,
whereas the latter does not provide you with access to the index at all.
The difference between index-based for
loops and range-based for
loops means that some people
are unable to use simple range-based for
loops in some cases, because they need the index.
For example, you might be initializing a set of worker threads in a thread pool, and each thread needs to know it's own index:
std::vector<std::thread> workers;
void setup_workers(unsigned num_threads){
workers.resize(num_threads);
for(unsigned i=0;i<num_threads;++i){
workers[i]=std::thread(&my_worker_thread_func,i);
}
}
Even though workers
has a fixed size in the loop, we need the loop index to pass to the thread
function, so we cannot use range-based for
. This requires that we duplicate num_threads
,
adding the potential for error as we must ensure that it is correctly updated in both places if we
ever change it.
jss::indexed_view
to the rescue
jss::indexed_view
provides a means of obtaining
that index with a range-based for
loop: it creates a new view range which wraps the original
range, where each element holds the loop index, as well as a reference to the element of the
original range.
With jss::indexed_view
, we can avoid the duplication from the previous example and use the
range-based for
:
std::vector<std::thread> workers;
void setup_workers(unsigned num_threads){
workers.resize(num_threads);
for(auto entry: jss::indexed_view(workers)){
entry.value=std::thread(&my_worker_thread_func,entry.index);
}
}
As you can see from this example, the value
field is writable: it is a reference to the underlying
value if the iterator on the source range is a reference. This allows you to use it to modify the
elements in the source range if they are non-const
.
jss::indexed_view
also works with iterator-based ranges, so if you have a pair of iterators, then
you can still use range-based for
loops. For example, the following code processes the elements up
to the first zero in the supplied vector, or the whole vector if there is no zero.
void foo(std::vector<int> const& v){
auto end=std::find(v.begin(),v.end(),0);
for(auto entry: jss::indexed_view(v.begin(),end)){
process(entry.index,entry.value);
}
}
Finally, jss::indexed_view
can also be used with algorithms that require iterator-based ranges,
so our first example could also be written as:
std::vector<std::thread> workers;
void setup_workers(unsigned num_threads){
workers.resize(num_threads);
auto view=jss::indexed_view(workers);
std::for_each(view.begin(),view.end(),[](auto entry){
entry.value=std::thread(&my_worker_thread_func,entry.index);
});
}
Final words
Having to use non-ranged for
loop to get the loop index introduces a potential source of error: it
is easy to mistype the loop index either in the for
-loop header, or when using it to get the
indexed element, especially in nested loops.
By using jss::indexed_view
to wrap the range, you can eliminate this particular source of error,
as well as making it clear that you are iterating across the entire range, and that you need the
index.
Get the source from github and use it in your project now.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, memory, safety, undefined, crash, security
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
object_ptr - a safer replacement for raw pointers
Thursday, 21 March 2019
Yesterday I uploaded my object_ptr<T>
implementation
to github under the Boost
Software License.
This is an implementation of a class similar
to
std::experimental::observer_ptr<T>
from
the Library Fundamentals TS 2, but with various
improvements suggested in WG21 email discussions of the feature.
The idea
of
std::experimental::observer_ptr<T>
is
that it provides a pointer-like object that does not own the pointee, and thus
can be used in place of a raw pointer, but does not allow pointer arithmetic or
use of delete
, or pointer arithmetic, so is not as dangerous as a raw
pointer. Its use also serves as documentation: this object is owned elsewhere,
so explicitly check the lifetime of the pointed-to object — there is
nothing to prevent a dangling std::experimental::observer_ptr<T>
.
My
implementation of this concept
has a different name (object_ptr<T>
). I feel that observer_ptr
is a bad
name, because it conjures up the idea of the Observer pattern, but it doesn't
really "observe" anything. I believe object_ptr
is better: it is a pointer to
an object, so doesn't have any array-related functionality such as pointer
arithmetic, but it doesn't tell you anything about ownership.
It also has slightly different semantics to std::experimental::observer_ptr
:
it allows incoming implicit conversions, and drops the release()
member
function. The implicit conversions make it easier to use as a function
parameter, without losing any safety, as you can freely pass a
std::shared_ptr<T>
, std::unique_ptr<T>
, or even a raw pointer to a function
accepting object_ptr<T>
. It makes it much easier to use jss::object_ptr<T>
as a drop-in replacement for T*
in function parameters. There is nothing you
can do with a jss::object_ptr<T>
that you can't do with a T*
, and in fact
there is considerably less that you can do: without explicitly requesting the
stored T*
, you can only use it to access the pointed-to object, or compare it
with other pointers. The same applies with std::shared_ptr<T>
and
std::unique_ptr<T>
: you are reducing functionality, so this is safe, and
reducing typing for safe operations is a good thing.
I strongly recommend using object_ptr<T>
or an equivalent implementation of
the observer_ptr
concept anywhere you have a non-owning raw pointer in your
codebase that points to a single object.
If you have a raw pointer that does own its pointee, then I would strongly
suggest finding a smart pointer class to use as a wrapper to encapsulate that
ownership. For example, std::unique_ptr
or std::shared_ptr
with a custom
deleter might well do the job.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, memory, safety
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Begin and End with range-based for loops
Saturday, 23 February 2019
On slack the other day, someone mentioned that lots of companies don't use range-based for
loops
in their code because they use PascalCase
identifiers, and their containers thus have Begin
and
End
member functions rather than the expected begin
and end
member functions.
Having recently worked in a codebase where this was the case, I thought it would be nice to provide a solution to this problem.
The natural solution would be to provide global overloads of the begin
and end
functions: these
are always checked by range-based for
if the member functions begin()
and end()
are not
found. However, when defining global function templates, you need to be sure that they are not too
greedy: you don't want them to cause ambiguity in overload resolution or be picked in preference to
std::begin
or std::end
.
My first thought was to jump through metaprogramming hoops checking for Begin()
and End()
members that return iterators, but then I thought that seemed complicated, so looked for something
simpler to start with.
The simplest possible solution is just to declare the functions the same way that std::begin()
and
std::end()
are declared:
template <class C> constexpr auto begin(C &c) -> decltype(c.Begin()) {
return c.Begin();
}
template <class C> constexpr auto begin(const C &c) -> decltype(c.Begin()) {
return c.Begin();
}
template <class C> constexpr auto end(C &c) -> decltype(c.End()) {
return c.End();
}
template <class C> constexpr auto end(const C &c) -> decltype(c.End()) {
return c.End();
}
Initially I thought that this would be too greedy, and cause problems, but it turns out this is fine.
The use of decltype(c.Begin())
triggers SFINAE, so only types which have a public member named
Begin
which can be invoked with empty parentheses are considered; for anything else these
functions are just discarded and not considered for overload resolution.
The only way this is likely to be a problem is if the user has also defined a begin
free function
template for a class that has a suitable Begin
member, in which case this would potentially
introduce overload resolution ambiguity. However, this seems really unlikely in practice: most such
function templates will end up being a better match, and any non-template functions are
almost certainly a better match.
So there you have it: in this case, the simplest solution really is good enough! Just include this
header and you're can
freely use range-based for
loops with containers that use Begin()
and End()
instead of
begin()
and end()
.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, for, range
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Getting tuple elements with a runtime index
Tuesday, 28 March 2017
std::tuple
is great. It provides a nice, generic way of holding a fixed-size
set of data items of whatever types you need.
However, sometimes it has limitations that mean it doesn't quite work as you'd like. One of these is accessing an item based on a runtime index.
std::get
needs a compile-time index
The way to get the n
th item in a tuple is to use std::get
:
std::get<n>(my_tuple)
. This works nicely, as long as n
is a compile-time
constant. If you've got a value that is calculated at runtime, this doesn't
work: you can't use a value that isn't known until runtime as a template
parameter.
std::tuple<int,int,int> my_tuple=...;
size_t index;
std::cin>>index;
int val=std::get<index>(my_tuple); // won't compile
So, what can we do? We need a new function, which I'll call runtime_get
, to
retrieve the n
th value, where n
is a runtime value.
template<typename Tuple>
... runtime_get(Tuple&& t,size_t index){
...
}
The question is: how do we implement it?
Fixed return type
The return type is easy: our function must have a single return type for any
given Tuple
. That means that all the elements in the tuple must have the same
type, so we can just use the type of the first element. std::tuple_element
will tell us this, though we must first adjust our template parameter so it's
not a reference.
template<typename Tuple>
typename std::tuple_element<
0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
...
}
Note: C++17 includes std::variant
, so you might think we could use that to
hold the return type, but that wouldn't actually help us: to get the value from
a variant, you need to call std::get<n>(v)
, which requires n
to be a
constant (again)!
OK, so the return type is just a reference to the type of the first element. How do we get the element?
Retrieving the n
th element
We can't do a straightforward switch
, because that requires knowing all the
cases in advance, and we want this to work for any size of tuple.
One way would be to have a recursive function that checked the runtime index against a compile-time index, and then called the function with the next compile-time index if they were different, but that would mean that the access time would depend on the index, and potentially end up with a deeply nested function call if we wanted the last element in a large tuple.
One thing we can do is use the index
value as an array index. If we have an
array of functions, each of which returns the corresponding element from the
tuple, then we can call the appropriate function to return the relevant index.
The function we need is of course std::get
; it's just a matter of getting the
function signature right. Our overload of std::get
has the following
signature for const
and non-const
tuples:
template <size_t I, class... Types>
constexpr tuple_element_t<I, tuple<Types...>>&
get(tuple<Types...>&) noexcept;
template <size_t I, class... Types>
constexpr const tuple_element_t<I, tuple<Types...>>&
get(const tuple<Types...>&) noexcept;
so, we can capture the relevant instantiation of std::get
for a given tuple
type Tuple
in a function pointer declared as:
using return_type=typename std::tuple_element<0,Tuple>::type&;
using get_func_ptr=return_type(*)(Tuple&) noexcept;
The signature is the same, regardless of the index, because we made the decision that we're only going to support tuples where all the elements are the same.
This makes it easy to build a function table: use a variadic pack expansion to supply
a different index for each array element, and fill in std::get<N>
for each entry:
template<
typename Tuple,
typename Indices=std::make_index_sequence<std::tuple_size<Tuple>::value>>
struct runtime_get_func_table;
template<typename Tuple,size_t ... Indices>
struct runtime_get_func_table<Tuple,std::index_sequence<Indices...>>{
using return_type=typename std::tuple_element<0,Tuple>::type&;
using get_func_ptr=return_type (*)(Tuple&) noexcept;
static constexpr get_func_ptr table[std::tuple_size<Tuple>::value]={
&std::get<Indices>...
};
};
template<typename Tuple,size_t ... Indices>
constexpr typename
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::get_func_ptr
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::table[
std::tuple_size<Tuple>::value];
We need the separate redeclaration of the table to satisfy a pre-C++17 compiler;
with C++17 inline
variables it is no longer needed.
Our final function is then just a simple wrapper around a table lookup:
template<typename Tuple>
constexpr
typename std::tuple_element<0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
using tuple_type=typename std::remove_reference<Tuple>::type;
if(index>=std::tuple_size<tuple_type>::value)
throw std::runtime_error("Out of range");
return runtime_get_func_table<tuple_type>::table[index](t);
}
It's constexpr
safe, though in a constexpr
context you could probably just
use std::get
directly anyway.
So, there you have it: a constant-time function for retrieving the n
th element
of a tuple where all the elements have the same type.
Final code
Here is the final code for a constant-time function to retrieve an item from a tuple based on a runtime index.
#include <tuple>
#include <utility>
#include <type_traits>
#include <stdexcept>
template<
typename Tuple,
typename Indices=std::make_index_sequence<std::tuple_size<Tuple>::value>>
struct runtime_get_func_table;
template<typename Tuple,size_t ... Indices>
struct runtime_get_func_table<Tuple,std::index_sequence<Indices...>>{
using return_type=typename std::tuple_element<0,Tuple>::type&;
using get_func_ptr=return_type (*)(Tuple&) noexcept;
static constexpr get_func_ptr table[std::tuple_size<Tuple>::value]={
&std::get<Indices>...
};
};
template<typename Tuple,size_t ... Indices>
constexpr typename
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::get_func_ptr
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::table[std::tuple_size<Tuple>::value];
template<typename Tuple>
constexpr
typename std::tuple_element<0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
using tuple_type=typename std::remove_reference<Tuple>::type;
if(index>=std::tuple_size<tuple_type>::value)
throw std::runtime_error("Out of range");
return runtime_get_func_table<tuple_type>::table[index](t);
}
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, tuples
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy