Condition Variable Spurious Wakes
Friday, 27 June 2008
Condition variables are a useful mechanism for waiting until an
event occurs or some "condition" is satisfied. For example, in my implementation
of a thread-safe queue I use a condition variable to avoid
busy-waiting in wait_and_pop()
when the queue is
empty. However, condition variables have one "feature" which is a
common source of bugs: a wait on a condition variable may return even
if the condition variable has not been notified. This is called a
spurious wake.
Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.
Bugs due to spurious wakes
Consider the code for wait_and_pop
from my thread-safe
queue:
void wait_and_pop(Data& popped_value) { boost::mutex::scoped_lock lock(the_mutex); while(the_queue.empty()) { the_condition_variable.wait(lock); } popped_value=the_queue.front(); the_queue.pop(); }
If we know that there's only one consumer thread, it would be
tempting to write this with an if
instead of a
while
, on the assumption that there's only one thread
waiting, so if it's been notified, the queue must not be empty:
if(the_queue.empty()) // Danger, Will Robinson { the_condition_variable.wait(lock); }
With the potential of spurious wakes this is not safe: the
wait
might finish even if the condition variable was not
notified. We therefore need the while
, which has the
added benefit of allowing multiple consumer threads: we don't need to
worry that another thread might remove the last item from the queue,
since we're checking to see if the queue is empty before
proceeding.
That's the beginner's bug, and one that's easily overcome with a
simple rule: always check your predicate in a loop when
waiting with a condition variable. The more insidious bug
comes from timed_wait()
.
Timing is everything
condition_variable::wait()
has a companion function
that allows the user to specify a time limit on how long they're
willing to wait: condition_variable::timed_wait()
. This
function comes as a pair of overloads: one that takes an absolute
time, and one that takes a duration. The absolute time overload will
return once the clock reaches the specified time, whether or not it
was notified. The duration overload will return once the specified
duration has elapsed: if you say to wait for 3 seconds, it will stop
waiting after 3 seconds. The insidious bug comes from the overload
that takes a duration.
Suppose we wanted to add a timed_wait_and_pop()
function to our queue, that allowed the user to specify a duration to
wait. We might be tempted to write it as:
template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& timeout) { boost::mutex::scoped_lock lock(the_mutex); while(the_queue.empty()) { if(!the_condition_variable.timed_wait(lock,timeout)) return false; } popped_value=the_queue.front(); the_queue.pop(); return true; }
At first glance this looks fine: we're handling spurious wakes by
looping on the timed_wait()
call, and we're passing the
timeout
in to that call. Unfortunately, the
timeout
is a duration, so every call to
timed_wait()
will wait up to the specified amount of
time. If the timeout was 1 second, and the timed_wait()
call woke due to a spurious wake after 0.9 seconds, the next time
round the loop would wait for a further 1 second. In theory this could
continue ad infinitum, completely defeating the purpose of using
timed_wait()
in the first place.
The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:
template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::system_time const timeout=boost::get_system_time()+wait_duration; boost::mutex::scoped_lock lock(the_mutex); while(the_queue.empty()) { if(!the_condition_variable.timed_wait(lock,timeout)) return false; } popped_value=the_queue.front(); the_queue.pop(); return true; }
Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.
Passing the predicate to the condition variable
Both wait()
and timed_wait()
come with
additional overloads that allow the user to specify the condition
being waited for as a predicate. These overloads encapsulate the
while
loops from the examples above, and ensure that
spurious wakes are correctly handled. All that is required is that the
condition being waited for can be checked by means of a simple
function call or a function object which is passed as an additional
parameter to the wait()
or timed_wait()
call.
wait_and_pop()
can therefore be written like this:
struct queue_not_empty { std::queue<Data>& queue; queue_not_empty(std::queue<Data>& queue_): queue(queue_) {} bool operator()() const { return !queue.empty(); } }; void wait_and_pop(Data& popped_value) { boost::mutex::scoped_lock lock(the_mutex); the_condition_variable.wait(lock,queue_not_empty(the_queue)); popped_value=the_queue.front(); the_queue.pop(); }
and timed_wait_and_pop()
can be written like this:
template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::mutex::scoped_lock lock(the_mutex); if(!the_condition_variable.timed_wait(lock,wait_duration, queue_not_empty(the_queue))) return false; popped_value=the_queue.front(); the_queue.pop(); return true; }
Note that what we're waiting for is the queue not to be empty — the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.
Conclusion
Spurious wakes can cause some unfortunate bugs, which are hard to
track down due to the unpredictability of spurious wakes. These
problems can be avoided by ensuring that plain wait()
calls are made in a loop, and the timeout is correctly calculated for
timed_wait()
calls. If the predicate can be packaged as a
function or function object, using the predicated overloads of
wait()
and timed_wait()
avoids all the
problems.
Posted by Anthony Williams
[/ threading /] permanent link
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Design and Content Copyright © 2005-2024 Just Software Solutions Ltd. All rights reserved. | Privacy Policy
4 Comments
Hello,
Thanks for your note on using condition variables, but is there any way to reproduce "Spurious Wakes" to test the code for such issues?.
PS: thanks for writing me back on wasinapple@gmail.com -Thanks HarishThere currently isn't a way to make the condition variable wake spuriously, but you could spawn a thread that sits there doing nothing but broadcasting the condition variable (thus causing waits to wake when the predicate hasn't changed).
Alternatively, you could temporarily replace the wait call with unlock/sleep/lock. In the presence of spurious wakes, this is essentially all you can rely on from wait() anyway. If you systematically do this, you might spot the problems even before you try testing the code, as it becomes apparent that variables don't magically change state just because your thread went to sleep for a while.
(Although the original post is over 2 years old, I'll still give it a shot)
I think that all of these are in a way incorrect, as the constructor for the local `boost::mutex::scoped_lock` variable will not time out correctly.
template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::system_time const timeout=boost::get_system_time()+wait_duration;
boost::mutex::scoped_lock lock(the_mutex, timeout); // issues a timed_wait internally if(!lock) // passed duration? return false; // already too late. if(!the_condition_variable.timed_wait(lock,timeout, queue_not_empty(the_queue))) return false; popped_value=the_queue.front(); the_queue.pop(); return true; }(Although the original post is over 2 years old, I'll still give it a shot)
I think that all of these are in a way incorrect, as the constructor for the local `boost::mutex::scoped_lock` variable will not time out correctly.
template<typename Duration> bool timed_wait_and_pop(Data& popped_value, Duration const& wait_duration) { boost::system_time const timeout=boost::get_system_time()+wait_duration;
boost::mutex::scoped_lock lock(the_mutex, timeout); // issues a timed_wait internally if(!lock) // passed duration? return false; // already too late. if(!the_condition_variable.timed_wait(lock,timeout, queue_not_empty(the_queue))) return false; popped_value=the_queue.front(); the_queue.pop(); return true; }