Blog Archive
The Intel x86 Memory Ordering Guarantees and the C++ Memory Model
Tuesday, 26 August 2008
The July 2008 version of the Intel 64 and IA-32 Architecture documents includes the information from the memory ordering white paper I mentioned before. This makes it clear that on x86/x64 systems the preferred implementation of the C++0x atomic operations is as follows (which has been confirmed in discussions with Intel engineers):
| Memory Ordering | Store | Load |
|---|---|---|
| `std::memory_order_relaxed` | `MOV [mem],reg` | `MOV reg,[mem]` |
| `std::memory_order_acquire` | n/a | `MOV reg,[mem]` |
| `std::memory_order_release` | `MOV [mem],reg` | n/a |
| `std::memory_order_seq_cst` | `XCHG [mem],reg` | `MOV reg,[mem]` |
As you can see, a plain `MOV` is enough even for sequentially-consistent loads, provided a `LOCK`ed instruction such as `XCHG` is used for the sequentially-consistent stores.
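For illustration, here is a minimal sketch of these operations expressed with the C++0x atomic types (assuming a compiler that provides `<atomic>` as in the current draft). On x86, a compiler following the table above would emit a `LOCK`ed instruction such as `XCHG` for the store and a plain `MOV` for the load:

```cpp
#include <atomic>

// Sequentially-consistent store and load on a shared flag.
// std::memory_order_seq_cst is the default ordering; it is spelled
// out here for clarity.
std::atomic<int> the_flag(0);

void seq_cst_store(int value)
{
    the_flag.store(value, std::memory_order_seq_cst); // XCHG on x86
}

int seq_cst_load()
{
    return the_flag.load(std::memory_order_seq_cst);  // plain MOV on x86
}
```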
One thing to watch out for is the non-temporal SSE instructions (`MOVNTI`, `MOVNTQ`, etc.), which by their very nature (i.e. non-temporal) don't follow the normal cache-coherency rules. Therefore non-temporal stores must be followed by an `SFENCE` instruction in order for their results to be seen by other processors in a timely fashion.
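A sketch of this pattern using the SSE2 compiler intrinsics (assuming an x86 target; `_mm_stream_si32` generates `MOVNTI` and `_mm_sfence` generates `SFENCE`):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: _mm_stream_si32, _mm_sfence

// Store a value with a non-temporal hint, then fence so the store
// becomes visible to other processors in a timely, ordered fashion.
void publish_non_temporal(int* dest, int value)
{
    _mm_stream_si32(dest, value);  // non-temporal store (MOVNTI)
    _mm_sfence();                  // order it with respect to later stores
}
```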
Additionally, if you're writing drivers which deal with memory pages marked WC (Write-Combining) then additional fence instructions will be required to ensure visibility between processors. However, if you're not programming with WC pages then this shouldn't be a problem.
Posted by Anthony Williams
[/ threading /] permanent link
Tags: intel, x86, c++, threading, memory ordering, memory model
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
"Simpler Multithreading in C++0x" Article Online
Thursday, 21 August 2008
My latest article, Simpler Multithreading in C++0x is now available as part of DevX.com's Special Report on C++0x.
The article provides a whistle-stop tour of the new C++0x multithreading support.
Posted by Anthony Williams
[/ news /] permanent link
Tags: multithreading, C++0x
Boost 1.36.0 has been Released!
Tuesday, 19 August 2008
Version 1.36.0 of the Boost libraries was released last week. Crucially, this contains the fix for the critical bug in the win32 implementation of condition variables found in the 1.35.0 release.
There are a few other changes to the Boost.Thread library: there are now functions for acquiring multiple locks without deadlock, for example.
There are of course new libraries to try, and other libraries have been updated too. See the full Release Notes for details, or just Download the release and give it a try.
Posted by Anthony Williams
[/ news /] permanent link
Tags: boost, C++
C++ Concurrency in Action Early Access Edition Available
Wednesday, 13 August 2008
As those of you who attended my talk on The Future of Concurrency in C++ at ACCU 2008 (or read the slides) will know, I'm writing a book on concurrency in C++: C++ Concurrency in Action: Practical Multithreading, due to be published next year.
Those nice folks over at Manning have made it available through their Early Access Program so you can start reading without having to wait for the book to be finished. By purchasing the Early Access Edition, you will get access to each chapter as it becomes available as well as your choice of a hard copy or Ebook when the book is finished. Plus, if you have any comments on the unfinished manuscript I may be able to take them into account as I revise each chapter. Currently, early drafts of chapters 1, 3, 4 and 5 are available.
I will be covering all aspects of multithreaded programming with the new C++0x standard, from the details of the new C++0x memory model and atomic operations to managing threads and designing parallel algorithms and thread-safe containers. The book will also feature a complete reference to the C++0x Standard Thread Library.
Posted by Anthony Williams
[/ news /] permanent link
Tags: C++, concurrency, multithreading
July 2008 C++ Standards Committee Mailing Published
Wednesday, 30 July 2008
The July 2008 mailing for the C++ Standards Committee was published today. Primarily this is just an update on the "state of evolution" papers and the issue lists. However, there are a couple of new and revised papers. Amongst them is my revised paper on `packaged_task`: N2709: Packaging Tasks for Asynchronous Execution.

As I mentioned when the most recent C++0x draft was published, futures are still under discussion, and the LWG requested that `packaged_task` be moved to a separate paper, with a few minor changes. N2709 is this separate paper. Hopefully the LWG will approve this paper at the September meeting of the C++ committee in San Francisco; if they don't, then `packaged_task` will join the long list of proposals that have missed the boat for C++0x.
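To give a flavour of what `packaged_task` provides, here is a minimal sketch using the names that ended up in the standard library (the N2709 proposal may differ in detail): the task wraps a callable, and the associated future retrieves the result once the task has run, possibly on another thread.

```cpp
#include <future>
#include <thread>
#include <utility>

int compute()
{
    return 42;
}

int run_packaged_task()
{
    std::packaged_task<int()> task(compute);     // package the callable
    std::future<int> result = task.get_future(); // future tied to the task
    std::thread t(std::move(task));              // run the task elsewhere
    t.join();
    return result.get();                         // result of compute()
}
```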
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++0x, C++, standards
How Search Engines See Keywords
Friday, 25 July 2008
Jennifer Laycock's recent post on How Search Engines See Keywords over at Search Engine Guide really surprised me. It harks back to the 1990s, with talk of keyword density, and doesn't match my understanding of modern search engines at all. It especially surprised me given the author: I felt that Jennifer was pretty savvy about these things. Maybe I'm just missing something really crucial.
Anyway, my understanding is that the search engines index each and every word on your page, and store a count of each word and phrase. If you say "rubber balls" three times, it doesn't matter if you also say "red marbles" three times: the engines don't assign "keywords" to a page, they find pages that match what the user types. This is why, if I include a random phrase on a web page exactly once and then search for that phrase, my page will likely show up in the results (assuming my phrase was sufficiently uncommon), even though other phrases might appear more often on the same page.
Once the engine has found the pages that contain the phrase that users have searched for (whether in content, or in links to that page), the search engine then ranks those pages to decide what to show. The ranking will use things like the number of times the phrase appears on the page, whether it appears in the title, in headings, links, <strong> tags or just in plain text, how many other pages link to that page with that phrase, and all the usual stuff.
Here, let's put it to the test. At the time of writing, a search on Google for "wibble flibble splodge bucket" with quotes returns no results, and a search without quotes returns just three entries. Given Google's crawl rate for my website, I expect this blog entry will turn up in the search results for that phrase within a few days, even though it only appears the once and other phrases such as "search engines" appear far more often. Of course, I may be wrong, but only time will tell.
Posted by Anthony Williams
[/ webdesign /] permanent link
Tags: seo, search engines, keywords
Review of the Windows Installer XML (WiX) Toolset
Tuesday, 15 July 2008
A couple of weeks ago, one of the products I maintain for a client needed a new installer. This application is the server part of a client-server suite, and runs as a Service on Microsoft Windows. The old installer was built using the simple installer-builder that is part of Microsoft Visual Studio, and could not easily be extended to handle the desired functionality. Enter the WiX toolset.
As its name suggests, the Windows Installer XML Toolset is a set of tools for building Windows Installer (MSI) packages, using XML to describe the installer. At first glance, this doesn't look much easier than populating the MSI tables directly using Microsoft's Orca tool, but it's actually really straightforward, especially with the help of the WiX tutorial.
It's all about the <XML>
The perceived complexity comes from several factors. First up, every individual file, directory and registry entry that is to be installed must be specified as a `<Component>` and given a GUID which can be used to uniquely identify that item. The details of what to install where, combined with the inherent verbosity of XML, make the installer build file quite large. Thankfully, being XML, whitespace such as indenting and blank lines can be used to separate things into logical groups, and XML comments can be used if necessary. The installer GUI is also built using XML, which can make things very complicated. Thankfully WiX comes with a set of pre-designed GUIs which can be referenced from your installer build file; you can even provide your own images in order to brand the installer with your company logo, for example.
Once you've got over the initial shock of the XML syntax, the toolkit is actually really easy to use. The file and directory structure to be used by the installed application is described using nested XML elements to mimic the directory structure, with special predefined directory identifiers for things like the Windows directory, the Program Files directory or the user's My Documents folder. You can also use nested `<Feature>` tags to create a tree of features which the user can choose to install (or not) if they opt for a custom install. Each "feature" is a group of one or more components, which are identified with nested tags in the feature tags.
Custom Actions
What if you're not just installing a simple desktop application? Windows Installer provides support for custom actions which can be run as part of the installation, and WiX allows you to specify these. The old installer for the server application used custom actions to install the service, but these weren't actually necessary: the Windows Installer can automatically configure services, it was just that the Visual Studio Installer builder didn't support that. With WiX it's just a matter of adding a `<ServiceInstall>` tag to one of your components.

The custom actions we did require were easy to add with the `<CustomAction>` tags: you can write custom actions as DLLs or EXEs, or even as simple VBScript included directly in the installer build file. These can then be added to the `<InstallExecuteSequence>` at the appropriate point, taking care to specify the right conditions (such as whether to run on install or uninstall, and whether or not to run depending on which features are being installed).
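As a rough illustration of the `<ServiceInstall>` approach, here is a hypothetical component sketch (the identifiers, names and file are invented for the example, and `PUT-GUID-HERE` stands in for a real GUID):

```xml
<!-- Hypothetical sketch: install server.exe and register it as a
     Windows service, letting the Windows Installer do the work. -->
<Component Id="ServerServiceComponent" Guid="PUT-GUID-HERE">
  <File Id="ServerExe" Source="server.exe" KeyPath="yes"/>
  <ServiceInstall Id="ServerServiceInstall"
                  Name="MyServer"
                  DisplayName="My Server"
                  Type="ownProcess"
                  Start="auto"
                  ErrorControl="normal"/>
  <ServiceControl Id="ServerServiceControl"
                  Name="MyServer"
                  Start="install"
                  Stop="both"
                  Remove="uninstall"/>
</Component>
```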
Conclusion
The WiX toolset is very powerful, and gives you full control over everything the Windows Installer can do. Though the XML syntax is a little cumbersome, it is actually quite simple to use. Once you get used to the syntax, it is easy to see what the installer is doing, and what you need to do to get the desired effect. Though designing installer GUIs is quite hard, the supplied options will be sufficient in many cases and it is easy to customize them to your needs.
It's a free tool, so you can't beat it on price, and the tutorial really reduces the learning curve. Next time you need to build an installer, I recommend you give WiX a try.
Posted by Anthony Williams
[/ reviews /] permanent link
Tags: windows, installer, msi, wix, review
C++ Concurrency Papers Still Under Discussion
Monday, 07 July 2008
Last week I wrote about the new working draft of the C++0x standard and the newly-accepted concurrency papers, but I didn't mention the papers still under discussion. There are a few of those, listed below. The committee intends to publish a formal Committee Draft of the C++0x Standard at the end of the September meeting, so anything not voted into the WP at that meeting has missed the boat (though of course defects will still be fixed).
- N2671: An Asynchronous Future Value: Proposed Wording

  For those of you who've been following my postings about asynchronous future values for C++, this is the latest proposal on futures. Though it was discussed at the June meeting, the LWG didn't feel it was ready to be voted into the working draft yet. At the request of the LWG, `packaged_task` has been removed; I should have a separate proposal for that ready before the next meeting.

- N2668: Concurrency Modifications to Basic String

  Yes, I listed this as approved last week, but I misread the minutes of the votes: it is still under discussion. The changes in this paper ensure that it is safe for two threads to access the same `std::string` object at the same time, provided they both perform only read operations. They also ensure that copying a string object and then modifying that copy is safe, even if another thread is accessing the original. This essentially disallows copy-on-write implementations, since the benefits are now severely limited.

- N2633: Improved support for bidirectional fences

  This paper aims to simplify and improve the support for fences (also known as memory barriers) when writing code using the new atomic types. As the paper points out, the current `atomic_variable.fence(memory_ordering)` semantics can mean that compilers have to issue stronger-than-necessary fences in many cases. By making the fences free functions that are not tied to an individual variable, they will map much better to the underlying instructions, and should lead to clearer (and more optimal) code.

- N2643: C++ Data-Dependency Ordering: Function Annotation

  This paper is a counterpart to N2664: C++ Data-Dependency Ordering: Atomics and Memory Model, and extends the list of cases where dependencies are carried by allowing you to annotate functions. The new `[[annotation syntax]]` is used to indicate which parameters carry a dependency into a function, and whether or not the return type carries a dependency to the call site.
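The free-standing fences proposed in N2633 can be sketched with the names that were eventually standardised (`std::atomic_thread_fence`); the point is that the fence is not tied to any particular atomic variable:

```cpp
#include <atomic>

std::atomic<bool> ready(false);
int payload = 0;

void producer()
{
    payload = 42;                                        // ordinary store
    std::atomic_thread_fence(std::memory_order_release); // free-standing fence
    ready.store(true, std::memory_order_relaxed);
}

int consumer()
{
    while (!ready.load(std::memory_order_relaxed))
        ;                                                // spin until ready
    std::atomic_thread_fence(std::memory_order_acquire); // pairs with release
    return payload;                                      // sees 42
}
```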
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: C++, concurrency, C++0x
New C++ Working Draft and Concurrency Papers Now Available
Friday, 04 July 2008
The post-meeting mailing following June's C++ Standards committee meeting in France is now available. This includes a new Working Draft for the C++0x standard, and a few concurrency-related papers.
From a concurrency point of view, there are several papers of interest. Firstly, a few have been accepted into the working draft, notably:
- N2661: A Foundation to Sleep On

  This paper provides a generalised time point and duration library, which is used by the thread functions that take times or durations. These have been updated to use the new types and renamed to make their purpose clearer: functions that wait for a duration are now called `xxx_for` and take a value of type `std::chrono::duration<Rep,Period>`, whereas those that take absolute time points are now called `xxx_until` and take a value of type `std::chrono::time_point<Clock,Duration>`.

- N2668: Concurrency Modifications to Basic String

  Update: This paper has not actually been approved; I was mistaken. Though the majority of the committee were in favour, there was not consensus, so this paper will be discussed at a future meeting. Thanks to Herb Sutter for picking me up on this.

  The changes in this paper ensure that it is safe for two threads to access the same `std::string` object at the same time, provided they both perform only read operations. They also ensure that copying a string object and then modifying that copy is safe, even if another thread is accessing the original. This essentially disallows copy-on-write implementations, since the benefits are now severely limited.

- N2660: Dynamic Initialization and Destruction with Concurrency

  With the changes from this paper, if an application uses multiple threads then the initialization and destruction of objects with static storage duration (such as global variables) may run concurrently on separate threads. This can provide faster start-up and shut-down times for an application, but it can also introduce the possibility of race conditions where none existed previously. If you use threads in your application, it is now even more important to check the initialization order of objects with static storage duration.

- N2514: Implicit Conversion Operators for Atomics

  With this change, the atomic types such as `std::atomic_int` are implicitly convertible to their corresponding fundamental types. This means, for example, that `std::atomic_int x; int y=x;` is well-formed where it wasn't previously. The implicit conversions are equivalent to calling the `load()` member function, and have `memory_order_seq_cst` ordering semantics.

- N2674: Shared_ptr atomic access, revision 1

  This paper introduces a new set of overloads of the free functions for atomic operations (such as `atomic_load` and `atomic_store`) which operate on instances of `std::shared_ptr<>`. This allows one thread to read an instance of `std::shared_ptr` whilst another thread is modifying that same instance, if they both use the new atomic functions.

  This paper also renames the `atomic_swap` operations to `atomic_exchange` (and likewise for `atomic_compare_swap` and the corresponding member functions) for all atomic types, in order to avoid confusion with other types that provide `swap` functions. The atomic exchange operations only alter the value of a single object, replacing the old value with a new one; they do not exchange the values of two objects in the way that `std::swap` does.

- N2664: C++ Data-Dependency Ordering: Atomics and Memory Model

  With the adoption of this paper the memory model gets a new ordering option: `memory_order_consume`. This is a limited form of `memory_order_acquire` which allows for data-dependent ordering. If a thread uses `memory_order_consume`, then it is not guaranteed to see modifications to other variables made by the thread that performed the releasing operation unless those variables are accessed in conjunction with the consumed variable. This means, for example, that member variables of an object are visible if the consumed value is a pointer to that object, but that values of independent objects are not necessarily visible. This allows the compiler to perform some optimizations that are forbidden by `memory_order_acquire`, and reduces the synchronization overhead on some hardware architectures.

- N2678: Error Handling Specification for Chapter 30 (Threads)

  This paper brings the exceptions thrown by the thread library under the new `system_error` umbrella, with corresponding error codes and error categories.

- N2669: Thread-Safety in the Standard Library (Rev 2)

  Now that the standard supports threads, we need to say which standard library operations are thread-safe and which are not. This paper basically says that non-modifying operations on the same object are safe, and any operations on separate objects are also safe. Also, separate threads may call the same library functions on separate objects without problems. As you might expect, concurrent modifications to the same object are data races and undefined behaviour.
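The `shared_ptr` atomic access from N2674 can be sketched with the free-function overloads that were eventually standardised in `<memory>`: one thread can replace the pointer while another safely reads it, as long as both go through the atomic functions.

```cpp
#include <memory>

std::shared_ptr<int> shared_value;

void writer(int v)
{
    // Atomically replace the shared_ptr, safe against concurrent readers.
    std::atomic_store(&shared_value, std::make_shared<int>(v));
}

int reader()
{
    // Atomically take a local copy; the local copy keeps the object alive.
    std::shared_ptr<int> local = std::atomic_load(&shared_value);
    return local ? *local : -1;
}
```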
The committee also voted to include N2659: Thread-Local Storage in C++0x, but it doesn't appear to be in the current draft. This paper introduces the `thread_local` keyword to indicate that each thread should have its own copy of a given object.
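A small sketch of what `thread_local` means in practice (using a C++11-style lambda for brevity): each thread gets its own copy of the variable, so changes in one thread are invisible to another.

```cpp
#include <thread>

thread_local int counter = 0; // one copy per thread

int bump_in_new_thread()
{
    counter = 100;            // this thread's copy
    int seen_by_other = -1;
    std::thread t([&seen_by_other] {
        ++counter;            // the new thread's own copy, starting at 0
        seen_by_other = counter;
    });
    t.join();
    return seen_by_other;     // 1: the new thread had a fresh copy
}
```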
Finally, N2657: Local and Unnamed Types as Template Arguments has been incorporated in the working paper. Though this isn't directly concurrency related, it is something I've been campaigning for since N1427 back in 2003.
Apart from N2657, I've only listed the concurrency changes: check out the Working Draft for the C++0x standard, and the State of C++ Evolution for more details on the changes.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: concurrency, c++
Condition Variable Spurious Wakes
Friday, 27 June 2008
Condition variables are a useful mechanism for waiting until an event occurs or some "condition" is satisfied. For example, in my implementation of a thread-safe queue I use a condition variable to avoid busy-waiting in `wait_and_pop()` when the queue is empty. However, condition variables have one "feature" which is a common source of bugs: a wait on a condition variable may return even if the condition variable has not been notified. This is called a spurious wake.
Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.
Bugs due to spurious wakes
Consider the code for `wait_and_pop` from my thread-safe queue:

```cpp
void wait_and_pop(Data& popped_value)
{
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.empty())
    {
        the_condition_variable.wait(lock);
    }
    popped_value=the_queue.front();
    the_queue.pop();
}
```
If we know that there's only one consumer thread, it would be tempting to write this with an `if` instead of a `while`, on the assumption that there's only one thread waiting, so if it's been notified, the queue must not be empty:

```cpp
if(the_queue.empty()) // Danger, Will Robinson
{
    the_condition_variable.wait(lock);
}
```
With the potential of spurious wakes this is not safe: the `wait` might finish even if the condition variable was not notified. We therefore need the `while`, which has the added benefit of allowing multiple consumer threads: we don't need to worry that another thread might remove the last item from the queue, since we're checking to see if the queue is empty before proceeding.
That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from `timed_wait()`.
Timing is everything
`condition_variable::wait()` has a companion function that allows the user to specify a time limit on how long they're willing to wait: `condition_variable::timed_wait()`. This function comes as a pair of overloads: one that takes an absolute time, and one that takes a duration. The absolute time overload will return once the clock reaches the specified time, whether or not it was notified. The duration overload will return once the specified duration has elapsed: if you say to wait for 3 seconds, it will stop waiting after 3 seconds. The insidious bug comes from the overload that takes a duration.
Suppose we wanted to add a `timed_wait_and_pop()` function to our queue, that allowed the user to specify a duration to wait. We might be tempted to write it as:

```cpp
template<typename Duration>
bool timed_wait_and_pop(Data& popped_value,
                        Duration const& timeout)
{
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.empty())
    {
        if(!the_condition_variable.timed_wait(lock,timeout))
            return false;
    }
    popped_value=the_queue.front();
    the_queue.pop();
    return true;
}
```
At first glance this looks fine: we're handling spurious wakes by looping on the `timed_wait()` call, and we're passing the `timeout` in to that call. Unfortunately, the `timeout` is a duration, so every call to `timed_wait()` will wait up to the specified amount of time. If the timeout was 1 second, and the `timed_wait()` call woke due to a spurious wake after 0.9 seconds, the next time round the loop would wait for a further 1 second. In theory this could continue ad infinitum, completely defeating the purpose of using `timed_wait()` in the first place.
The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:
```cpp
template<typename Duration>
bool timed_wait_and_pop(Data& popped_value,
                        Duration const& wait_duration)
{
    boost::system_time const timeout=boost::get_system_time()+wait_duration;
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.empty())
    {
        if(!the_condition_variable.timed_wait(lock,timeout))
            return false;
    }
    popped_value=the_queue.front();
    the_queue.pop();
    return true;
}
```
Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.
Passing the predicate to the condition variable
Both `wait()` and `timed_wait()` come with additional overloads that allow the user to specify the condition being waited for as a predicate. These overloads encapsulate the `while` loops from the examples above, and ensure that spurious wakes are correctly handled. All that is required is that the condition being waited for can be checked by means of a simple function call or a function object, which is passed as an additional parameter to the `wait()` or `timed_wait()` call. `wait_and_pop()` can therefore be written like this:

```cpp
struct queue_not_empty
{
    std::queue<Data>& queue;

    queue_not_empty(std::queue<Data>& queue_):
        queue(queue_)
    {}
    bool operator()() const
    {
        return !queue.empty();
    }
};

void wait_and_pop(Data& popped_value)
{
    boost::mutex::scoped_lock lock(the_mutex);
    the_condition_variable.wait(lock,queue_not_empty(the_queue));
    popped_value=the_queue.front();
    the_queue.pop();
}
```
and `timed_wait_and_pop()` can be written like this:

```cpp
template<typename Duration>
bool timed_wait_and_pop(Data& popped_value,
                        Duration const& wait_duration)
{
    boost::mutex::scoped_lock lock(the_mutex);
    if(!the_condition_variable.timed_wait(lock,wait_duration,
                                          queue_not_empty(the_queue)))
        return false;
    popped_value=the_queue.front();
    the_queue.pop();
    return true;
}
```
Note that what we're waiting for is the queue not to be empty — the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.
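To anticipate what that lambda version might look like, here is a sketch written against the standard (C++11) `condition_variable`; the Boost interface has the same shape, and the lambda replaces the hand-written `queue_not_empty` function object:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex the_mutex;
std::condition_variable the_condition_variable;
std::queue<int> the_queue;

void wait_and_pop(int& popped_value)
{
    std::unique_lock<std::mutex> lock(the_mutex);
    // The lambda is the predicate: wait until the queue is NOT empty.
    the_condition_variable.wait(lock, [] { return !the_queue.empty(); });
    popped_value = the_queue.front();
    the_queue.pop();
}

void push(int value)
{
    std::lock_guard<std::mutex> lock(the_mutex);
    the_queue.push(value);
    the_condition_variable.notify_one();
}
```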
Conclusion
Spurious wakes can cause some unfortunate bugs, which are hard to track down due to the unpredictability of spurious wakes. These problems can be avoided by ensuring that plain `wait()` calls are made in a loop, and the timeout is correctly calculated for `timed_wait()` calls. If the predicate can be packaged as a function or function object, using the predicated overloads of `wait()` and `timed_wait()` avoids all the problems.
Posted by Anthony Williams
[/ threading /] permanent link
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy