Just Software Solutions

Blog Archive

The Intel x86 Memory Ordering Guarantees and the C++ Memory Model

Tuesday, 26 August 2008

The July 2008 version of the Intel 64 and IA-32 Architecture documents includes the information from the memory ordering white paper I mentioned before. This makes it clear that on x86/x64 systems the preferred implementation of the C++0x atomic operations is as follows (which has been confirmed in discussions with Intel engineers):

Memory OrderingStoreLoad
std::memory_order_relaxedMOV [mem],regMOV reg,[mem]
std::memory_order_acquiren/aMOV reg,[mem]
std::memory_order_releaseMOV [mem],regn/a
std::memory_order_seq_cstXCHG [mem],regMOV reg,[mem]

As you can see, plain MOV is enough for even sequentially-consistent loads if a LOCKed instruction such as XCHG is used for the sequentially-consistent stores.

One thing to watch out for is the Non-Temporal SSE instructions (MOVNTI, MOVNTQ, etc.), which by their very nature (i.e. non-temporal) don't follow the normal cache-coherency rules. Therefore non-temporal stores must be followed by an SFENCE instruction in order for their results to be seen by other processors in a timely fashion.

Additionally, if you're writing drivers which deal with memory pages marked WC (Write-Combining) then additional fence instructions will be required to ensure visibility between processors. However, if you're programming with WC pages then this shouldn't be a problem.

Posted by Anthony Williams
[/ threading /] permanent link
Tags: , , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

"Simpler Multithreading in C++0x" Article Online

Thursday, 21 August 2008

My latest article, Simpler Multithreading in C++0x is now available as part of DevX.com's Special Report on C++0x.

The article provides a whistle-stop tour of the new C++0x multithreading support.

Posted by Anthony Williams
[/ news /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Boost 1.36.0 has been Released!

Tuesday, 19 August 2008

Verson 1.36.0 of the Boost libraries was released last week. Crucially, this contains the fix for the critical bug in the win32 implementation of condition variables found in the 1.35.0 release.

There are a few other changes to the Boost.Thread library: there are now functions for acquiring multiple locks without deadlock, for example.

There are of course new libraries to try, and other libraries have been updated too. See the full Release Notes for details, or just Download the release and give it a try.

Posted by Anthony Williams
[/ news /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

C++ Concurrency in Action Early Access Edition Available

Wednesday, 13 August 2008

As those of you who attended my talk on The Future of Concurrency in C++ at ACCU 2008 (or read the slides) will know, I'm writing a book on concurrency in C++: C++ Concurrency in Action: Practical Multithreading, due to be published next year.

Those nice folks over at Manning have made it available through their Early Access Program so you can start reading without having to wait for the book to be finished. By purchasing the Early Access Edition, you will get access to each chapter as it becomes available as well as your choice of a hard copy or Ebook when the book is finished. Plus, if you have any comments on the unfinished manuscript I may be able to take them into account as I revise each chapter. Currently, early drafts of chapters 1, 3, 4 and 5 are available.

I will be covering all aspects of multithreaded programming with the new C++0x standard, from the details of the new C++0x memory model and atomic operations to managing threads and designing parallel algorithms and thread-safe containers. The book will also feature a complete reference to the C++0x Standard Thread Library.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

July 2008 C++ Standards Committee Mailing Published

Wednesday, 30 July 2008

The July 2008 mailing for the C++ Standards Committee was published today. Primarily this is just an update on the "state of evolution" papers, and the issue lists. However, there are a couple of new and revised papers. Amongst them is my revised paper on packaged_task: N2709: Packaging Tasks for Asynchronous Execution.

As I mentioned when the most recent C++0x draft was published, futures are still under discussion, and the LWG requested that packaged_task be moved to a separate paper, with a few minor changes. N2709 is this separate paper. Hopefully the LWG will approve this paper at the September meeting of the C++ committee in San Francisco; if they don't, then packaged_task will join the long list of proposals that have missed the boat for C++0x.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

How Search Engines See Keywords

Friday, 25 July 2008

Jennifer Laycock's recent post on How Search Engines See Keywords over at Search Engine Guide really surprised me. It harks back to the 1990s, with talk of keyword density, and doesn't match my understanding of modern search engines at all. It especially surprised me given the author: I felt that Jennifer was pretty savvy about these things. Maybe I'm just missing something really crucial.

Anyway, my understanding is that the search engines index each and every word on your page, and store a count of each word and phrase. If you say "rubber balls" three times, it doesn't matter if you also say "red marbles" three times: the engines don't assign "keywords" to a page, they find pages that match what the user types. This is why if I include a random phrase on a web page exactly once, and then search for that phrase then my page will likely show up in the results (assuming my phrase was sufficiently uncommon), even though other phrases might appear more often on the same page.

Once the engine has found the pages that contain the phrase that users have searched for (whether in content, or in links to that page), the search engine then ranks those pages to decide what to show. The ranking will use things like the number of times the phrase appears on the page, whether it appears in the title, in headings, links, <strong> tags or just in plain text, how many other pages link to that page with that phrase, and all the usual stuff.

Here, let's put it to the test. At the time of writing, a search on Google for "wibble flibble splodge bucket" with quotes returns no results, and a search without quotes returns just three entries. Given Google's crawl rate for my website, I expect this blog entry will turn up in the search results for that phrase within a few days, even though it only appears the once and other phrases such as "search engines" appear far more often. Of course, I may be wrong, but only time will tell.

Posted by Anthony Williams
[/ webdesign /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Review of the Windows Installer XML (WiX) Toolset

Tuesday, 15 July 2008

A couple of weeks ago, one of the products I maintain for a client needed a new installer. This application is the server part of a client-server suite, and runs as a Service on Microsoft Windows. The old installer was built using the simple installer-builder that is part of Microsoft Visual Studio, and could not easily be extended to handle the desired functionality. Enter the WiX toolset.

As its name suggests, the Windows Installer XML Toolset is a set of tools for building Windows Installer (MSI) packages, using XML to describe the installer. At first glance, this doesn't look much easier than populating the MSI tables directly using Microsoft's Orca tool, but it's actually really straightforward, especially with the help of the WiX tutorial.

It's all about the <XML>

The perceived complexity comes from several factors. First up, every individual file, directory and registry entry that is to be installed must be specified as a <Component> and given a GUID which can be used to uniquely identify that item. The details of what to install where, combined with the inherent verbosity of XML makes the installer build file quite large. Thankfully, being XML, whitespace such as indenting and blank lines can be used to separate things into logical groups, and XML comments can be used if necessary. The installer GUI is also built using XML, which can make things very complicated. Thankfully WiX comes with a set of pre-designed GUIs which can be referenced from your installer build file — you can even provide your own images in order to brand the installer with your company logo, for example.

Once you've got over the initial shock of the XML syntax, the toolkit is actually really easy to use. The file and directory structure to be used by the installed application is described using nested XML elements to mimic the directory structure, with special predefined directory identifiers for things like the Windows directory, the Program Files directory or the user's My Documents folder. You can also use nested <Feature> tags to create a tree of features which the user can choose to install (or not) if they opt for a custom install. Each "feature" is a group of one or more components which are identified with nested tags in the feature tags.

Custom Actions

What if you're not just installing a simple desktop application? Windows Installer provides support for custom actions which can be run as part of the installation, and WiX allows you to specify these. The old installer for the server application used custom actions to install the service, but these weren't actually necessary — the Windows Installer can automatically configure services, it was just that the Visual Studio Installer builder didn't support that. With WiX it's just a matter of adding a <ServiceInstall> tag to one of your components.

The custom actions we did require were easy to add with the <CustomAction> tags — you can write custom actions as DLLs or EXEs, or even as simple VBScript included directly in the installer build file. These can then be added to the <InstallExecuteSequence> at the appropriate point, taking care to specify the right conditions (such as whether to run on install or uninstall, and whether or not to run depending on which features are being installed).

Conclusion

The WiX toolset is very powerful, and gives you full control over everything the Windows Installer can do. Though the XML syntax is a little cumbersome, it is actually quite simple to use. Once you get used to the syntax, it is easy to see what the installer is doing, and what you need to do to get the desired effect. Though designing installer GUIs is quite hard, the supplied options will be sufficient in many cases and it is easy to customize them to your needs.

It's a free tool, so you can't beat it on price, and the tutorial really reduces the learning curve. Next time you need to build an installer, I recommend you give WiX a try.

Posted by Anthony Williams
[/ reviews /] permanent link
Tags: , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

C++ Concurrency Papers Still Under Discussion

Monday, 07 July 2008

Last week I wrote about the new working draft of the C++0x standard and the newly-accepted concurrency papers, but I didn't mention the papers still under discussion. There's a few of those, listed below. The committee intends to publish a formal Committee Draft of the C++0x Standard at the end of the September meeting, so anything not voted into the WP at that meeting has missed the boat (though of course defects will still be fixed).

N2671: An Asynchronous Future Value: Proposed Wording
For those of you who've been following my postings about asynchronous future values for C++, this is the latest proposal on futures. Though it was discussed at the June meeting, the LWG didn't feel it was ready to be voted in to the working draft yet. At the request of the LWG, packaged_task has been removed; I should have a separate proposal for that ready before the next meeting.
N2668: Concurrency Modifications to Basic String
Yes, I listed this as approved last week, but I misread the minutes of the votes: it is still under discussion. The changes in this paper ensure that it is safe for two threads to access the same std::string object at the same time, provided they both perform only read operations. They also ensure that copying a string object and then modifying that copy is safe, even if another thread is accessing the original. This essentially disallows copy-on-write implementations since the benefits are now severely limited.
N2633: Improved support for bidirectional fences
This paper aims to simplify and improve the support for fences (also known as memory barriers) when writing code using the new atomic types. As the paper points out, the current atomic_variable.fence(memory_ordering) semantics can mean that compilers have to issue stronger-than-necessary fences in many cases. By making the fences free functions that are not tied to an individual variable, they will map much better to the underlying instructions, and should lead to clearer (and more optimal) code.
N2643: C++ Data-Dependency Ordering: Function Annotation
This paper is a counterpart to N2664: C++ Data-Dependency Ordering: Atomics and Memory Model, and extends the list of cases where dependencies are carried by allowing you to annotate functions. The new style [[annotation syntax]] is used to indicate which parameters carry a dependency into a function, and whether or not the return type carries a dependency to the call site.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

New C++ Working Draft and Concurrency Papers Now Available

Friday, 04 July 2008

The post-meeting mailing following June's C++ Standards committee meeting in France is now available. This includes a new Working Draft for the C++0x standard, and a few concurrency-related papers.

From a concurrency point of view, there are several papers of interest. Firstly, a few have been accepted into the working draft, notably:

N2661: A Foundation to Sleep On
This paper provides a generalised time point and duration library, which is used by the thread functions that take times or durations. These have been updated to use these new types and renamed to make their purpose clearer: functions that wait for a duration are now called xxx_for, and take a value of type std::chrono::duration<Rep,Period>, whereas those that take absolute time points are now called xxx_until and take a value of type std::chrono::time_point<Clock,Duration>.
N2668: Concurrency Modifications to Basic String
Update: This paper has not actually been approved, I was mistaken. Though the majority of the committee were in favour, there was not consensus, so this paper will be discussed at a future meeting. Thanks to Herb Sutter for picking me up on this.
The changes in this paper ensure that it is safe for two threads to access the same std::string object at the same time, provided they both perform only read operations. They also ensure that copying a string object and then modifying that copy is safe, even if another thread is accessing the original. This essentially disallows copy-on-write implementations since the benefits are now severely limited.
N2660: Dynamic Initialization and Destruction with Concurrency
With the changes from this paper, if an application uses multiple threads then the initialization and destruction of objects with static storage duration (such as global variables) may run concurrently on separate threads. This can provide faster start-up and shut-down times for an application, but it can also introduce the possibility of race conditions where none existed previously. If you use threads in your application, it is now even more important to check the initialization order of objects with static storage duration.
N2514: Implicit Conversion Operators for Atomics
With this change, the atomic types such as std::atomic_int are implicitly convertible to their corresponding fundamental types. This means, for example, that:
std::atomic_int x;
int y=x;
is well-formed where it wasn't previously. The implicit conversions are equivalent to calling the load() member function, and have memory_order_seq_cst ordering semantics.
N2674: Shared_ptr atomic access, revision 1
This paper introduces a new set of overloads of the free functions for atomic operations (such as atomic_load and atomic_store), which operate on instances of std::shared_ptr<>. This allows one thread to read an instance of std::shared_ptr whilst another thread is modifying that same instance if they both use the new atomic functions.
This paper also renames atomic_swap operations to atomic_exchange (and likewise for atomic_compare_swap and the corresponding member functions) for all atomic types, in order to avoid confusion with other types that provide swap functions. The atomic exchange operations only alter the value of a single object, replacing the old value with a new one, they do not exchange the values of two objects in the way that std::swap does.
N2664: C++ Data-Dependency Ordering: Atomics and Memory Model
With the adoption of this paper the memory model gets a new ordering option: memory_order_consume. This is a limited form of memory_order_acquire which allows for data-dependent ordering. If a thread uses memory_order_consume, then it is not guaranteed to see modifications to other variables made by the thread that performed the releasing operation unless those variables are accessed in conjunction with the consumed variable. This means, for example, that member variables of an object are visible if the consumed value is a pointer to that object, but that values of independent objects are not necessarily visible. This allows the compiler to perform some optimizations that are forbidden by memory_order_acquire, and reduces the synchronization overhead on some hardware architectures.
N2678: Error Handling Specification for Chapter 30 (Threads)
This paper brings the exceptions thrown by the thread under the new system_error umbrella, with corresponding error codes and error categories.
N2669: Thread-Safety in the Standard Library (Rev 2)
Now the standard supports threads, we need to say which standard library operations are thread-safe, and which are not. This paper basically says that non-modifying operations on the same object are safe, and any operations on separate objects are also safe. Also, separate threads may call the same library functions on separate objects without problems. As you might expect, concurrent modifications to the same object are data races and undefined behaviour.

The committee also voted to include N2659: Thread-Local Storage in C++0x, but it doesn't appear to be in the current draft. This paper introduces the thread_local keyword to indicate that each thread should have its own copy of a given object.

Finally, N2657: Local and Unnamed Types as Template Arguments has been incorporated in the working paper. Though this isn't directly concurrency related, it is something I've been campaigning for since N1427 back in 2003.

Apart from N2657, I've only listed the concurrency changes: check out the Working Draft for the C++0x standard, and the State of C++ Evolution for more details on the changes.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

Condition Variable Spurious Wakes

Friday, 27 June 2008

Condition variables are a useful mechanism for waiting until an event occurs or some "condition" is satisfied. For example, in my implementation of a thread-safe queue I use a condition variable to avoid busy-waiting in wait_and_pop() when the queue is empty. However, condition variables have one "feature" which is a common source of bugs: a wait on a condition variable may return even if the condition variable has not been notified. This is called a spurious wake.

Spurious wakes cannot be predicted: they are essentially random from the user's point of view. However, they commonly occur when the thread library cannot reliably ensure that a waiting thread will not miss a notification. Since a missed notification would render the condition variable useless, the thread library wakes the thread from its wait rather than take the risk.

Bugs due to spurious wakes

Consider the code for wait_and_pop from my thread-safe queue:

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            the_condition_variable.wait(lock);
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
    }

If we know that there's only one consumer thread, it would be tempting to write this with an if instead of a while, on the assumption that there's only one thread waiting, so if it's been notified, the queue must not be empty:

    if(the_queue.empty()) // Danger, Will Robinson
    {
        the_condition_variable.wait(lock);
    }

With the potential of spurious wakes this is not safe: the wait might finish even if the condition variable was not notified. We therefore need the while, which has the added benefit of allowing multiple consumer threads: we don't need to worry that another thread might remove the last item from the queue, since we're checking to see if the queue is empty before proceeding.

That's the beginner's bug, and one that's easily overcome with a simple rule: always check your predicate in a loop when waiting with a condition variable. The more insidious bug comes from timed_wait().

Timing is everything

condition_variable::wait() has a companion function that allows the user to specify a time limit on how long they're willing to wait: condition_variable::timed_wait(). This function comes as a pair of overloads: one that takes an absolute time, and one that takes a duration. The absolute time overload will return once the clock reaches the specified time, whether or not it was notified. The duration overload will return once the specified duration has elapsed: if you say to wait for 3 seconds, it will stop waiting after 3 seconds. The insidious bug comes from the overload that takes a duration.

Suppose we wanted to add a timed_wait_and_pop() function to our queue, that allowed the user to specify a duration to wait. We might be tempted to write it as:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& timeout)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

At first glance this looks fine: we're handling spurious wakes by looping on the timed_wait() call, and we're passing the timeout in to that call. Unfortunately, the timeout is a duration, so every call to timed_wait() will wait up to the specified amount of time. If the timeout was 1 second, and the timed_wait() call woke due to a spurious wake after 0.9 seconds, the next time round the loop would wait for a further 1 second. In theory this could continue ad infinitum, completely defeating the purpose of using timed_wait() in the first place.

The solution is simple: use the absolute time overload instead. By specifying a particular clock time as the timeout, the remaining wait time decreases with each call. This requires that we determine the final timeout prior to the loop:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::system_time const timeout=boost::get_system_time()+wait_duration;

        boost::mutex::scoped_lock lock(the_mutex);
        while(the_queue.empty())
        {
            if(!the_condition_variable.timed_wait(lock,timeout))
                return false;
        }
        
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Though this solves the problem, it's easy to make the mistake. Thankfully, there is a better way to wait that doesn't suffer from this problem: pass the predicate to the condition variable.

Passing the predicate to the condition variable

Both wait() and timed_wait() come with additional overloads that allow the user to specify the condition being waited for as a predicate. These overloads encapsulate the while loops from the examples above, and ensure that spurious wakes are correctly handled. All that is required is that the condition being waited for can be checked by means of a simple function call or a function object which is passed as an additional parameter to the wait() or timed_wait() call.

wait_and_pop() can therefore be written like this:


    struct queue_not_empty
    {
        std::queue<Data>& queue;

        queue_not_empty(std::queue<Data>& queue_):
            queue(queue_)
        {}
        bool operator()() const
        {
            return !queue.empty();
        }
    };

    void wait_and_pop(Data& popped_value)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_condition_variable.wait(lock,queue_not_empty(the_queue));
        popped_value=the_queue.front();
        the_queue.pop();
    }

and timed_wait_and_pop() can be written like this:

    template<typename Duration>
    bool timed_wait_and_pop(Data& popped_value,
                            Duration const& wait_duration)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(!the_condition_variable.timed_wait(lock,wait_duration,
            queue_not_empty(the_queue)))
            return false;
        popped_value=the_queue.front();
        the_queue.pop();
        return true;
    }

Note that what we're waiting for is the queue not to be empty — the predicate is the reverse of the condition we would put in the while loop. This will be much easier to specify when compilers implement the C++0x lambda facilities.

Conclusion

Spurious wakes can cause some unfortunate bugs, which are hard to track down due to the unpredictability of spurious wakes. These problems can be avoided by ensuring that plain wait() calls are made in a loop, and the timeout is correctly calculated for timed_wait() calls. If the predicate can be packaged as a function or function object, using the predicated overloads of wait() and timed_wait() avoids all the problems.

Posted by Anthony Williams
[/ threading /] permanent link
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

If you liked this post, why not subscribe to the RSS feed RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.

More recent entries Older entries

Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy