The Intel x86 Memory Ordering Guarantees and the C++ Memory Model
Tuesday, 26 August 2008
The July 2008 version of the Intel 64 and IA-32 Architecture documents includes the information from the memory ordering white paper I mentioned before. This makes it clear that on x86/x64 systems the preferred implementation of the C++0x atomic operations is as follows (which has been confirmed in discussions with Intel engineers):
Memory Ordering | Store | Load |
---|---|---|
std::memory_order_relaxed | MOV [mem],reg | MOV reg,[mem] |
std::memory_order_acquire | n/a | MOV reg,[mem] |
std::memory_order_release | MOV [mem],reg | n/a |
std::memory_order_seq_cst | XCHG [mem],reg | MOV reg,[mem] |
As you can see, plain MOV is enough for even sequentially-consistent loads if a LOCKed instruction such as XCHG is used for the sequentially-consistent stores.
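To make the sequentially-consistent row of the table concrete, here is a minimal sketch of the classic "store buffering" test written with C++ atomics. The variable and function names are hypothetical, chosen for this illustration only. The comments describe the instructions a compiler following the table would emit; the actual code generated is up to the compiler, and some compilers use MOV followed by MFENCE for the store instead of XCHG.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, chosen for this illustration only.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs the store-buffering test once; returns true if both threads read 0.
bool both_read_zero() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&r1] {
        x.store(1, std::memory_order_seq_cst);   // XCHG [mem],reg per the table
        r1 = y.load(std::memory_order_seq_cst);  // MOV reg,[mem] per the table
    });
    std::thread b([&r2] {
        y.store(1, std::memory_order_seq_cst);   // XCHG [mem],reg per the table
        r2 = x.load(std::memory_order_seq_cst);  // MOV reg,[mem] per the table
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the test and counts how often the forbidden outcome is observed.
int count_forbidden_outcomes(int iterations) {
    assert(iterations >= 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        if (both_read_zero()) ++forbidden;
    }
    return forbidden;
}
```

With sequentially-consistent stores and loads, the outcome where both threads read 0 is forbidden by the C++ memory model, so count_forbidden_outcomes should return 0 however many iterations are run. If the stores were weakened to std::memory_order_release (a plain MOV), that outcome would be permitted, since x86 allows a load to be reordered before an earlier store to a different location.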
One thing to watch out for is the Non-Temporal SSE instructions (MOVNTI, MOVNTQ, etc.), which by their very nature (i.e. non-temporal) don't follow the normal cache-coherency rules. Therefore non-temporal stores must be followed by an SFENCE instruction in order for their results to be seen by other processors in a timely fashion.
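As a rough sketch of what this looks like in practice, the following uses the compiler intrinsics _mm_stream_si32 (which corresponds to MOVNTI) and _mm_sfence (SFENCE). The function fill_and_publish, its parameters, and the HAVE_NT_STORES macro are hypothetical names for this illustration, and the code assumes a compiler that provides the SSE2 intrinsic headers; on other targets it falls back to ordinary stores.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>  // _mm_sfence
#include <emmintrin.h>  // _mm_stream_si32
#define HAVE_NT_STORES 1  // hypothetical macro for this sketch
#endif

// Hypothetical helper: fills dst[0..n) with value using non-temporal stores
// where available, then signals other threads through the ready flag.
void fill_and_publish(int* dst, std::size_t n, int value,
                      std::atomic<bool>& ready) {
    assert(dst != nullptr || n == 0);
#ifdef HAVE_NT_STORES
    for (std::size_t i = 0; i < n; ++i) {
        _mm_stream_si32(dst + i, value);  // non-temporal store (MOVNTI)
    }
    _mm_sfence();  // SFENCE: order the non-temporal stores before the flag
#else
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = value;  // ordinary stores on targets without SSE2
    }
#endif
    // A release store is a plain MOV on x86, which on its own would not
    // order the preceding non-temporal stores; hence the SFENCE above.
    ready.store(true, std::memory_order_release);
}
```

A reader thread would wait for ready.load(std::memory_order_acquire) to return true before reading the buffer. Without the SFENCE, such a reader could see the flag set before the non-temporal stores had become visible.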
Additionally, if you're writing drivers which deal with memory pages marked WC (Write-Combining) then additional fence instructions will be required to ensure visibility between processors. However, if you're programming with WC pages then you are presumably already aware of these requirements, so this shouldn't be a problem.
Posted by Anthony Williams
Tags: intel, x86, c++, threading, memory ordering, memory model
3 Comments
Anthony,
I came across an exceptional case, "loads can actually be reordered with other loads if store-to-load forwarding is involved", at the following URL:
http://software.intel.com/en-us/forums/threading-on-intel-parallel-architectures/topic/62973/
Will this affect your memory_order implementation?
Hi James,
Short answer: no.
Reading a value written by your own thread doesn't provide any additional ordering, so in the code from the first post on that forum page, the read of guard0 into dummy is essentially a no-op. If the read from guard0 was tested, and the value was NOT what was written then you would know that another thread had modified the value. In the code, guard0 is not written by another thread so this cannot happen.
Do you happen to have a link to a proof that a plain mov is all that's needed for a sequentially-consistent load? I see from the memory model documents that the xchg'es all happen in a total order, and that movs won't be reordered across xchg, but I'm having trouble getting from there to sequential consistency.