C++ atomic operations for lock-free structures

atomic, c++, lock-free

I'm implementing a lock-free mechanism using atomic (double) compare-and-swap instructions, e.g. cmpxchg16b.

I'm currently writing this in assembly and then linking it in. However, I wondered if there is a way of getting the compiler to do this for me automatically, e.g. surround a code block with 'atomically' and have it figure out how to implement the code as an atomic instruction on the underlying processor architecture (or generate an error at compile time if the underlying architecture does not support it)?

P.S. I know that gcc has some built-ins (at least for CAS)

http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Atomic-Builtins.html#Atomic-Builtins

Best Answer

Already kind of answered here.

The C++0x standard (since published as C++11) provides some atomic datatypes, mainly for integer and pointer types, through the std::atomic<> template. That article mentions Boehm's atomic_ops project, which you can download and use today.

If not, can't you write your assembler inline in the compiler? I know MSVC has the __asm keyword for inline assembler routines, and gcc supports inline assembly too.
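As a rough sketch of what the std::atomic<> route could look like with a C++11 compiler: the static_assert gives the "error at compile time if the architecture does not support it" behavior asked for, and compare_exchange_strong is the CAS. The names pack, tag_of, index_of, try_update and demo_tagged_update are made up for illustration, and packing a 32-bit tag plus a 32-bit index into one 64-bit word only stands in for a true double-width CAS. Whether a 16-byte std::atomic compiles down to cmpxchg16b depends on the compiler, its flags and its runtime library, so treat that part as something to verify on your own toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Fail the build if 64-bit atomics would need a lock on this target.
static_assert(ATOMIC_LLONG_LOCK_FREE == 2,
              "64-bit atomics are not always lock-free on this target");

// Hypothetical layout: version tag in the high 32 bits, index in the low 32 bits.
inline unsigned long long pack(std::uint32_t tag, std::uint32_t index) {
    return (static_cast<unsigned long long>(tag) << 32) | index;
}
inline std::uint32_t tag_of(unsigned long long v) {
    return static_cast<std::uint32_t>(v >> 32);
}
inline std::uint32_t index_of(unsigned long long v) {
    return static_cast<std::uint32_t>(v);
}

// Atomically store new_index and bump the tag, but only if the word still
// equals `expected`. Returns true on success; on failure `expected` is
// overwritten with the value that was actually found.
inline bool try_update(std::atomic<unsigned long long>& word,
                       unsigned long long& expected,
                       std::uint32_t new_index) {
    const unsigned long long desired = pack(tag_of(expected) + 1, new_index);
    return word.compare_exchange_strong(expected, desired);
}

// Example use: one successful update, then one with a stale expectation.
inline void demo_tagged_update() {
    std::atomic<unsigned long long> word(pack(0, 7));
    unsigned long long expected = word.load();
    bool ok = try_update(word, expected, 9);   // tag 0 -> 1, index 7 -> 9
    assert(ok);
    unsigned long long stale = pack(0, 7);
    ok = try_update(word, stale, 11);          // word no longer matches
    assert(!ok && stale == pack(1, 9));
}
```

The tag is what a double-width CAS is usually used for: it lets the CAS notice that the word was changed and changed back in between.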

Related Solutions

iOS – What’s the difference between the atomic and nonatomic attributes

The last two are identical; "atomic" is the default behavior (~~note that it is not actually a keyword; it is specified only by the absence of nonatomic~~ -- atomic was added as a keyword in recent versions of llvm/clang).

Assuming that you are @synthesizing the method implementations, atomic vs. non-atomic changes the generated code. If you are writing your own setter/getters, atomic/nonatomic/retain/assign/copy are merely advisory. (Note: @synthesize is now the default behavior in recent versions of LLVM. There is also no need to declare instance variables; they will be synthesized automatically, too, and will have an _ prepended to their name to prevent accidental direct access).

With "atomic", the synthesized setter/getter will ensure that a whole value is always returned from the getter or set by the setter, regardless of setter activity on any other thread. That is, if thread A is in the middle of the getter while thread B calls the setter, an actual viable value -- an autoreleased object, most likely -- will be returned to the caller in A.

In nonatomic, no such guarantees are made. Thus, nonatomic is considerably faster than "atomic".

What "atomic" does not do is make any guarantees about thread safety. If thread A is calling the getter simultaneously with thread B and C calling the setter with different values, thread A may get any one of the three values returned -- the one prior to any setters being called or either of the values passed into the setters in B and C. Likewise, the object may end up with the value from B or C, no way to tell.

Ensuring data integrity -- one of the primary challenges of multi-threaded programming -- is achieved by other means.

Adding to this:

Atomicity of a single property also cannot guarantee thread safety when multiple dependent properties are in play.

Consider:

 @property(atomic, copy) NSString *firstName;
 @property(atomic, copy) NSString *lastName;
 @property(readonly, atomic, copy) NSString *fullName;

In this case, thread A could be renaming the object by calling setFirstName: and then calling setLastName:. In the meantime, thread B may call fullName in between thread A's two calls and will receive the new first name coupled with the old last name.

To address this, you need a transactional model, i.e. some other kind of synchronization and/or exclusion that blocks access to fullName while the dependent properties are being updated.
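The same pitfall exists outside Objective-C. Here is a minimal C++ sketch of the "transactional" idea, assuming a hypothetical Person class (the class, its members and demo_rename are illustrative, not taken from any framework): a single mutex covers both fields, so a reader can never observe a half-renamed object.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical class: one lock guards both dependent fields.
class Person {
public:
    Person(std::string first, std::string last)
        : first_(std::move(first)), last_(std::move(last)) {}

    // Update both names as one unit.
    void rename(const std::string& first, const std::string& last) {
        std::lock_guard<std::mutex> lock(mutex_);
        first_ = first;
        last_ = last;
    }

    // Reads both names under the same lock, so the result always pairs a
    // first name with the last name that was set together with it.
    std::string fullName() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return first_ + " " + last_;
    }

private:
    mutable std::mutex mutex_;
    std::string first_;
    std::string last_;
};

// Example use (single-threaded here; the lock matters once threads share `p`).
inline void demo_rename() {
    Person p("Ada", "Lovelace");
    p.rename("Grace", "Hopper");
    assert(p.fullName() == "Grace Hopper");
}
```

Note that there are deliberately no separate setFirstName/setLastName methods: offering them would reopen the window between the two calls.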

C++ – atomic swap with CAS (using gcc sync builtins)

The operation might not actually store the new value into the destination because of a race with another thread that changes the value at the same moment you're trying to. The CAS primitive doesn't guarantee that the write occurs - only that the write occurs if the value is already what's expected. The primitive can't know what the correct behavior is if the value isn't what is expected, so nothing happens in that case - you need to fix up the problem by checking the return value to see if the operation worked.
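To make that concrete, here is a small single-threaded sketch of the success and failure cases, assuming gcc or clang (the __sync builtins are compiler-specific) and using the hypothetical helper names cas_int and demo_cas:

```cpp
#include <cassert>

// Returns true if *target was changed from `expected` to `desired`.
// __sync_val_compare_and_swap returns the value it found in *target before
// the operation; the write happened only if that value equals `expected`.
inline bool cas_int(int* target, int expected, int desired) {
    const int seen = __sync_val_compare_and_swap(target, expected, desired);
    return seen == expected;
}

inline void demo_cas() {
    int value = 1;

    bool ok = cas_int(&value, 1, 2);  // value was 1, so it becomes 2
    assert(ok && value == 2);

    ok = cas_int(&value, 1, 3);       // value is 2, not 1: nothing is written
    assert(!ok && value == 2);
}
```

In real code the second case is what a competing thread causes, which is why the return value always has to be checked.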

So, your example:

elem->next = __sync_val_compare_and_swap(&head, head, elem); //always inserts?

won't necessarily insert the new element. If another thread inserts an element at the same moment, there's a race condition that might cause this thread's call to __sync_val_compare_and_swap() to not update head (but neither this thread's nor the other thread's element is lost yet if you handle it correctly).

But, there's another problem with that line of code - even if head did get updated, there's a brief moment of time where head points to the inserted element, but that element's next pointer hasn't been updated to point to the previous head of the list. If another thread swoops in during that moment and tries to walk the list, bad things happen.

To correctly update the list, change that line of code to something like:

whatever_t* prev_head = NULL;
do {
    elem->next = head;  // set up `elem->next` so the list will still be linked
                        // correctly the instant the element is inserted
    prev_head = __sync_val_compare_and_swap(&head, elem->next, elem);
} while (prev_head != elem->next);

Or use the bool variant, which I think is a bit more convenient:

do {
    elem->next = head;  // set up `elem->next` so the list will still be linked
                        // correctly the instant the element is inserted
} while (!__sync_bool_compare_and_swap(&head, elem->next, elem));

It's kind of ugly, and I hope I got it right (it's easy to get tripped up in the details of thread-safe code). It should be wrapped in an insert_element() function (or even better, use an appropriate library).
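For what it's worth, a wrapped version might look roughly like this. It is only a sketch under some assumptions: gcc or clang for the __sync builtin, a hypothetical node_t standing in for whatever_t, and the head passed in by address instead of being a global. demo_insert is likewise just for illustration.

```cpp
#include <cassert>

// Hypothetical node type; the snippets above call it whatever_t.
struct node_t {
    int value;
    node_t* next;
};

// Push `elem` onto the front of the list whose head pointer lives at `head`.
// The loop retries whenever another thread changed the head between our read
// and our CAS.
inline void insert_element(node_t** head, node_t* elem) {
    do {
        // Link first, so the node is already valid the moment it is published.
        elem->next = *head;
    } while (!__sync_bool_compare_and_swap(head, elem->next, elem));
}

// Example use (single-threaded, just to show the resulting order).
inline void demo_insert() {
    node_t* head = nullptr;
    node_t a = {1, nullptr};
    node_t b = {2, nullptr};
    insert_element(&head, &a);
    insert_element(&head, &b);
    assert(head == &b && b.next == &a && a.next == nullptr);
}
```

This only covers insertion at the head; removing elements safely is a separate and harder problem.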

Addressing the ABA problem:

I don't think the ABA problem is relevant to this "add an element to the head of a list" code. Let's say that a thread wants to add object X to the list and when it executes elem->next = head, head has value A1.

Then before the __sync_val_compare_and_swap() is executed, another set of threads comes along and:

removes A1 from the list, making head point to B
does whatever with object A1 and frees it
allocates another object, A2, that happens to be at the same address as A1 was
adds A2 to the list so that head now points to A2

Since A1 and A2 have the same identifier/address, this is an instance of the ABA problem.

However, it doesn't matter in this case since the thread adding object X doesn't care that the head points to a different object than it started out with - all it cares about is that when X is queued:

the list is consistent,
no objects on the list have been lost, and
no objects other than X have been added to the list (by this thread)
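The scenario can be replayed deterministically in a single thread to check that claim. This is only an illustration, assuming gcc or clang and a hypothetical node_t; the "other threads" are simulated by plain assignments, and reusing the storage of `a` stands in for freeing A1 and allocating A2 at the same address.

```cpp
#include <cassert>

// Hypothetical node type for the replay.
struct node_t {
    int value;
    node_t* next;
};

// Replays the interleaving described above and reports whether the final
// list is X -> A2 -> B.
inline bool replay_aba_push() {
    node_t b = {2, nullptr};
    node_t a = {1, &b};       // plays A1 now and A2 later (same address)
    node_t x = {9, nullptr};
    node_t* head = &a;        // list: A1 -> B

    // The pushing thread reads head, links X, and is then "paused".
    x.next = head;            // x.next == &a, i.e. A1

    // Meanwhile, other threads remove A1, reuse its storage as A2,
    // and push A2 back onto the list.
    head = a.next;            // list: B
    a.value = 42;             // same address, different object: A2
    a.next = head;            // A2 -> B
    head = &a;                // list: A2 -> B

    // The pushing thread resumes. The CAS succeeds because the address
    // matches, even though the object behind it has changed.
    const bool swapped = __sync_bool_compare_and_swap(&head, x.next, &x);

    return swapped && head == &x && x.next == &a &&
           a.value == 42 && a.next == &b && b.next == nullptr;
}
```

The CAS is "fooled" by the recycled address, but for a push that is harmless: X ends up in front of whatever is currently at the head.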
