C++ – Lock Free Queue — Single Producer, Multiple Consumers

atomicclock-freequeue

I am looking for a method to implement lock-free queue data structure that supports single producer, and multiple consumers. I have looked at the classic method by Maged Michael and Michael Scott (1996) but their version uses linked lists. I would like an implementation that makes use of bounded circular buffer. Something that uses atomic variables?

On a side note, I am not sure why these classic methods are designed for linked lists that require a lot of dynamic memory management. In a multi-threaded program, all memory management routines are serialized. Aren't we defeating the benefits of lock-free methods by using them in conjunction with dynamic data structures?

I am trying to code this in C/C++ using pthread library on a Intel 64-bit architecture.

Thank you,
Shirish

Best Answer

The use of a circular buffer makes a lock necessary, since blocking is needed to prevent the head from going past the tail. But otherwise the head and tail pointers can easily be updated atomically. Or in some cases the buffer can be so large that overwriting is not an issue. (in real life you will see this in automated trading systems, with circular buffers sized to hold X minutes of market data. If you are X minutes behind, you have wayyyy worse problems than overwriting your buffer).

When I implemented the MS queue in C++, I built a lock free allocator using a stack, which is very easy to implement. If I have MSQueue then at compile time I know sizeof(MSQueue::node). Then I make a stack of N buffers of the required size. The N can grow, i.e. if pop() returns null, it is easy to go ask the heap for more blocks, and these are pushed onto the stack. Outside of the possibly blocking call for more memory, this is a lock free operation.

Note that the T cannot have a non-trivial dtor. I worked on a version that did allow for non-trivial dtors, that actually worked. But I found that it was easier just to make the T a pointer to the T that I wanted, where the producer released ownership, and the consumer acquired ownership. This of course requires that the T itself is allocated using lockfree methods, but the same allocator I made with the stack works here as well.

In any case the point of lock-free programming is not that the data structures themselves are slower. The points are this:

  1. lock free makes me independent of the scheduler. Lock-based programming depends on the scheduler to make sure that the holders of a lock are running so that they can release the lock. This is what causes "priority inversion" On Linux there are some lock attributes to make sure this happens
  2. If I am independent of the scheduler, the OS has a far easier time managing timeslices, and I get far less context switching
  3. it is easier to write correct multithreaded programs using lockfree methods since I dont have to worry about deadlock , livelock, scheduling, syncronization, etc This is espcially true with shared memory implementations, where a process could die while holding a lock in shared memory, and there is no way to release the lock
  4. lock free methods are far easier to scale. In fact, I have implemented lock free methods using messaging over a network. Distributed locks like this are a nightmare

That said, there are many cases where lock-based methods are preferable and/or required

  1. when updating things that are expensive or impossible to copy. Most lock free methods use some sort of versioning, i.e. make a copy of the object, update it, and check if the shared version is still the same as when you copied it, then make the current version you update version. Els ecopy it again, apply the update, and check again. Keep doing this until it works. This is fine when the objects are small, but it they are large, or contain file handles, etc then not recommended
  2. Most types are impossible to access in a lock free way, e.g. any STL container. These have invariants that require non atomic access , for example assert(vector.size()==vector.end()-vector.begin()). So if you are updating/reading a vector that is shared, you have to lock it.
Related Topic