We have found that we have several spots in our code where concurrent reads of data protected by a mutex are rather common, while writes are rare. Our measurements seem to say that using a simple mutex seriously hinders the performance of the code reading that data. So what we would need is a multiple-read/single-write mutex. I know that this can be built atop of simpler primitives, but before I try myself at this, I'd rather ask for existing knowledge:
What is an approved way to build a multiple-read/single-write lock out of simpler synchronization primitives?
I do have an idea how to make it, but I'd rather have answers unbiased by what I (probably wrongly) came up with. (Note: What I expect is an explanation how to do it, probably in pseudo code, not a full-fledged implementation. I can certainly write the code myself.)
Caveats:
-
This needs to have reasonable performance. (What I have in mind would require two lock/unlock operations per access. Now that might not be good enough, but needing many of them instead seems unreasonable.)
-
Commonly, reads are more numerous, but writes are more important and performance-sensitive than reads. Readers must not starve writers.
-
We are stuck on a rather old embedded platform (proprietary variant of VxWorks 5.5), with a rather old compiler (GCC 4.1.2), and boost 1.52 – except for most of boost's parts relying on POSIX, because POSIX isn't fully implemented on that platform. The locking primitives available basically are several kind of semaphores (binary, counting etc.), on top of which we have already created mutexes, conditions variables, and monitors.
-
This is IA32, single-core.
Best Answer
At first glance I thought I recognized this answer as the same algorithm that Alexander Terekhov introduced. But after studying it I believe that it is flawed. It is possible for two writers to simultaneously wait on
m_exclusive_cond
. When one of those writers wakes and obtains the exclusive lock, it will setexclusive_waiting_blocked = false
onunlock
, thus setting the mutex into an inconsistent state. After that, the mutex is likely hosed.N2406, which first proposed
std::shared_mutex
contains a partial implementation, which is repeated below with updated syntax.The algorithm is derived from an old newsgroup posting of Alexander Terekhov. It starves neither readers nor writers.
There are two "gates",
gate1_
andgate2_
. Readers and writers have to passgate1_
, and can get blocked in trying to do so. Once a reader gets pastgate1_
, it has read-locked the mutex. Readers can get pastgate1_
as long as there are not a maximum number of readers with ownership, and as long as a writer has not gotten pastgate1_
.Only one writer at a time can get past
gate1_
. And a writer can get pastgate1_
even if readers have ownership. But once pastgate1_
, a writer still does not have ownership. It must first get pastgate2_
. A writer can not get pastgate2_
until all readers with ownership have relinquished it. Recall that new readers can't get pastgate1_
while a writer is waiting atgate2_
. And neither can a new writer get pastgate1_
while a writer is waiting atgate2_
.The characteristic that both readers and writers are blocked at
gate1_
with (nearly) identical requirements imposed to get past it, is what makes this algorithm fair to both readers and writers, starving neither.The mutex "state" is intentionally kept in a single word so as to suggest that the partial use of atomics (as an optimization) for certain state changes is a possibility (i.e. for an uncontended "fast path"). However that optimization is not demonstrated here. One example would be if a writer thread could atomically change
state_
from 0 towrite_entered
then he obtains the lock without having to block or even lock/unlockmut_
. Andunlock()
could be implemented with an atomic store. Etc. These optimizations are not shown herein because they are much harder to implement correctly than this simple description makes it sound.