Your code has significant other issues apart from just that. Manually deleting a pointer? Calling a cleanup function? Ouch. Also, as accurately pointed out in the question comments, you don't use RAII for your lock, which is another fairly epic fail and guarantees that when DoSomethingImportant throws an exception, terrible things happen.
The fact that this multithreaded bug is occurring is just a symptom of the core problem: your code has extremely bad semantics in any threading situation, and you're using completely unreliable tools and anti-idioms. If I were you, I'd be amazed that it functions with a single thread, let alone more.
Common Mistake #3 - Even though the objects are reference counted, the shutdown sequence "releases" its pointer but forgets to wait for the thread that is still running to release its instance. As such, components are shut down cleanly, then spurious or late callbacks are invoked on an object in a state not expecting any more calls.
The whole point of reference counting is that the thread has already released its instance. If it hasn't, then the object cannot be destroyed, because the thread still holds a reference.
Use std::shared_ptr. When all threads have released it (and nobody, therefore, can be calling the function, as they have no pointer to it), then the destructor is called. This is guaranteed safe.
Secondly, use a real threading library, like Intel's Threading Building Blocks or Microsoft's Parallel Patterns Library. Writing your own is time-consuming and unreliable, and your code is full of threading details which it doesn't need. Doing your own locks is just as bad as doing your own memory management. These libraries have already implemented many very useful general-purpose threading idioms which work correctly, ready for your use.
I have wrestled with this. I did a few iterations on my design...
First solution: Just have a hash table with a global lock.
Second solution: Wait-free fixed-size hash table. It turns out it is possible to implement a wait-free, thread-safe hash set of fixed size. To do this, we use an array where the index is the hash, we use linear probing, and all operations are interlocked.
So, yes, we still have synchronization. However, a thread will never be spin waiting or waiting on a wait handle. It takes some work to figure out how to guarantee that all the operations are atomic.
Third solution: Lock on growth. The naive approach is to lock the structure, create a new array, copy all the data, then unlock. That works, it is thread safe. We can get away with locking only on growth because as long as the size does not change we can work as in the prior solution.
Except I do not like the downtime: there are threads waiting for the copy to complete. We can do better...
Fourth solution: Cooperative growth. First implement a thread safe state machine so that the structure can change from normal use to growth and back. Next, every thread that wants to do an operation, instead of waiting, will cooperate in copying data to the new array. How? We can have an index that we increment, again with an interlocked operation... each thread increments the index, takes the element, computes where to place it in the new array, writes it there and then loops until the operation is completed.
There is a problem: it does not shrink.
Fifth solution: Sparse array. Instead of a single array, we use a tree of arrays and treat it like a sparse array. Its size will be enough to address every value that our hash function can output. Now, threads can allocate nodes as needed. We keep track of the usage of each node, that is, how many children are occupied and how many threads are currently using it. When usage reaches zero, we can remove it... wait... locking? It turns out we can use an optimistic solution: we keep two references to the node data, and every thread writes to one after each operation. The thread that found zero usage will erase, then read (another thread could have written in the meantime), and then copy to the second reference. All with interlocked operations.
By the way, those atomic operations? Yeah, it turns out they follow some common patterns we can abstract out. In the end, I only have DoMayIncrement for operations that may or may not add items, DoMayDecrement for those that may or may not remove items, and Do for everything else. They take callbacks with references to the values, so the callback code does not need to know exactly where the values are stored, and on top of that we build more specific operations and add probing to handle collisions along the way.
Oh, I forgot, to minimize allocations I pool the arrays that make up the nodes of the tree.
Sixth solution: Phil Bagwell's Ideal Hash Trees. It is similar to my fifth solution. The main difference is that the tree does not behave like a sparse array where all values are inserted at the same depth. Instead, the depth of tree branches is dynamic: when a collision happens, the old node is replaced with a branch into which both the old node and the new node are reinserted. This should make Bagwell's solution more memory efficient in the average case. Bagwell also pools arrays. Bagwell does not elaborate on shrinking.
The fifth solution is what I currently use to backport/polyfill ConcurrentDictionary<TKey, TValue> to .NET 2.0. More precisely, my backport of ConcurrentDictionary<TKey, TValue> wraps around a custom type ThreadSafeDictionary<TKey, TValue>, which uses a Bucket<KeyValuePair<TKey, TValue>>, which implements atomic operations over a BucketCore, which is my thread-safe sparse array thingy and has the logic to shrink and grow.
Best Answer
Impossible. It can be implemented in different ways, e.g., via compare-and-swap, where the hardware guarantees sequential execution. It can get a bit complicated in the presence of multiple cores or even multiple sockets, and it needs a complicated protocol between the cores, but this is all taken care of by the hardware.