Garbage Collection – Can a Hash Table Solve the Stop-the-World Problem?

algorithmsgarbage-collection

In mark-sweep-compact garbage collection algorithm you have to stop-the-world when relocating objects because reference graph becomes inconsistent and you have to replace values of all references pointing to the object.

But what if you had a hash table with object ID as a key and pointer as value, and references would point to said ID instead of object address… then fixing references would only require changing one value and pause would only be needed if object is tried to be written into during copying…

Is there a mistake in my line of thought?

Best Answer

Updating references is not the only thing that requires a pause. The standard algorithms commonly grouped under "mark-sweep" all assume that the entire object graph remains unaltered while it's being marked. Correctly handling modifications (new objects created, references changed) requires rather tricky alternative algorithms, like as the tri-color algorithm. The umbrella term is "concurrent garbage collection".

But yes, updating references after compaction also needs a pause. And yes, using indirection (e.g. via a persistent object ID and a hash table to real pointers) can greatly reduce the pausing. It might even be possible to make this part lock-free if one so desires. It would still be as tricky to get right as any low-level shared-memory concurrency, but there is no fundamental reason it wouldn't work.

However, it would have severe disadvantages. Aside from taking extra space (at least two extra words for all objects), it makes every dereference much more expensive. Even something as simple as getting an attribute now involves a full hash table search. I'd estimate the performance hit to be way worse than for incremental tracing.

Related Topic