Hash Tables – Advantages of Linear Probing vs Separate Chaining

I've been brushing up on algorithms and reviewed these two methods of implementing hash tables. It seems like they largely have similar performance characteristics and memory requirements.

I can think of some disadvantages to linear probing — namely, that widening the array could be expensive (but this is done, what, 2 log N times at most? Probably not a big deal) and that managing deletions is a bit more difficult. But I'm assuming there are advantages, too, or it wouldn't be presented in textbooks as a viable method of implementation next to the more obvious implementation.

Why would you choose one over the other?

Best Answer

With linear probing (or any probing really) a deletion has to be "soft". This means you need to put in a dummy value (often called a tombstone) that won't match anything the user could search for. Or you would need to rehash every time. Rehashing when too many tombstones build up is still advised or some strategy to defrag the graveyard.

Separate chaining (each bucket is a pointer to a linked list of values) has the disadvantage that you end up searching a linked list with all cache-related issues at hand.

One other advantage of the probing method is that the values all live in the same array. This makes copy-on-write very easy by just copying only the array. If you can be assured the original is not modified by way of class invariant then a taking a snapshot is O(1) and can be done without locking.