Is it possible to speed up a hash table by using binary search trees for separate chaining

algorithmsbinary treedata structureshashing

I want to implement a Hash Table using Binary Search Trees to reduce the search complexity in the Separate Chaining process from O(n) (using linked list) to O(log n) (using BST). Can this be done, and if yes then how? It would be easier to understand if solution is step by step, implementation of the logic.

I want to reduce the Search time in the hashtable (build using separate chaining), but at the same time I do not want the insert time to increase. For my project i cannot change the hash function to reduce collisions. But due to scalability, collision are happening. I am trying to find a work around, so that i can somehow work with the best access and insert time in case a collision occurs… i.e. to manage the current state of thing than to restructure the entire algorithm. If it does not pan out then will have to restructure. So any ideas?

Best Answer

What you are asking for is possible given your constraints.

Analysis

A hash table's strength is its fast lookup and insertion speed. To get that speed, one must forsake any semblance of order in the table: i.e. entries are all jumbled up. A list is acceptable to use as a table entry because while traversal is O(n), the lists tend to be short assuming the hash table is sufficiently large and the objects stored in the table are hashed using a good quality hashing algorithm.

A binary search tree (BST) has fast insertion and lookup at O(log2 n). It also imposes a restriction on the elements it stores: there must be some way to order the elements. Given two elements A and B stored in the tree, it must be possible to determine if A comes before B or if they have equivalent order.

A hash table imposes no such restriction: elements in a hash table must have two properties. First, there must be a way to determine if they are equivalent; second, there must be a way to calculate a deterministic hash code. Order is not a requirement.

If your hash table elements do have an order, then you can use a BST as a hash table entry to hold objects with the same hash code (collisions). However, due to a BST having O(log2 n) lookup and insertion, that means the worst case for the whole structure (hash table plus BST) is technically better than using a list as a table entry. Depending on the BST implementation it will require more storage than a list, but likely not much more.

Please note that normally the overhead and behavior of a BST brings nothing to the table in real world situations as hash table buckets, which is why the theoretical poor performance of a list is acceptable. In other words, the hash table compensates for the list's weakness by placing fewer items in each list (bucket). However: the problem specifically stated that the hash table cannot increase in size, and collisions are more frequent than is typical in a hash table.

Implementation

I am not going to put code here because honestly it is not really necessary and you did not give a language anyway.

What I would do is simply copy whatever standard hash table your language's standard library contains into a new class, then change the table bucket type from a list to a tree. Depending on the language and its standard library this may be a very trivial thing to do.

Normally I would not advocate copy and paste coding like this. However, it is an easy way to get a battle-tested data structure very quickly.