Is there a Cormen-like reference on Hashes and Hashing? This particular structure has seen little attention in my CS education for some reason but I'd like to learn more as they seem to be everywhere. I know Cormen covers it but I'm looking for something more specialized and in-depth.
Algorithms – The Definitive Guide to Hashing
algorithms, data structures
Related Solutions
A Helper class is a lesser known code smell where a coder has identified some miscellaneous, commonly used operations and attempted to make them reusable by lumping them together in an unnatural grouping. Successive developers have then come onto the project and not realised that the helper class exists, and have consequently rewritten the same common operations, or even created more Helper classes.
But seriously, the main problem with Helper classes is that they are usually operations that act on a specific class, which obviously means in OO terms that they are suffering from an acute case of Feature Envy. This failure to package the behaviour with the data it acts on is why developers so often (in my experience) fail to find it.
In addition to this, as you have already identified, SomethingSomethingHelper is actually a terrible name. It is undescriptive and gives you no real inkling of what sort of operations the class performs (it helps?), which also means that when adding new behaviours it's not obvious whether they belong in the Helper class or not. I would break up such classes along the lines of related behaviours that logically group together, and then rename the new classes to reflect what they do.
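As an illustrative sketch (the names here are invented for the example, not taken from any real codebase), compare a helper method suffering from Feature Envy with the same behaviour moved onto the class that owns the data:

```cpp
// Hypothetical example. Before the refactor, a "helper" reaches into
// another class's fields -- classic Feature Envy:
//
//     struct OrderHelper {
//         static double total(const Order& o) { return o.quantity * o.unitPrice; }
//     };
//
// After the refactor, the behaviour is packaged with the data it acts on:
struct Order {
    int quantity;
    double unitPrice;

    // The ex-helper operation, now a member with direct access to its own data.
    double total() const { return quantity * unitPrice; }
};
```

Once the operation lives on `Order`, the next developer looking for "how do I get an order's total?" finds it where they'd naturally look, instead of rewriting it or inventing yet another helper.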
Stack Overflow would be a somewhat better place for asking programming/algorithm related questions. In any case, the implementation you read about is most likely based on "tables" (dynamic arrays). Here is how such an implementation works:
- Initialize vector with size n, say n = 16
Address: 0xAAA0 to 0xAAB0 Memory reserved
- Insert 17 elements. First 16 inserted fine. Next element requires more memory.
STL Library: Allocate memory for 16 * 2 = 32. Copy 16 elements. (Actual time taken = 16 units). Insert the 17th element.
- Insert 16 more elements. First 15 inserted fine. Next element requires more space.
STL Library: Allocate memory for 16 * 2 * 2 = 64. Copy 32 elements. (Actual time taken = 32 units). Insert the 33rd element.
- Insert 32 more elements. First 31 inserted fine. Next requires more space.
STL Library: Allocate memory for 16 * 2 * 2 * 2 = 128. Copy 64 elements. (Actual time taken = 64 units). Insert the 65th element.
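The growth pattern above can be simulated in a few lines. This is a sketch, not the actual STL implementation; it just counts the element copies that each doubling costs, matching the "time units" in the walkthrough (function name and starting capacity of 16 are assumptions from the example):

```cpp
#include <cstddef>

// Simulate inserting n elements into a doubling dynamic array that starts
// with the given capacity, and count how many element copies the
// reallocations cost in total.
std::size_t copies_for_inserts(std::size_t n, std::size_t capacity = 16) {
    std::size_t size = 0, copied = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {   // full: allocate double, copy the old elements
            copied += size;
            capacity *= 2;
        }
        ++size;                   // insert the new element
    }
    return copied;
}
```

Running it reproduces the walkthrough: 17 inserts cost 16 copy units, 33 inserts cost 16 + 32 = 48, and 65 inserts cost 16 + 32 + 64 = 112.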
This implementation is O(1) for accessing and O(1) amortized for insertion. How? Over a very large number of operations, group the inserts by the reallocation that follows them (starting from capacity 1 for simplicity):

Time = 2^0 (insert) + 2^0 (copy) + 2^1/2 (inserts) + 2^1 (copy) + 2^2/2 (inserts) + 2^2 (copy) + ... + 2^n/2 (inserts) + 2^n (copy)

- Total number of inserts = 2^n
Time = 1 + (2^0 + 2^0) + (2^1 + 2^1) + ... + (2^(n-1) + 2^(n-1))
     = 1 + 2*2^0 + 2*2^1 + ... + 2*2^(n-1)
     = 1 + 2*(2^n - 1)
Average time per insert ≈ 2 units
- Total inserts = 2^n + 1 (the one extra insert triggers a copy of all 2^n elements)
Time = 1 + 2*(2^n - 1) + 2^n
Average time per insert ≈ 3 units
It's not a linked list, but I'm pretty sure this is what you read about: sizes that increase and decrease. Shrinking on deletes works similarly: when the used size falls below 1/4 of the capacity, free half of the memory. If you're using your own memory allocator, it shouldn't be too hard to free only what you want; and if you copy the elements over as well, the same analysis shows deletes are still O(1) amortized.
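A minimal sketch of that shrink rule (the minimum capacity of 16 mirrors the earlier example; the function name is invented for illustration):

```cpp
#include <cstddef>

// Given the current used size and capacity, halve the capacity while the
// used size is below a quarter of it. Shrinking at 1/4 rather than 1/2
// leaves slack, so a delete followed by an insert can't thrash between
// growing and shrinking.
std::size_t shrink_capacity(std::size_t size, std::size_t capacity) {
    while (capacity > 16 && size < capacity / 4)  // keep a minimum capacity of 16
        capacity /= 2;
    return capacity;
}
```

For example, with 7 elements in a capacity-64 array, the capacity halves to 32 and then to 16; with 20 elements in the same array, nothing shrinks.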
Best Answer
I really enjoyed the book File Organization and Processing. Despite its name, it's really a book of data structures. The first half is about hashing and various collision resolution methods, and later on there is coverage of some dynamic hashing algorithms.
It's a little old but it's still useful. There are step-by-step examples for each algorithm and answers to the exercises.
Disclaimer: I'm biased because the author was one of my CS professors.