Java Hash Table Design – Simple Hash Function Implementation

hashing, java

I want to learn how to design a hash table with a simple hash function, to understand it better. I understand that a hash table works as long as the hash function maps each key to a non-negative integer less than the size of the table, but that it only performs well if the function distributes different keys across different buckets.

My question is: what are alternative ways to implement a hash function using ASCII codes?

I found an ASCII-based hash function implementation. It is easy to build a hash function on the idea of treating each character of the string as a digit in a number. The familiar way to represent a number is the radix-10 system with Arabic numerals.

For example, I could represent numbers using the letters "a" – "z" for the digits 0 through 25 to obtain the Radix-26 system described in the textbook. Characters in the computer are often stored using 7-bit ASCII codes (with values 0-127). So we can treat a string of ASCII characters as a Radix-128 number.
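As a rough illustration of the Radix-128 idea, here is a minimal sketch. The class name Radix128Hash and the tableSize parameter are made up for this example. It assumes the key is plain ASCII, masks each character to 7 bits, and reduces modulo the table size after every digit so the running value stays small:

```java
final class Radix128Hash {
    // Interprets the string as a radix-128 number, one character per digit,
    // and reduces modulo tableSize after each digit (Horner's rule).
    // Assumption: tableSize is positive and the key is meant to be ASCII.
    static int hash(String key, int tableSize) {
        long h = 0; // long, so h * 128 cannot overflow for any int tableSize
        for (int i = 0; i < key.length(); i++) {
            int digit = key.charAt(i) & 0x7F; // keep only the 7 ASCII bits
            h = (h * 128 + digit) % tableSize;
        }
        return (int) h; // always in the range 0 .. tableSize - 1
    }

    public static void main(String[] args) {
        // 'c' = 99, 'a' = 97, 't' = 116
        System.out.println(hash("cat", 101)); // prints 65
    }
}
```

One caveat of this sketch: if tableSize is a power of two, only the last few characters affect the result, so a size such as a prime is probably the safer choice.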

Best Answer

If you want to build a hash table in Java, you should take advantage of the hashCode and equals methods which every object has, so there is no need to devise a custom hash function. Note that all Java “characters” are already numbers in the range 0x00 – 0xFFFF (they are UTF-16 code units, not ASCII characters or bytes).
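A minimal sketch of what that could look like, assuming separate chaining with a fixed number of buckets (the class SimpleHashTable and its method names are invented for this example, and null keys and resizing are deliberately left out):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class SimpleHashTable<K, V> {
    // One key/value pair stored in a bucket's chain.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry<K, V>>> buckets = new ArrayList<>();

    SimpleHashTable(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private LinkedList<Entry<K, V>> bucketFor(K key) {
        // hashCode() may be negative; floorMod always yields a valid index.
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) { // same key: overwrite the old value
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry<>(key, value));
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null; // key not present
    }
}
```

The table never looks at the characters of a key itself: hashCode picks the bucket and equals resolves collisions inside it, so any key type with consistent hashCode and equals implementations should work.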

Interpreting the string as a Base-26 or Base-128 number is a reasonable idea for alphabetic-only or ASCII-only text. But there are a few issues I can see:

  • Strings do not only contain ASCII letters, but also symbols or spaces. Frequently, text will contain Unicode characters which have no ASCII equivalent.

  • A hash code is an integer. To find a bucket, you'd do buckets.get(Math.floorMod(hashCode, buckets.size())); a plain % can produce a negative index when the hash code is negative. However, Java integers hold 32 bits, which offers enough bits for roughly 4.5 seven-bit ASCII characters. Assuming your implementation left-shifts by seven bits and “or”s the new bits to the current hash code,

    int hashCode = 0;
    for (char c : str.toCharArray()) {
        hashCode <<= 7;
        hashCode |= c & 0x7F;
    }
    

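    then only the last five characters would influence the hash code (and of the fifth-from-last character, only its low four bits). This makes it extremely easy to create hash collisions: "civilisation" and "train station" both end in "ation" and therefore collide.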

    This can be avoided by a cleverer hash function in which every input bit keeps some influence on the result. For example, the bits in the hash code could be rotated rather than shifted, and new bits could be “xor”ed to the existing value:

    int hashCode = 0;
    for (char c : str.toCharArray()) {
        // rotate the bits left by one (>>> so the sign bit is not smeared)
        hashCode = (hashCode << 1) | (hashCode >>> (32 - 1));
        // xor new bits
        hashCode ^= c;
    }
    

    Java's own String.hashCode uses a different, polynomial hash function:

    int hashCode = 0;
    for (char c : str.toCharArray()) {
        hashCode = 31 * hashCode + c;
    }
    

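    Multiplying by 31 mixes every character into the whole hash code, so no character is simply shifted out of the result. Overflow is not a problem: Java int arithmetic wraps around silently, and the hash code is reduced modulo the number of buckets anyway (the wrapped value may be negative, so the bucket index must be computed with a non-negative modulo). The exact value 31 is not critical. An odd multiplier loses no information when the multiplication overflows, and a prime multiplier tends to reduce systematic collisions, though it cannot eliminate them.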
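The three variants discussed above can be compared side by side. This is only a sketch for experimenting: the class HashComparison and its method names are invented here, and Integer.rotateLeft is used as a stand-in for the hand-written rotation.

```java
final class HashComparison {
    // Shift-or variant: older characters are shifted out of the 32-bit int.
    static int shiftOr(String s) {
        int h = 0;
        for (char c : s.toCharArray()) {
            h <<= 7;
            h |= c & 0x7F;
        }
        return h;
    }

    // Rotate-xor variant: bits wrap around instead of being discarded.
    static int rotateXor(String s) {
        int h = 0;
        for (char c : s.toCharArray()) {
            h = Integer.rotateLeft(h, 1);
            h ^= c;
        }
        return h;
    }

    // Polynomial variant with multiplier 31, the formula documented
    // for String.hashCode.
    static int polynomial31(String s) {
        int h = 0;
        for (char c : s.toCharArray()) {
            h = 31 * h + c;
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "civilisation";
        String b = "train station";
        // Both strings end in "ation", so the shift-or hashes collide.
        System.out.println(shiftOr(a) == shiftOr(b));           // prints true
        // The polynomial variant matches the built-in hash code.
        System.out.println(polynomial31(a) == a.hashCode());    // prints true
        // Print the other two variants for comparison.
        System.out.println(rotateXor(a) + " vs " + rotateXor(b));
        System.out.println(polynomial31(a) + " vs " + polynomial31(b));
    }
}
```

Any of these values still has to be mapped to a bucket, for instance with Math.floorMod(hash, bucketCount), since all three can come out negative for longer strings.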
