Java – Hash algorithms – Fastest implementations

Tags: hash, java, md5, sha2

I want to know the best and fastest implementations of hash algorithms for Java, especially MD5 and SHA-2 (SHA-512 or SHA-256). I want a function that takes a string as an argument and returns the hash as the result. Thank you.

Edit: This is for mapping each URL to a unique hash. Since MD5 is not that reliable in this area, I'm more interested in finding the best and fastest implementation of the SHA-2 algorithms. Note that I know even SHA-2 might produce the same hash for some URLs, but I can live with that.

Best Answer

First things first: speed is overrated. You should measure before declaring that a given algorithm is "too slow". Most of the time, hash function speed makes no noticeable difference anyway. If you have qualms about security, first select a hash function that is secure enough, and only then worry about performance.

Moreover, you want to hash "strings". A Java String is, internally, a chunk from an array of char values which represent Unicode code points (actually, Unicode 16-bit code units which encode the code points using UTF-16). A hash function takes as input a sequence of bits or bytes. So you will have to make a conversion step, e.g. str.getBytes("UTF-8"), to obtain your string as a bunch of bytes. It is likely that the conversion step will have a non-negligible cost when compared to the hashing itself.

Note: beware of URL-encoding! In a URL, some bytes can be replaced with sequences beginning with a '%' sign; this is meant to support non-printable characters, but it can be used on "standard" characters as well (e.g., replacing 'a' with '%61'). This means that two strings which are distinct (in the String.equals() sense) may actually represent the same URL (as far as URL processing is concerned). Depending on your situation, this may or may not be an issue.
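To make the URL-encoding caveat concrete, here is a minimal sketch. It assumes the percent-encoding sits in the path component, and it uses java.net.URI, whose getPath() returns the decoded path. The class name, the method name and the example.com URLs are made up for illustration.

```java
import java.net.URI;

// Hypothetical demo class: shows two distinct strings that name the same URL path.
final class PercentEncodingDemo {

    // Returns the path component with percent-escapes decoded.
    static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String plain = "http://example.com/a";
        String escaped = "http://example.com/%61"; // %61 is the escape for 'a'

        // The raw strings differ, so hashing them directly gives two different hashes.
        System.out.println(plain.equals(escaped));

        // After decoding, both paths are "/a".
        System.out.println(decodedPath(plain).equals(decodedPath(escaped)));
    }
}
```

If equivalent URLs must map to the same hash, some normalization of this kind would have to happen before hashing. Which parts to normalize depends on how the URLs are used.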

You should first try to use Java's MessageDigest API with the standard (already installed) JCE provider (i.e. you call MessageDigest.getInstance("SHA-256")), and benchmark the result. Theoretically, the JCE may map the call to an implementation with "native" code (written in C or assembly), which will be faster than what you can get with Java.
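As a starting point, the function asked for in the question could look roughly like the sketch below: it converts the string to UTF-8 bytes, hashes them with the standard MessageDigest API, and returns the digest as lowercase hex. The class and method names are hypothetical, and hex is just one possible output format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: string in, SHA-256 hex digest out.
final class UrlHasher {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String sha256Hex(String input) {
        try {
            // A MessageDigest instance is not thread-safe, so obtain one per call.
            MessageDigest md = MessageDigest.getInstance("SHA-256");

            // The conversion step: a hash function works on bytes, not on chars.
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));

            // Encode each byte as two hex digits.
            char[] out = new char[digest.length * 2];
            for (int i = 0; i < digest.length; i++) {
                int b = digest[i] & 0xFF;
                out[2 * i] = HEX[b >>> 4];
                out[2 * i + 1] = HEX[b & 0x0F];
            }
            return new String(out);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must support.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex("abc"));
    }
}
```

Passing "MD5" or "SHA-512" to getInstance instead selects those algorithms, so the same skeleton can be used to compare them.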

That being said...

sphlib is an open-source implementation of many cryptographic hash functions, in C and in Java. The code has been optimized for speed, and, in practice, the Java version turns out to be faster than what the standard JRE from Sun/Oracle offers. The original answer also linked a mirror for when the main host server is down for maintenance (warning: 10 MB download). The archive also contains a report (which was presented at the second SHA-3 candidate conference in 2010) which gives some measured performance figures on several platforms, for SHA-2 and the 14 "second round" candidates for the upcoming SHA-3.

But you really should run in-situ benchmarks. For instance, L1 cache effects can have a drastic impact on performance, and cannot be accurately predicted by taking the function code and running it in isolation.
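For a first rough measurement, something like the following sketch may be enough. It times the string-to-bytes conversion together with the digest, over a sample of URLs, after a warm-up pass. The class and method names are hypothetical. A simple System.nanoTime() loop like this is only indicative, and it is still an isolated test, so it does not replace measuring inside the real application.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical timing helper; a rough sketch, not a rigorous benchmark harness.
final class DigestTiming {

    // Average nanoseconds per hashed URL, including the UTF-8 conversion.
    static long averageNanosPerHash(String algorithm, String[] urls, int rounds) {
        if (rounds <= 0 || urls.length == 0) {
            throw new IllegalArgumentException("need at least one round and one URL");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }

        int sink = 0; // consumes results so the JIT cannot discard the work

        // Warm-up pass: lets the JIT compile the hot path before timing.
        for (int r = 0; r < rounds; r++) {
            for (String url : urls) {
                sink += md.digest(url.getBytes(StandardCharsets.UTF_8))[0];
            }
        }

        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String url : urls) {
                sink += md.digest(url.getBytes(StandardCharsets.UTF_8))[0];
            }
        }
        long elapsed = System.nanoTime() - start;

        if (sink == Integer.MIN_VALUE) {
            System.out.println(sink); // practically never taken; keeps sink observable
        }
        return elapsed / ((long) rounds * urls.length);
    }

    public static void main(String[] args) {
        String[] sample = { "http://example.com/", "http://example.com/some/longer/path?q=1" };
        for (String algorithm : new String[] { "MD5", "SHA-256" }) {
            System.out.println(algorithm + ": "
                    + averageNanosPerHash(algorithm, sample, 100_000) + " ns/hash");
        }
    }
}
```

The sample URLs should resemble the real input as closely as possible, since input length affects the balance between conversion cost and hashing cost.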

Related Solutions

Java – What are the differences between a HashMap and a Hashtable in Java

There are several differences between HashMap and Hashtable in Java:

Hashtable is synchronized, whereas HashMap is not. This makes HashMap better for non-threaded applications, as unsynchronized Objects typically perform better than synchronized ones.
Hashtable does not allow null keys or values. HashMap allows one null key and any number of null values.
One of HashMap's subclasses is LinkedHashMap, so in the event that you'd want predictable iteration order (which is insertion order by default), you could easily swap out the HashMap for a LinkedHashMap. This wouldn't be as easy if you were using Hashtable.

Since synchronization is not an issue for you, I'd recommend HashMap. If synchronization becomes an issue, you may also look at ConcurrentHashMap.
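The null-handling and iteration-order points above can be checked with a short sketch like this one. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo of the differences listed above.
final class MapDifferencesDemo {

    // Tries to store a null key; Hashtable rejects it with a NullPointerException.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // LinkedHashMap iterates in insertion order, not in hash order.
    static List<String> insertionOrderKeys() {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // true
        System.out.println(acceptsNullKey(new Hashtable<>())); // false
        System.out.println(insertionOrderKeys());              // [zebra, apple, mango]
    }
}
```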

Java – Is Java “pass-by-reference” or “pass-by-value”

Java is always pass-by-value. Unfortunately, when we deal with objects we are really dealing with object-handles called references which are passed-by-value as well. This terminology and semantics easily confuse many beginners.

It goes like this:

public static void main(String[] args) {
    Dog aDog = new Dog("Max");
    Dog oldDog = aDog;

    // we pass the object to foo
    foo(aDog);
    // aDog variable is still pointing to the "Max" dog when foo(...) returns
    aDog.getName().equals("Max"); // true
    aDog.getName().equals("Fifi"); // false
    boolean sameDog = (aDog == oldDog); // true
}

public static void foo(Dog d) {
    d.getName().equals("Max"); // true
    // change d inside of foo() to point to a new Dog instance "Fifi"
    d = new Dog("Fifi");
    d.getName().equals("Fifi"); // true
}

In the example above, aDog.getName() will still return "Max". Assigning the new Dog "Fifi" to d inside foo does not change the variable aDog in main, because the object reference is passed by value. If it were passed by reference, then aDog.getName() in main would return "Fifi" after the call to foo.

Likewise:

public static void main(String[] args) {
    Dog aDog = new Dog("Max");
    Dog oldDog = aDog;

    foo(aDog);
    // when foo(...) returns, the name of the dog has been changed to "Fifi"
    aDog.getName().equals("Fifi"); // true
    // but it is still the same dog:
    boolean sameDog = (aDog == oldDog); // true
}

public static void foo(Dog d) {
    d.getName().equals("Max"); // true
    // this changes the name of d to be "Fifi"
    d.setName("Fifi");
}

In the above example, Fifi is the dog's name after the call to foo(aDog) because the object's name was set inside of foo(...). Any operations that foo performs on d are, for all practical purposes, performed on aDog, but it is not possible to change the value of the variable aDog itself.
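The two cases can be put side by side in one self-contained program. This is a sketch: the Dog class below is a minimal stand-in (the snippets above do not show its definition), and the method names reassign and rename are made up to distinguish the two versions of foo.

```java
// Hypothetical, self-contained version of the two examples above.
final class PassByValueDemo {

    // Minimal stand-in for the Dog class used in the snippets.
    static final class Dog {
        private String name;

        Dog(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setName(String name) {
            this.name = name;
        }
    }

    // Rebinds the local copy of the reference; the caller's variable is unaffected.
    static void reassign(Dog d) {
        d = new Dog("Fifi");
    }

    // Mutates the object that both the caller and the callee refer to.
    static void rename(Dog d) {
        d.setName("Fifi");
    }

    public static void main(String[] args) {
        Dog aDog = new Dog("Max");
        Dog oldDog = aDog;

        reassign(aDog);
        System.out.println(aDog.getName());  // Max
        System.out.println(aDog == oldDog);  // true

        rename(aDog);
        System.out.println(aDog.getName());  // Fifi
        System.out.println(aDog == oldDog);  // true: still the same object
    }
}
```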

For more information on pass by reference and pass by value, consult the following SO answer: https://stackoverflow.com/a/430958/6005228. This explains more thoroughly the semantics and history behind the two and also explains why Java and many other modern languages appear to do both in certain cases.
