C# – Which cryptographic hash function should I choose

The .NET framework ships with 6 different hashing algorithms:

MD5: 16 bytes (Time to hash 500MB: 1462 ms)
SHA-1: 20 bytes (1644 ms)
SHA256: 32 bytes (5618 ms)
SHA384: 48 bytes (3839 ms)
SHA512: 64 bytes (3820 ms)
RIPEMD: 20 bytes (7066 ms)

Each of these functions performs differently: MD5 is the fastest and RIPEMD the slowest.

MD5 has the advantage that it fits in the built-in Guid type, and it is the basis of the type 3 UUID. The SHA-1 hash is the basis of the type 5 UUID. This makes both of them really easy to use for identification.
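As a minimal sketch of the "fits in a Guid" point: an MD5 digest is exactly 16 bytes, so it can be passed straight to the Guid(byte[]) constructor. The names HashGuid and FromMd5 are made up for illustration. Note that the result is only an MD5-derived identifier, not a standards-compliant type 3 UUID, since no namespace is mixed in and the version and variant bits are not set.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashGuid
{
    // Hypothetical helper: derives a Guid from the MD5 digest of a string.
    // The digest is exactly 16 bytes, which is what Guid(byte[]) requires.
    public static Guid FromMd5(string text)
    {
        byte[] input = Encoding.UTF8.GetBytes(text);
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(input);
            return new Guid(digest);
        }
    }
}
```

Calling HashGuid.FromMd5 twice with the same string returns the same Guid, which is what makes it usable as a content-derived identifier.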

MD5, however, is vulnerable to collision attacks. SHA-1 is also vulnerable, but to a lesser degree.

Under what conditions should I use which hashing algorithm?

Particular questions I'm really curious to see answered are:

Is MD5 not to be trusted? Under normal situations, when you use the MD5 algorithm with no malicious intent and no third party has any malicious intent, would you expect ANY collisions (meaning two arbitrary byte[] producing the same hash)?
How much better is RIPEMD than SHA1, if it's any better? It's 5 times slower to compute, but the hash size is the same as SHA1.
What are the odds of getting non-malicious collisions when hashing file names (or other short strings), e.g. 2 random file names with the same MD5 hash (with MD5 / SHA1 / SHA2xx)? In general, what are the odds of non-malicious collisions?

This is the benchmark I used:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Security.Cryptography;

    class Program {

        // Runs func the given number of times and prints the total elapsed time.
        static void TimeAction(string description, int iterations, Action func) {
            var watch = new Stopwatch();
            watch.Start();
            for (int i = 0; i < iterations; i++) {
                func();
            }
            watch.Stop();
            Console.Write(description);
            Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
        }

        // Returns a buffer of the requested size filled with pseudo-random bytes.
        static byte[] GetRandomBytes(int count) {
            var bytes = new byte[count];
            (new Random()).NextBytes(bytes);
            return bytes;
        }

        static void Main(string[] args) {
            // These classes ship with the .NET Framework; RIPEMD160Managed in
            // particular is not available in .NET Core / .NET 5+.
            var md5 = new MD5CryptoServiceProvider();
            var sha1 = new SHA1CryptoServiceProvider();
            var sha256 = new SHA256CryptoServiceProvider();
            var sha384 = new SHA384CryptoServiceProvider();
            var sha512 = new SHA512CryptoServiceProvider();
            var ripemd160 = new RIPEMD160Managed();

            // 1000 KiB buffer, hashed 500 times below: roughly 500 MB per algorithm.
            var source = GetRandomBytes(1000 * 1024);

            var algorithms = new Dictionary<string, HashAlgorithm>();
            algorithms["md5"] = md5;
            algorithms["sha1"] = sha1;
            algorithms["sha256"] = sha256;
            algorithms["sha384"] = sha384;
            algorithms["sha512"] = sha512;
            algorithms["ripemd160"] = ripemd160;

            // First pass: report digest sizes (this also warms up each algorithm).
            foreach (var pair in algorithms) {
                Console.WriteLine("Hash Length for {0} is {1}",
                    pair.Key,
                    pair.Value.ComputeHash(source).Length);
            }

            // Second pass: time 500 hashes of the buffer per algorithm.
            foreach (var pair in algorithms) {
                TimeAction(pair.Key + " calculation", 500, () =>
                {
                    pair.Value.ComputeHash(source);
                });
            }

            Console.ReadKey();
        }
    }

Best Answer

In cryptography, hash functions provide three separate properties.

Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same?
Preimage resistance: Given a hash, how hard is it to find a message that produces that hash? Also known as the one-way property.
Second preimage resistance: Given a message, how hard is it to find another message that hashes the same?

These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.

It has been shown that MD5 is not collision resistant; however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller digest size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.

SHA1 has a flaw that allows collisions to be found in far fewer than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps, just barely within the current realm of computability. For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.
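The 2^80 figure is the birthday bound for a 160-bit digest: a generic collision search needs about 2^(bits/2) steps. The same reasoning gives a rough answer to the question about accidental collisions. The sketch below uses the standard approximation p ≈ n(n-1)/2^(bits+1) for n uniformly random inputs; the class and method names are illustrative, and the approximation only holds while p is much smaller than 1.

```csharp
using System;

public static class BirthdayBound
{
    // Approximate probability that at least two of `items` uniformly random
    // digests of `bits` bits collide: p ≈ n(n-1) / 2^(bits+1).
    // Only meaningful while the result is much smaller than 1.
    public static double CollisionProbability(double items, int bits)
    {
        return items * (items - 1) / Math.Pow(2, bits + 1);
    }

    // Generic work factor for finding a collision by brute force: 2^(bits/2).
    public static double GenericCollisionWork(int bits)
    {
        return Math.Pow(2, bits / 2.0);
    }
}
```

For example, BirthdayBound.CollisionProbability(1e9, 128) estimates the chance of an accidental MD5 collision among a billion random inputs, and comes out on the order of 10^-21. That is why non-malicious collisions are not a practical concern even for MD5; the weakness is in deliberately constructed collisions.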

SHA2 is a newer family of hash functions created following SHA1. Currently there are no known practical attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different output lengths.

RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.

In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.

TL;DR

Don'ts

Don't limit what characters users can enter for passwords. Only idiots do this.
Don't limit the length of a password. If your users want a sentence with supercalifragilisticexpialidocious in it, don't prevent them from using it.
Don't strip or escape HTML and special characters in the password.
Never store your user's password in plain-text.
Never email a password to your user except when they have lost theirs and you send a temporary one.
Never, ever log passwords in any manner.
Never hash passwords with SHA1 or MD5 or even SHA256! Modern crackers can exceed 60 and 180 billion hashes/second (for SHA1 and MD5, respectively).
Don't mix bcrypt with the raw output of hash(); either use hex output or base64_encode it. (This applies to any input that may have a rogue \0 in it, which can seriously weaken security.)

Dos

Use scrypt when you can; bcrypt if you cannot.
Use PBKDF2 if you cannot use either bcrypt or scrypt, with SHA2 hashes.
Reset everyone's passwords when the database is compromised.
Implement a reasonable 8-10 character minimum length, plus require at least 1 upper case letter, 1 lower case letter, a number, and a symbol. This will improve the entropy of the password, in turn making it harder to crack. (See the "What makes a good password?" section for some debate.)

Why hash passwords anyway?

The objective behind hashing passwords is simple: preventing malicious access to user accounts by compromising the database. So the goal of password hashing is to deter a hacker or cracker by costing them too much time or money to calculate the plain-text passwords. And time/cost are the best deterrents in your arsenal.

Another reason that you want a good, robust hash on user accounts is to give you enough time to change all the passwords in the system. If your database is compromised you will need enough time to at least lock the system down, if not change every password in the database.

Jeremiah Grossman, CTO of WhiteHat Security, stated on the WhiteHat Security blog after a recent password recovery that required brute-force breaking of his password protection:

Interestingly, in living out this nightmare, I learned A LOT I didn’t know about password cracking, storage, and complexity. I’ve come to appreciate why password storage is ever so much more important than password complexity. If you don’t know how your password is stored, then all you really can depend upon is complexity. This might be common knowledge to password and crypto pros, but for the average InfoSec or Web Security expert, I highly doubt it.

(Emphasis mine.)

What makes a good password anyway?

Entropy. (Not that I fully subscribe to Randall's viewpoint.)

In short, entropy is how much variation is within the password. When a password is only lowercase roman letters, that's only 26 characters. That isn't much variation. Alpha-numeric passwords are better, with 36 characters. But allowing upper and lower case, with symbols, is roughly 96 characters. That's a lot better than just letters. One problem is, to make our passwords memorable we insert patterns—which reduces entropy. Oops!

Password entropy is easily approximated. Using the full range of ASCII characters (roughly 96 typeable characters) yields an entropy of about 6.6 bits per character, which at 8 characters for a password is still too low (52.679 bits of entropy) for future security. But the good news is: longer passwords, and passwords with Unicode characters, really increase the entropy of a password and make it harder to crack.
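The approximation behind those numbers is simply length × log2(alphabet size). A small sketch in C# (the language of the question at the top of this page); the class and method names are illustrative, and the estimate is an upper bound that assumes every character is chosen uniformly and independently.

```csharp
using System;

public static class PasswordEntropy
{
    // Upper-bound estimate: length * log2(alphabetSize) bits.
    // Assumes every character is chosen uniformly and independently,
    // which human-chosen passwords generally do not satisfy.
    public static double EstimateBits(int length, int alphabetSize)
    {
        return length * Math.Log(alphabetSize, 2);
    }
}
```

PasswordEntropy.EstimateBits(8, 96) gives roughly 52.68 bits, in line with the figure quoted above, while the same length over 26 lowercase letters gives only about 37.6 bits.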

There's a longer discussion of password entropy on the Crypto StackExchange site. A good Google search will also turn up a lot of results.

In the comments I talked with @popnoodles, who pointed out that enforcing a password policy of X length with X many letters, numbers, symbols, etc., can actually reduce entropy by making the password scheme more predictable. I do agree. Randomness, as truly random as possible, is always the safest but least memorable solution.

So far as I've been able to tell, making the world's best password is a Catch-22. Either it's not memorable, or it's too predictable, too short, too full of Unicode characters (hard to type on a Windows/mobile device), too long, etc. No password is truly good enough for our purposes, so we must protect them as though they were in Fort Knox.

Best practices

Bcrypt and scrypt are the current best practices. Scrypt will be better than bcrypt in time, but it hasn't seen adoption as a standard by Linux/Unix or by webservers, and hasn't had in-depth reviews of its algorithm posted yet. But still, the future of the algorithm does look promising. If you are working with Ruby there is an scrypt gem that will help you out, and Node.js now has its own scrypt package. You can use Scrypt in PHP either via the Scrypt extension or the Libsodium extension (both are available in PECL).

I highly suggest reading the documentation for the crypt function if you want to understand how to use bcrypt, or finding yourself a good wrapper, or using something like PHPASS for a more legacy implementation. I recommend a minimum of 12 rounds of bcrypt, if not 15 to 18.

I changed my mind about using bcrypt when I learned that bcrypt only uses Blowfish's key schedule, with a variable cost mechanism. The latter lets you increase the cost to brute-force a password by increasing Blowfish's already expensive key schedule.

Average practices

I almost can't imagine this situation anymore. PHPASS supports PHP 3.0.18 through 5.3, so it is usable on almost every installation imaginable—and should be used if you don't know for certain that your environment supports bcrypt.

But suppose that you cannot use bcrypt or PHPASS at all. What then?

Try an implementation of PBKDF2 with the maximum number of rounds that your environment/application/user-perception can tolerate. The lowest number I'd recommend is 2500 rounds. Also, make sure to use hash_hmac() if it is available to make the operation harder to reproduce.
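For readers coming from the C# question at the top of this page, here is a minimal sketch of salted PBKDF2 with a SHA2 hash using the built-in System.Security.Cryptography types. It assumes .NET 6 or later (for Rfc2898DeriveBytes.Pbkdf2 and RandomNumberGenerator.GetBytes). The class name, the 16-byte salt and the 32-byte output are illustrative choices, not recommendations from the original answer, and the iteration count is left as a parameter so it can be set as high as the application tolerates.

```csharp
using System;
using System.Security.Cryptography;

public static class Pbkdf2Sketch
{
    private const int SaltSize = 16;   // bytes; illustrative choice
    private const int HashSize = 32;   // bytes; matches SHA-256 output

    // Derives a hash for a new password and returns the random salt with it.
    // Both values (plus the iteration count) need to be stored per user.
    public static (byte[] Salt, byte[] Hash) HashPassword(string password, int iterations)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(SaltSize);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, iterations, HashAlgorithmName.SHA256, HashSize);
        return (salt, hash);
    }

    // Re-derives the hash with the stored salt and compares in constant time.
    public static bool Verify(string password, byte[] salt, byte[] expected, int iterations)
    {
        byte[] actual = Rfc2898DeriveBytes.Pbkdf2(
            password, salt, iterations, HashAlgorithmName.SHA256, expected.Length);
        return CryptographicOperations.FixedTimeEquals(actual, expected);
    }
}
```

A caller would store the salt and hash returned by HashPassword at registration, then pass them back to Verify at login with the same iteration count.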

Future Practices

Coming in PHP 5.5 is a full password protection library that abstracts away any pains of working with bcrypt. While most of us are stuck with PHP 5.2 and 5.3 in most common environments, especially shared hosts, @ircmaxell has built a compatibility layer for the coming API that is backward compatible to PHP 5.3.7.

Cryptography Recap & Disclaimer

The computational power required to actually crack a hashed password doesn't exist. The only way for computers to "crack" a password is to recreate it and simulate the hashing algorithm used to secure it. The speed of the hash is linearly related to its ability to be brute-forced. Worse still, most hash algorithms can be easily parallelized to perform even faster. This is why costly schemes like bcrypt and scrypt are so important.
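To make the "speed is linearly related to brute-forcing" point concrete, here is a back-of-the-envelope sketch in C#. The class and method names are illustrative; the 60 billion guesses per second comes from the figure quoted earlier, while the slow-hash rate mentioned afterwards is a purely hypothetical number chosen for comparison.

```csharp
using System;

public static class BruteForceEstimate
{
    // Seconds needed to try every password of the given length over the
    // given alphabet at a fixed guess rate (worst case for the attacker).
    public static double SecondsToExhaust(int alphabetSize, int length, double guessesPerSecond)
    {
        double keyspace = Math.Pow(alphabetSize, length);
        return keyspace / guessesPerSecond;
    }
}
```

BruteForceEstimate.SecondsToExhaust(96, 8, 60e9) comes to roughly 120,000 seconds, or about 33 hours, for every 8-character password over 96 symbols. If a deliberately costly scheme limited the same attacker to a hypothetical 10,000 guesses per second, the same search would take more than 20,000 years.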

You cannot possibly foresee all threats or avenues of attack, and so you must make your best effort to protect your users up front. If you do not, then you might even miss the fact that you were attacked until it's too late... and you're liable. To avoid that situation, act paranoid to begin with. Attack your own software (internally) and attempt to steal user credentials, or modify other users' accounts or access their data. If you don't test the security of your system, then you cannot blame anyone but yourself.

Lastly: I am not a cryptographer. Whatever I've said is my opinion, but I happen to think it's based on good ol' common sense ... and lots of reading. Remember, be as paranoid as possible, make things as hard to intrude as possible, and then, if you are still worried, contact a white-hat hacker or cryptographer to see what they say about your code/system.