DISCLAIMER: This answer was written in 2008.
Since then, PHP has given us password_hash
and password_verify
and, since their introduction, they are the recommended password hashing & checking method.
The theory of the answer is still a good read though.
TL;DR
Don'ts
- Don't limit what characters users can enter for passwords. Only idiots do this.
- Don't limit the length of a password. If your users want a sentence with supercalifragilisticexpialidocious in it, don't prevent them from using it.
- Don't strip or escape HTML and special characters in the password.
- Never store your user's password in plain-text.
- Never email a password to your user except when they have lost theirs, and you sent a temporary one.
- Never, ever log passwords in any manner.
- Never hash passwords with SHA1 or MD5 or even SHA256! Modern crackers can exceed 60 and 180 billion hashes/second (respectively).
- Don't mix bcrypt and with the raw output of hash(), either use hex output or base64_encode it. (This applies to any input that may have a rogue
\0
in it, which can seriously weaken security.)
Dos
- Use scrypt when you can; bcrypt if you cannot.
- Use PBKDF2 if you cannot use either bcrypt or scrypt, with SHA2 hashes.
- Reset everyone's passwords when the database is compromised.
- Implement a reasonable 8-10 character minimum length, plus require at least 1 upper case letter, 1 lower case letter, a number, and a symbol. This will improve the entropy of the password, in turn making it harder to crack. (See the "What makes a good password?" section for some debate.)
Why hash passwords anyway?
The objective behind hashing passwords is simple: preventing malicious access to user accounts by compromising the database. So the goal of password hashing is to deter a hacker or cracker by costing them too much time or money to calculate the plain-text passwords. And time/cost are the best deterrents in your arsenal.
Another reason that you want a good, robust hash on a user accounts is to give you enough time to change all the passwords in the system. If your database is compromised you will need enough time to at least lock the system down, if not change every password in the database.
Jeremiah Grossman, CTO of Whitehat Security, stated on White Hat Security blog after a recent password recovery that required brute-force breaking of his password protection:
Interestingly, in living out this nightmare, I learned A LOT I didn’t know about password cracking, storage, and complexity. I’ve come to appreciate why password storage is ever so much more important than password complexity. If you don’t know how your password is stored, then all you really can depend upon is complexity. This might be common knowledge to password and crypto pros, but for the average InfoSec or Web Security expert, I highly doubt it.
(Emphasis mine.)
What makes a good password anyway?
Entropy. (Not that I fully subscribe to Randall's viewpoint.)
In short, entropy is how much variation is within the password. When a password is only lowercase roman letters, that's only 26 characters. That isn't much variation. Alpha-numeric passwords are better, with 36 characters. But allowing upper and lower case, with symbols, is roughly 96 characters. That's a lot better than just letters. One problem is, to make our passwords memorable we insert patterns—which reduces entropy. Oops!
Password entropy is approximated easily. Using the full range of ascii characters (roughly 96 typeable characters) yields an entropy of 6.6 per character, which at 8 characters for a password is still too low (52.679 bits of entropy) for future security. But the good news is: longer passwords, and passwords with unicode characters, really increase the entropy of a password and make it harder to crack.
There's a longer discussion of password entropy on the Crypto StackExchange site. A good Google search will also turn up a lot of results.
In the comments I talked with @popnoodles, who pointed out that enforcing a password policy of X length with X many letters, numbers, symbols, etc, can actually reduce entropy by making the password scheme more predictable. I do agree. Randomess, as truly random as possible, is always the safest but least memorable solution.
So far as I've been able to tell, making the world's best password is a Catch-22. Either its not memorable, too predictable, too short, too many unicode characters (hard to type on a Windows/Mobile device), too long, etc. No password is truly good enough for our purposes, so we must protect them as though they were in Fort Knox.
Best practices
Bcrypt and scrypt are the current best practices. Scrypt will be better than bcrypt in time, but it hasn't seen adoption as a standard by Linux/Unix or by webservers, and hasn't had in-depth reviews of its algorithm posted yet. But still, the future of the algorithm does look promising. If you are working with Ruby there is an scrypt gem that will help you out, and Node.js now has its own scrypt package. You can use Scrypt in PHP either via the Scrypt extension or the Libsodium extension (both are available in PECL).
I highly suggest reading the documentation for the crypt function if you want to understand how to use bcrypt, or finding yourself a good wrapper or use something like PHPASS for a more legacy implementation. I recommend a minimum of 12 rounds of bcrypt, if not 15 to 18.
I changed my mind about using bcrypt when I learned that bcrypt only uses blowfish's key schedule, with a variable cost mechanism. The latter lets you increase the cost to brute-force a password by increasing blowfish's already expensive key schedule.
Average practices
I almost can't imagine this situation anymore. PHPASS supports PHP 3.0.18 through 5.3, so it is usable on almost every installation imaginable—and should be used if you don't know for certain that your environment supports bcrypt.
But suppose that you cannot use bcrypt or PHPASS at all. What then?
Try an implementation of PDKBF2 with the maximum number of rounds that your environment/application/user-perception can tolerate. The lowest number I'd recommend is 2500 rounds. Also, make sure to use hash_hmac() if it is available to make the operation harder to reproduce.
Future Practices
Coming in PHP 5.5 is a full password protection library that abstracts away any pains of working with bcrypt. While most of us are stuck with PHP 5.2 and 5.3 in most common environments, especially shared hosts, @ircmaxell has built a compatibility layer for the coming API that is backward compatible to PHP 5.3.7.
Cryptography Recap & Disclaimer
The computational power required to actually crack a hashed password doesn't exist. The only way for computers to "crack" a password is to recreate it and simulate the hashing algorithm used to secure it. The speed of the hash is linearly related to its ability to be brute-forced. Worse still, most hash algorithms can be easily parallelized to perform even faster. This is why costly schemes like bcrypt and scrypt are so important.
You cannot possibly foresee all threats or avenues of attack, and so you must make your best effort to protect your users up front. If you do not, then you might even miss the fact that you were attacked until it's too late... and you're liable. To avoid that situation, act paranoid to begin with. Attack your own software (internally) and attempt to steal user credentials, or modify other user's accounts or access their data. If you don't test the security of your system, then you cannot blame anyone but yourself.
Lastly: I am not a cryptographer. Whatever I've said is my opinion, but I happen to think it's based on good ol' common sense ... and lots of reading. Remember, be as paranoid as possible, make things as hard to intrude as possible, and then, if you are still worried, contact a white-hat hacker or cryptographer to see what they say about your code/system.
A public salt will not make dictionary attacks harder when cracking a single password. As you've pointed out, the attacker has access to both the hashed password and the salt, so when running the dictionary attack, she can simply use the known salt when attempting to crack the password.
A public salt does two things: makes it more time-consuming to crack a large list of passwords, and makes it infeasible to use a rainbow table.
To understand the first one, imagine a single password file that contains hundreds of usernames and passwords. Without a salt, I could compute "md5(attempt[0])", and then scan through the file to see if that hash shows up anywhere. If salts are present, then I have to compute "md5(salt[a] . attempt[0])", compare against entry A, then "md5(salt[b] . attempt[0])", compare against entry B, etc. Now I have n
times as much work to do, where n
is the number of usernames and passwords contained in the file.
To understand the second one, you have to understand what a rainbow table is. A rainbow table is a large list of pre-computed hashes for commonly-used passwords. Imagine again the password file without salts. All I have to do is go through each line of the file, pull out the hashed password, and look it up in the rainbow table. I never have to compute a single hash. If the look-up is considerably faster than the hash function (which it probably is), this will considerably speed up cracking the file.
But if the password file is salted, then the rainbow table would have to contain "salt . password" pre-hashed. If the salt is sufficiently random, this is very unlikely. I'll probably have things like "hello" and "foobar" and "qwerty" in my list of commonly-used, pre-hashed passwords (the rainbow table), but I'm not going to have things like "jX95psDZhello" or "LPgB0sdgxfoobar" or "dZVUABJtqwerty" pre-computed. That would make the rainbow table prohibitively large.
So, the salt reduces the attacker back to one-computation-per-row-per-attempt, which, when coupled with a sufficiently long, sufficiently random password, is (generally speaking) uncrackable.
Best Answer
The short answer to your question is: SHA-1 is as secure as you can get. MD5 would be fine too, even MD4; but it could make some investors nervous. For public relations, it is best to use a "better" hash function, e.g. SHA-256, even if you truncate its output to 160 or 128 bits (to save on storage cost). Some of the SHA-3 round-2 candidates appear to be faster than SHA-1 while being arguably "more secure"; yet they are still a bit new, so sticking to SHA-256 or SHA-512 would be a safer route right now. It would make you look professional and cautious, which is good.
Note that "as secure as you can get" is not the same as "perfectly safe". See below for rather lengthy explanations.
About known attacks:
The known attacks on MD4, MD5 and SHA-1 are about collisions, which do not impact preimage resistance. It has been shown that MD4 has a few weaknesses which can be (only theoretically) exploited when trying to break HMAC/MD4, but this does not apply to your problem. The 2106 second preimage attack in the paper by Kesley and Schneier is a generic trade-off which applies only to very long inputs (260 bytes; that's a million terabytes -- notice how 106+60 exceeds 160; that's where you see that the trade-off has nothing magic in it).
The rest of this message assumes that the hash function you use (e.g. SHA-1) is a "black box" with no special property that the attacker may use. That's what you have right now even with the "broken" hash functions MD5 and SHA-1.
About rainbow tables:
The "rainbow attack" is actually cost-sharing of a dictionary or brute force attack. It is a derivative from the time-memory trade-off first described by Hellman in 1980. Assuming that you have N possible passwords (that's the size of your dictionary, or 2n if you consider brute-forcing a hash function with an output of n bits), there is a time-sharing attack in which you precompute the N hashed passwords and store them in a big table. If you sort the hash outputs, you can get your password in a single lookup. A rainbow table is a smart way to store that table with a much reduced space. You store only N/t hashed passwords, and you crack passwords with O(t2) lookups. Rainbow tables allow you to virtually handle precomputed tables much larger than what you can realistically store.
However, rainbow or not, the attacker still has to run the full attack at least once. This can be viewed as several successive optimization layers:
If N is large enough that the CPU-cost of hashing N passwords is ludicrous, then such an attack is not feasible, regardless of whether rainbow tables are used or not. This means that a (preimage-resistant) hash function with an output of 80 bits or more is enough to make brute-force attack infeasible.
About salts:
Salts are a way to defeat pre-computations. In the description above, the salt brings back the attacker to step 1: salting prevents the attacker from sharing the O(N) cost between several attacked passwords. Pre-computed tables, a fortiori rainbow tables, are no longer feasible.
You want salting because when the hashed data consists in passwords, i.e. something which fits within the brain of a random human being, then N can be quite low: humans are really bad at choosing and remembering passwords. This is what "dictionary attacks" are about: that's using a reduced space of potential passwords (the "dictionary") under the assumption that many user passwords will be in that specially selected space.
Hence salting will at least prevent the attacker from using pre-computed tables, in particular pre-computed rainbow tables. This assumes that the attacker will be able to break one password or two; we do not want him to break 1000 other passwords with little extra overhead.
Also, salting is good for public relations.
About SHA-1 cost:
The elementary cost of SHA-1 is about hashing a 64-byte block. That's how SHA-1 works: data is padded, then split into 64-byte blocks. The cost of processing a single block is about 500 clock cycles on an Intel Core2 system, and that's for a single core. MD5 and MD4 are faster, count about 400 and 250 cycles, respectively. Do not forget that most modern CPU have several cores, so multiply accordingly.
Some salting schemes prescribe huge salts; e.g. what enters the hash function is actually 40000 successive copies of a single 128-bit salt, followed by the password itself. This makes password hashing more expensive (by a factor of 10000 with my example), both for the legitimate user and for the attacker. Whether this is a good idea depends on the setup. For login on a desktop system, this is good: the user will not even notice that it took 10ms to hash his password, instead of 1µs; but the cost for the attacker has risen by a very noticeable factor 10000. On shared servers with thousands of clients per second, the aggregate cost may become prohibitive. Conceptually, raising the bar by the same factor for the legitimate user and the attacker is not ultimately good security; but it can be a worthwhile idea in some specific situations.
About online attacks:
All of the above is about defeating offline attacks. An offline attack is an attack where the attacker has all the data he needs in order to "test" passwords; e.g. the attacker could get a copy of the database holding the hashed passwords. In an offline attack, the attacker is limited only by his computational resources. Conversely, an online attack is an attack where each guess by the attacker must go through an honest verifier (e.g. the attacker simply tries to log in on the attacked system). Online attacks are thwarted by enforcing limits on how many passwords can be tried per second. Extreme examples are smartcards which shut down after three wrong PINs.
Usually, for password security, it pays off much more to arrange the system for not letting an attacker build an offline attack. That's what Unix systems do: the hashed passwords, which used to be in the world-readable
/etc/password
file, are now in the/etc/shadow
file which is protected against read access, except by a few privileged applications. The assumption here is that if the attacker can read/etc/shadow
, then he probably has enough control over the system that he does not really need passwords anymore...