Password Security – Is It More Secure to Hash a Password Multiple Times?

I've read a few times that when storing passwords, it's good practice to 'double hash' the strings (e.g. with MD5 then SHA-1, both with salts, obviously).

I guess the first question is, "is this actually correct?" If not, then please, dismiss the rest of this question 🙂

The reason I ask is that on the face of it, I would say that this makes sense. However, when I think about it, every time a hash is rehashed (possibly with something added to it) all I can see is that there is a reduction in the upper bound on the final 'uniqueness'… that bound being related to the initial input.

Let me put it another way: we have x strings that, when hashed, are reduced to y possible strings. That is to say, there are collisions in the first set. Now, going from the second set to the third, is it not possible for the same thing to occur (i.e. collisions among the 'y' strings that result in the same hash in the third set)?

In my head, all I see is a 'funnel' for each hash function call, 'funneling' an infinite set of possibilities into a finite set and so on, but obviously each call is working on the finite set before it, giving us a set no larger than the input.

Maybe an example will explain my ramblings?
Take 'hash_function_a' that will give 'a' and 'b' the hash '1', and will give 'c' and 'd' the hash '2'. Using this function to store passwords, even if the password is 'a', I could use the password 'b'.

Take 'hash_function_b' that will give '1' and '2' the hash '3'. If I were to use it as a 'secondary hash' after 'hash_function_a' then even if the password is 'a' I could use 'b', 'c' or 'd'.
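Written out as a small Java sketch, the toy example looks roughly like this. The class name, method names, and lookup tables are made up purely for illustration; they just encode the two toy functions described above.

```java
import java.util.Map;

public class ToyFunnel {
    // Hypothetical hash_function_a: 'a' and 'b' map to 1, 'c' and 'd' map to 2.
    static final Map<String, Integer> HASH_A = Map.of("a", 1, "b", 1, "c", 2, "d", 2);
    // Hypothetical hash_function_b: 1 and 2 both map to 3.
    static final Map<Integer, Integer> HASH_B = Map.of(1, 3, 2, 3);

    static int hashA(String s) {
        return HASH_A.get(s);
    }

    static int hashB(int h) {
        return HASH_B.get(h);
    }

    // The 'double hash': hash_function_b applied to the output of hash_function_a.
    static int chained(String s) {
        return hashB(hashA(s));
    }

    public static void main(String[] args) {
        for (String pw : new String[] {"a", "b", "c", "d"}) {
            System.out.println(pw + " -> A: " + hashA(pw) + ", B(A): " + chained(pw));
        }
    }
}
```

With hash_function_a alone, 'a' only collides with 'b'; after the second function, all four inputs end up at the same value.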

On top of all of that, I get that salts should be used, but they don't really change the fact that each time we are mapping 'x' inputs to 'less than x' outputs. I don't think.

Can someone please explain to me what it is that I am missing here?

Thanks!

EDIT: For what it's worth, I don't do this myself; I use bcrypt. And I'm not really concerned about whether or not it's useful for 'using up cycles' for a 'hacker'. I genuinely am just wondering whether or not the process reduces 'security' from a hash collision standpoint.

Best Answer

This is better suited to security.stackexchange, but...

The problem with

hash1(hash2(hash3(...hashn(pass+salt)+salt)+salt)...)+salt)

is that this is only as strong as the weakest hash function in the chain. For example, if hashn (the innermost hash) gives a collision, the entire hash chain will give a collision (irrespective of what other hashes are in the chain).
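A minimal sketch of that weakest-link effect, assuming a two-step chain. The inner hash here is a deliberately weak, made-up function (it keeps only the input length) standing in for hashn, and SHA-256 stands in for the outer hash; the class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class WeakestLink {
    // Deliberately weak, hypothetical inner hash: keeps only the input length.
    static byte[] weakInner(byte[] in) {
        return new byte[] {(byte) in.length};
    }

    static byte[] sha256(byte[] in) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(in);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // outer(inner(pass + salt) + salt)
    static byte[] chain(String pass, String salt) {
        byte[] s = salt.getBytes(StandardCharsets.UTF_8);
        byte[] inner = weakInner(concat(pass.getBytes(StandardCharsets.UTF_8), s));
        return sha256(concat(inner, s));
    }

    public static void main(String[] args) {
        // Both passwords have the same length, so the inner hash collides
        // and the strong outer hash cannot tell them apart any more.
        boolean same = Arrays.equals(chain("hunter2", "xyz"), chain("letmein", "xyz"));
        System.out.println("chain collides: " + same);
    }
}
```

Once the inner step has thrown the difference away, no amount of strength in the outer step can bring it back.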

A stronger chain would be

hash1(hash2(hash3(...hashn(pass + salt) + pass + salt) + pass + salt)...) + pass + salt)

Here we avoid the early collision problem and we essentially generate a salt that depends on the password for the final hash.

If one step in the chain collides, it doesn't matter, because the password is mixed in again at the next step and should give a different result for different passwords.
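To illustrate, here is a rough two-step sketch of the second construction. As an assumption for the demo, the inner hash is a deliberately weak, made-up function that keeps only the input length, and SHA-256 plays the outer hash; the class and helper names are hypothetical. It only demonstrates the collision argument, not how a real password store should be built.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class RemixedChain {
    // Deliberately weak, hypothetical inner hash: keeps only the input length.
    static byte[] weakInner(byte[] in) {
        return new byte[] {(byte) in.length};
    }

    static byte[] sha256(byte[] in) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(in);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    // outer(inner(pass + salt) + pass + salt)
    static byte[] chain(String pass, String salt) {
        byte[] p = pass.getBytes(StandardCharsets.UTF_8);
        byte[] s = salt.getBytes(StandardCharsets.UTF_8);
        byte[] inner = weakInner(concat(p, s));
        return sha256(concat(inner, p, s));
    }

    public static void main(String[] args) {
        // The inner hash collides for these two (same length), but the
        // password is fed in again before the outer hash is applied.
        boolean same = Arrays.equals(chain("hunter2", "xyz"), chain("letmein", "xyz"));
        System.out.println("chain collides: " + same);
    }
}
```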

Related Solutions

Algorithms – Randomized Hash Function with No Collisions

This is not a hash function. It's either an encoding function or an encryption function.

Usually when I've encountered a problem of this form, it is because I have some sort of IDs that I want to "look" random. Assuming I don't care about security, I'll normally create an algorithm which works as follows:

1. Map my data to numbers (usually, by treating a string in a character set containing X characters as a baseX number).
2. Map my numbers to new numbers. Eric Lippert suggests using Multiplicative Inverses for this purpose. Skip32 encoding (AKA Skip32 encryption, but it's not secure) also works for this purpose.
3. Reverse the mapping in step 1.

To decode, perform the same steps in reverse.
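As a rough sketch of the number-to-number step using a multiplicative inverse, assuming the IDs are already 32-bit ints: multiplying by an odd constant modulo 2^32 is a one-to-one mapping, and multiplying by its inverse undoes it. The class name and the multiplier are arbitrary choices for illustration, and like Skip32 this is not secure.

```java
public class IdScrambler {
    // Hypothetical multiplier; any odd 32-bit value is invertible modulo 2^32.
    static final int MULTIPLIER = 0x9E3779B1;
    static final int INVERSE = inverse(MULTIPLIER);

    // Computes the inverse of an odd number modulo 2^32 by Newton iteration.
    static int inverse(int a) {
        int x = a; // a * a == 1 (mod 8) for odd a, so the low 3 bits are already right
        for (int i = 0; i < 5; i++) {
            x *= 2 - a * x; // each step doubles the number of correct low bits
        }
        return x;
    }

    // Java int multiplication wraps around, which is arithmetic modulo 2^32.
    static int encode(int id) {
        return id * MULTIPLIER;
    }

    static int decode(int code) {
        return code * INVERSE;
    }

    public static void main(String[] args) {
        for (int id = 1; id <= 5; id++) {
            int code = encode(id);
            System.out.println(id + " -> " + code + " -> " + decode(code));
        }
    }
}
```

Because the mapping is a bijection on the 32-bit range, two different IDs can never produce the same code.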

Java – Complexity of ArrayList of LinkedHashSet

Your main question is whether you could have a collision, but it's first important to determine where you might have a collision. An ArrayList does not have collisions because it is an ordered list on top of an array. Your concern there would be more about how often it has to extend the capacity of the underlying array. You would only feel this while reading the input, though.

Where you might have collisions is in the LinkedHashSets. Since hash collisions are a function of the capacity of the underlying hash store, the hashing algorithm, and the data itself, there is always a possibility of collisions, and there is no way to know if you'll have one except to run all of your data through. In the end, since these are all different strings, and the hashing algorithm for strings built into Java is pretty effective, the remaining factor to determine how many collisions you could potentially have is the capacity of the underlying hash store.

A default LinkedHashSet has an initial capacity of 16 and a load factor of 0.75. This means that once it holds more than 12 elements, it will double its capacity to accommodate more. If we can assume that hashes are evenly distributed, a larger capacity means a lower probability of a collision. Resizing a LinkedHashSet is actually the most expensive activity: to resize the underlying hash store, it essentially must iterate through the current elements, rehash them, and re-insert them into the larger hash store. Your best strategy to avoid collisions and rehashing is to allocate large LinkedHashSets when you instantiate them. What "large" is depends on how much data you are inputting and how much of it will be combined. If you know the size and general shape of your data beforehand, you could try to estimate an initial size large enough to accommodate the largest set (needed size / 0.75 to account for the load factor). This essentially becomes the old time vs. space trade-off, as larger initial sizes mean more memory consumed.
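The sizing rule (needed size / 0.75) could be sketched like this. The class and helper names are hypothetical, and the expected element count is something you would have to estimate from your own data.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class PresizedSets {
    static final float LOAD_FACTOR = 0.75f;

    // Smallest initial capacity that should hold expectedElements without a resize.
    static int initialCapacityFor(int expectedElements) {
        return (int) Math.ceil(expectedElements / (double) LOAD_FACTOR);
    }

    // Creates a LinkedHashSet sized up front for the expected number of elements.
    static <T> Set<T> newPresizedSet(int expectedElements) {
        return new LinkedHashSet<>(initialCapacityFor(expectedElements), LOAD_FACTOR);
    }

    public static void main(String[] args) {
        Set<String> tokens = newPresizedSet(1000);
        tokens.add("foo");
        System.out.println("capacity requested for 1000 elements: " + initialCapacityFor(1000));
    }
}
```

The trade-off is the one described above: a generous estimate avoids rehashing but reserves memory that may never be used.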

Other general suggestions

StringTokenizer is a legacy class whose use is discouraged in new code, and string.split(" ") is generally recommended in its place. However, since you are getting your input from the console, I would suggest disposing of the buffered reader completely and wrapping System.in in a Scanner. Scanner has a next() method that fetches the next token, implicitly tokenizing the input for you with less effort.
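A small sketch of that Scanner-based tokenizing. The class and method names are made up, and the Scanner here wraps a String instead of System.in only so the example is easy to run on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TokenReader {
    // Collects every whitespace-separated token the Scanner can supply.
    static List<String> readTokens(Scanner input) {
        List<String> tokens = new ArrayList<>();
        while (input.hasNext()) {
            tokens.add(input.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // For console input, pass new Scanner(System.in) instead.
        Scanner input = new Scanner("foo bar\nbaz");
        System.out.println(readTokens(input));
    }
}
```

By default Scanner splits on any whitespace, so line breaks and repeated spaces need no special handling.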

Also, while you are using ArrayList and LinkedHashSet for their performance characteristics, it is probably still better to rely on their interfaces (List and Set respectively) than those specific implementations.

Third, instead of writing out all of the code to get an iterator and then looping over it, you can use the for-each construct to loop over any Iterable.

Fourth, you don't need boolean flags when you can just as easily get the same information from a null check. We can eliminate two variables right off the bat.

Here's a slightly rewritten version incorporating those suggestions:

Main.java

import java.util.*;

public class Main
{
    public static void main(String[] args) {
        // Scanner tokenizes System.in on whitespace, so no manual splitting is needed.
        Scanner input = new Scanner(System.in);
        List<Set<String>> setOfStrings = new ArrayList<>();

        // Start with one single-element set per input token.
        while (input.hasNext()) {
            Set<String> tokenSet = new LinkedHashSet<>();
            tokenSet.add(input.next());
            setOfStrings.add(tokenSet);
        }

        SetMerger sm = new SetMerger();
        sm.mergeSets("foo", "bar", setOfStrings);
        sm.mergeSets("bar", "baz", setOfStrings);
        System.out.println(setOfStrings);
    }
}

SetMerger.java

import java.util.*;

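public class SetMerger
{
    // Merges the set containing toMerge into the set containing fromMerge.
    public void mergeSets(String toMerge, String fromMerge, List<Set<String>> stringSets) {
        Set<String> setToMerge = null, targetSet = null;

        // Find the set holding toMerge; nothing to do if it already holds fromMerge too.
        for (Set<String> subset : stringSets) {
            if (subset.contains(toMerge)) {
                if (subset.contains(fromMerge)) {
                    return;
                }

                setToMerge = subset;
                break;
            }
        }

        // Find the set holding fromMerge; it receives the merged elements.
        for (Set<String> subset : stringSets) {
            if (subset.contains(fromMerge)) {
                targetSet = subset;
                break;
            }
        }

        // Null checks replace the boolean "found" flags.
        if (setToMerge != null && targetSet != null) {
            targetSet.addAll(setToMerge);
            stringSets.remove(setToMerge);
        }
    }
}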
