Create your own MD5 collisions


I'm doing a presentation on MD5 collisions and I'd like to give people an idea of how likely a collision is.

It would be good to have two blocks of text which hash to the same value, and to explain how many combinations of [a-zA-Z ] were needed before I hit a collision.

The obvious answer is to hash every possible combination until two of the hashes are the same. So how would you go about coding this? As a quick experiment I tried hashing every combination of 5 columns of [A-Z], storing each hash in a .NET Hashtable and catching the exception thrown on a duplicate key. There are two problems with this: the Hashtable eventually times out, and I'm pretty sure I'm going to need A LOT more characters.
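To put rough numbers on "a lot more", here is a minimal Python sketch of the standard birthday-bound approximation. It assumes MD5 output behaves like a uniformly random 128-bit value, which only models the brute-force approach described above, not cryptanalytic attacks. The function names are illustrative.

```python
import math

HASH_BITS = 128  # MD5 produces a 128-bit digest


def collision_probability(num_inputs: int, bits: int = HASH_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).

    Assumes hash outputs are uniformly random (an idealisation).
    """
    exponent = -num_inputs * (num_inputs - 1) / (2 * 2**bits)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


def inputs_for_even_odds(bits: int = HASH_BITS) -> float:
    """Approximate number of random inputs needed for a 50% collision chance."""
    return math.sqrt(2 * math.log(2) * 2**bits)


if __name__ == "__main__":
    five_letters = 26 ** 5  # every 5-character string over [A-Z]
    print(f"inputs tried:          {five_letters}")
    print(f"collision probability: {collision_probability(five_letters):.3e}")
    print(f"inputs for 50% odds:   {inputs_for_even_odds():.3e}")
```

Under that assumption, exhausting every 5-letter string leaves the chance of a collision vanishingly small, and even odds need on the order of 2^64 hashes, which is why storing and comparing every hash is not practical.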

Obviously this data structure is too big to handle in memory, so now I'll have to get a database involved. It also sounds like a good project to test out Azure, a bit like these guys.

Can anyone point me in the direction of an efficient way of doing this?

Best Answer

The following two different 128-byte sequences hash to the same value:

MD5 Hash: 79054025255fb1a26e4bc422aef54eb4

The two sequences differ in only six bytes (highlighted in bold in the original post). Sorry, the differences are kind of hard to see.

d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89 
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b 
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0 
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70


d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89 
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b 
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0 
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
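The claim above can be checked directly. The following is a small Python sketch using the standard `hashlib` module; the hex strings are copied from the two blocks above, and the variable names are illustrative.

```python
import hashlib

# First 128-byte sequence, copied from the post (hex encoded)
BLOCK_1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c" "2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a" "085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6" "dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1e" "c69821bcb6a8839396f9652b6ff72a70"
)

# Second 128-byte sequence, copied from the post (hex encoded)
BLOCK_2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c" "2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a" "085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6" "dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1e" "c69821bcb6a8839396f965ab6ff72a70"
)

digest_1 = hashlib.md5(BLOCK_1).hexdigest()
digest_2 = hashlib.md5(BLOCK_2).hexdigest()

# Byte offsets at which the two sequences differ
differing_offsets = [i for i, (a, b) in enumerate(zip(BLOCK_1, BLOCK_2)) if a != b]

if __name__ == "__main__":
    print("inputs identical: ", BLOCK_1 == BLOCK_2)
    print("differing offsets:", differing_offsets)
    print("MD5 of block 1:   ", digest_1)
    print("MD5 of block 2:   ", digest_2)
    print("digests match:    ", digest_1 == digest_2)
```

Printing the differing offsets alongside the digests makes a handy slide: the inputs are visibly different, yet the two MD5 values are the same.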

The visualization of the collision/block1 (Source: Links.Org)


The visualization of the collision/block2 (Source: Links.Org)
