The ability to guess the next value from rand
is tied to being able to determine what srand
was called with. In particular, seeding srand
with a predetermined number results in predictable output! From the PHP interactive prompt:
[charles@charles-workstation ~]$ php -a
Interactive shell
php > srand(1024);
php > echo rand(1, 100);
97
php > echo rand(1, 100);
97
php > echo rand(1, 100);
39
php > echo rand(1, 100);
77
php > echo rand(1, 100);
93
php > srand(1024);
php > echo rand(1, 100);
97
php > echo rand(1, 100);
97
php > echo rand(1, 100);
39
php > echo rand(1, 100);
77
php > echo rand(1, 100);
93
php >
This isn't just some fluke. Most PHP versions* on most platforms** will generate the sequence 97, 97, 39, 77, 93 when srand
'd with 1024.
To be clear, this isn't a problem with PHP, this is a problem with the implementation of rand
itself. The same problem appears in other languages that use the same (or a similar) implementation, including Perl.
The trick is that any sane version of PHP will have pre-seeded srand
with an "unknown" value. Oh, but it isn't really unknown. From ext/standard/php_rand.h
:
#define GENERATE_SEED() (((long) (time(0) * getpid())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))
So, it's some math with time()
, the PID, and the result of php_combined_lcg
, which is defined in ext/standard/lcg.c
. I'm not going to c&p here, as, well, my eyes glazed over and I decided to stop hunting.
A bit of Googling shows that other areas of PHP don't have the best randomness generation properties, and calls to php_combined_lcg
stand out here, especially this bit of analysis:
Not only does this function (gettimeofday
) hand us back a precise server timestamp on a silver platter, it also adds in LCG output if we request "more entropy" (from PHP's uniqid
).
Yeah that uniqid
. It seems that the value of php_combined_lcg
is what we see when we look at the resulting hex digits after calling uniqid
with the second argument set to a true value.
Now, where were we?
Oh yes. srand
.
So, if the code you're trying to predict random values from doesn't call srand
, you're going to need to determine the value provided by php_combined_lcg
, which you can get (indirectly?) through a call to uniqid
. With that value in hand, it's feasible to brute-force the rest of the value -- time()
, the PID and some math. The linked security issue is about breaking sessions, but the same technique would work here. Again, from the article:
Here's a summary of the attack steps outlined above:
- wait for the server to reboot
- fetch a uniqid value
- brute force the RNG seed from this
- poll the online status to wait for target to appear
- interleave status polls with uniqid polls to keep track of current server time and RNG value
- brute force session ID against server using the time and RNG value interval established in polling
Just replace that last step as required.
(This security issue was reported in an earlier PHP version (5.3.2) than we have currently (5.3.6), so it's possible that the behavior of uniqid
and/or php_combined_lcg
has changed, so this specific technique might not be workable any longer. YMMV.)
On the other hand, if the code you're trying to product calls srand
manually, then unless they're using something many times better than the result of php_combined_lcg
, you're probably going to have a much easier time guessing the value and seeding your local generator with the right number. Most people that would manually call srand
also wouldn't realize how horrible of an idea this is, and thus aren't likely to use better values.
It's worth noting that mt_rand
is also afflicted by the same problem. Seeding mt_srand
with a known value will also produce predictable results. Basing your entropy off of openssl_random_pseudo_bytes
is probably a safer bet.
tl;dr: For best results, don't seed the PHP random number generator, and for goodness' sake, don't expose uniqid
to users. Doing either or both of these may cause your random numbers to be more guessable.
Update for PHP 7:
PHP 7.0 introduces random_bytes
and random_int
as core functions. They use the underlying system's CSPRNG implementation, making them free from the problems that a seeded random number generator has. They're effectively similar to openssl_random_pseudo_bytes
, only without needing an extension to be installed. A polyfill is available for PHP5.
*: The Suhosin security patch changes the behavior of rand
and mt_rand
such that they always re-seed with every call. Suhosin is provided by a third party. Some Linux distributions include it in their official PHP packages by default, while others make it an option, and others ignore it entirely.
**: Depending on the platform and the underlying library calls being used, different sequences will be generated than documented here, but the results should still be repeatable unless the Suhosin patch is used.
This is more suited on security.stackexchange but...
The problem with
hash1(hash2(hash3(...hashn(pass+salt)+salt)+salt)...)+salt)
is that this is only as strong as the weakest hash function in the chain. For example if hashn (the innermost hash) gives a collision, the entire hash chain will give a collision (irrespective of what other hashes are in the chain).
A stronger chain would be
hash1(hash2(hash3(...hashn(pass + salt) + pass + salt) + pass + salt)...) + pass + salt)
Here we avoid the early collision problem and we essentially generate a salt that depends on the password for the final hash.
And if one step in the chain collides it doesn't matter because in the next step the password is used again and should give a different result for different passwords.
Best Answer
There are three properties one wants from every cryptographic hash function
H
:preimage resistance: Given
h
, it should be hard to find any valuex
withh = H(x)
.second preimage resistance: Given
x1
, it should be hard to findx2 != x1
withH(x1) = H(x2)
.collision resistance: It should be hard to find two values
x1 != x2
withH(x1) = H(x2)
.With hash functions as used in common programming languages for hash tables (of strings), usually none of these is given, they only provide for:
The three properties above are (among) the design goals for every cryptographic hash function. For some functions (like MD4, SHA-0, MD5) it is known that this failed (at least partially). The current generation (SHA-2) is assumed to be secure, and the next one ("Secure Hash Algorithm 3") is currently in the process of being standardized, after a competition.
For some uses (like password hashing and key derivation from passwords), the domain of actually used values
x
is so small that brute-forcing this space becomes feasible with normal (fast) secure hash functions, and this is when we also want:x
, it takes some minimum (preferably configurable) amount of resources to calculate the valueH(x)
.But for most other uses, this is not wanted, one wants instead:
x
, calculating the value ofH(x)
is as fast as possible (while still secure).There are some constructions (like PBKDF2 and scrypt) to create a slow hash function from a fast one by iterating it often.
For some more details, have a look at the hash tag on our sister site Cryptography Stack Exchange.