Security – Predicting the Output of PHP’s rand() Function

I've read in numerous sources that the output of PHP's rand() is predictable because it's a PRNG, and I mostly accept that as fact simply because I've seen it in so many places.

I'm interested in a proof-of-concept: how would I go about predicting the output of rand()? From reading this article I understand that the random number is a number returned from a list starting at a pointer (the seed) — but I can't imagine how this is predictable.

Could someone reasonably figure out what random number rand() generated at a given moment in time within a few thousand guesses, or even 10,000 guesses? How?

This is coming up because I saw an auth library which uses rand() to produce a token for users who have lost passwords, and I assumed this was a potential security hole. I've since replaced the method with hashing a mixture of openssl_random_pseudo_bytes(), the original hashed password, and microtime. After doing this I realized that if I were on the outside looking in, I'd have no idea how to guess the token even knowing it was an MD5 of rand().

Best Answer

The ability to guess the next value from rand is tied to being able to determine what srand was called with. In particular, seeding srand with a predetermined number results in predictable output! From the PHP interactive prompt:

[charles@charles-workstation ~]$ php -a
Interactive shell

php > srand(1024);
php > echo rand(1, 100);
97
php > echo rand(1, 100);
97
php > echo rand(1, 100);
39
php > echo rand(1, 100);
77
php > echo rand(1, 100);
93
php > srand(1024);
php > echo rand(1, 100);
97
php > echo rand(1, 100);
97
php > echo rand(1, 100);
39
php > echo rand(1, 100);
77
php > echo rand(1, 100);
93
php > 

This isn't just some fluke. Most PHP versions* on most platforms** will generate the sequence 97, 97, 39, 77, 93 when srand'd with 1024.

To be clear, this isn't a problem with PHP; it's a problem with the implementation of rand itself. The same problem appears in other languages that use the same (or a similar) implementation, including Perl.

The trick is that any sane version of PHP will have pre-seeded the generator with an "unknown" value. Oh, but it isn't really unknown. From ext/standard/php_rand.h:

#define GENERATE_SEED() (((long) (time(0) * getpid())) ^ ((long) (1000000.0 * php_combined_lcg(TSRMLS_C))))

So, it's some math with time(), the PID, and the result of php_combined_lcg, which is defined in ext/standard/lcg.c. I'm not going to copy and paste it here, as, well, my eyes glazed over and I decided to stop hunting.
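To make the arithmetic in that macro concrete, here is a minimal sketch in PHP. The function name sketch_generate_seed and the sample inputs are made up for illustration, and $lcg simply stands in for whatever php_combined_lcg would return (assumed to be a float between 0 and 1). It also assumes a 64-bit PHP build so the multiplication stays an integer.

```php
<?php
// Hypothetical helper mirroring the GENERATE_SEED() macro quoted above.
// $lcg is a stand-in for php_combined_lcg(), assumed to lie between 0 and 1.
function sketch_generate_seed(int $time, int $pid, float $lcg): int
{
    // (time * pid) XOR (1000000 * lcg), truncated to an integer like the C casts.
    return ($time * $pid) ^ (int) (1000000.0 * $lcg);
}

// Made-up inputs: a Unix timestamp, a process ID and an LCG value.
echo sketch_generate_seed(1300000000, 4242, 0.25), PHP_EOL;
```

Under those assumptions the LCG term stays below 1,000,000, so it can only change the low 20 bits of time * PID. That is why knowing (or narrowing down) each ingredient shrinks the search space so much.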

A bit of Googling shows that other areas of PHP don't have the best randomness generation properties, and calls to php_combined_lcg stand out here, especially this bit of analysis:

Not only does this function (gettimeofday) hand us back a precise server timestamp on a silver platter, it also adds in LCG output if we request "more entropy" (from PHP's uniqid).

Yeah, that uniqid. It seems that the value of php_combined_lcg is what we see in the extra digits appended to the result when uniqid is called with the second argument set to a true value.
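As a rough illustration of how much a single uniqid value gives away, the sketch below splits one into its parts. The layout is an assumption on my part (8 hex digits of seconds, 5 hex digits of microseconds, then the "more entropy" suffix), and sketch_parse_uniqid is a hypothetical helper name, not anything PHP provides.

```php
<?php
// Split a uniqid('', true) value into its parts.
// Assumed layout: 8 hex digits of seconds, 5 hex digits of microseconds,
// then the "more entropy" suffix appended by the second argument.
function sketch_parse_uniqid(string $id): array
{
    return [
        'seconds'      => (int) hexdec(substr($id, 0, 8)),
        'microseconds' => (int) hexdec(substr($id, 8, 5)),
        'extra'        => substr($id, 13),
    ];
}

$parts = sketch_parse_uniqid(uniqid('', true));
echo gmdate('c', $parts['seconds']), ' +', $parts['microseconds'],
    'us, extra=', $parts['extra'], PHP_EOL;
```

If that layout holds, anyone who can see a uniqid value gets the server clock to the microsecond plus the extra suffix, which is exactly the material the quoted analysis builds on.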

Now, where were we?

Oh yes. srand.

So, if the code you're trying to predict random values from doesn't call srand, you're going to need to determine the value provided by php_combined_lcg, which you can get (indirectly?) through a call to uniqid. With that value in hand, it's feasible to brute-force the rest of the value -- time(), the PID and some math. The linked security issue is about breaking sessions, but the same technique would work here. Again, from the article:

Here's a summary of the attack steps outlined above:
  • wait for the server to reboot
  • fetch a uniqid value
  • brute force the RNG seed from this
  • poll the online status to wait for target to appear
  • interleave status polls with uniqid polls to keep track of current server time and RNG value
  • brute force session ID against server using the time and RNG value interval established in polling

Just replace that last step as required.

(This security issue was reported against an earlier PHP version (5.3.2) than the current one (5.3.6), so the behavior of uniqid and/or php_combined_lcg may have changed, and this specific technique might no longer be workable. YMMV.)

On the other hand, if the code you're trying to predict calls srand manually, then unless they're using something many times better than the result of php_combined_lcg, you're probably going to have a much easier time guessing the value and seeding your local generator with the right number. Most people that would manually call srand also wouldn't realize how horrible an idea this is, and thus aren't likely to use better values.
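Here is a minimal proof-of-concept sketch of that guessing game. It assumes the target seeds with the current Unix time (a hypothetical but common choice), that the attacker sees one rand() output, and that nothing like Suhosin re-seeds between calls. The function names are invented for the example.

```php
<?php
// Hypothetical victim: seeds the generator manually, then produces a token.
function sketch_make_token(int $seed): int
{
    srand($seed);
    return rand();
}

// Try every seed in [$from, $to] and return those that reproduce $observed.
function sketch_recover_seeds(int $observed, int $from, int $to): array
{
    $matches = [];
    for ($candidate = $from; $candidate <= $to; $candidate++) {
        srand($candidate);
        if (rand() === $observed) {
            $matches[] = $candidate;
        }
    }
    return $matches;
}

$now    = time();
$secret = $now - 1234;                      // seed the attacker does not know
$token  = sketch_make_token($secret);       // value the attacker observes
$found  = sketch_recover_seeds($token, $now - 3600, $now);
echo in_array($secret, $found, true) ? "seed recovered\n" : "seed not in window\n";
```

An hour-wide window is only 3,601 candidates, which is well inside the "few thousand guesses" asked about. Once the seed is known, every later rand() value can be replayed locally.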

It's worth noting that mt_rand is also afflicted by the same problem. Seeding mt_srand with a known value will also produce predictable results. Basing your entropy off of openssl_random_pseudo_bytes is probably a safer bet.
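A quick way to convince yourself of the mt_rand point is to seed twice with the same value and compare the results. This is just a sketch; sketch_mt_sequence is a made-up helper, and 1024 is reused from the earlier example purely for familiarity.

```php
<?php
// Return the first $count values of mt_rand(1, 100) after seeding with $seed.
function sketch_mt_sequence(int $seed, int $count): array
{
    mt_srand($seed);
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand(1, 100);
    }
    return $values;
}

$first  = sketch_mt_sequence(1024, 5);
$second = sketch_mt_sequence(1024, 5);
var_dump($first === $second); // same seed, same sequence
```

The actual numbers may differ between PHP versions, but within one installation the two arrays should match unless something re-seeds the generator between calls.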

tl;dr: For best results, don't seed the PHP random number generator, and for goodness' sake, don't expose uniqid to users. Doing either or both of these may cause your random numbers to be more guessable.


Update for PHP 7:

PHP 7.0 introduces random_bytes and random_int as core functions. They use the underlying system's CSPRNG implementation, making them free from the problems that a seeded random number generator has. They're effectively similar to openssl_random_pseudo_bytes, only without needing an extension to be installed. A polyfill is available for PHP 5.
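For the lost-password token case from the question, a sketch using these functions might look like the following. The helper name sketch_reset_token and the 16-byte length are my own choices for illustration, not a recommendation from the PHP manual.

```php
<?php
// Hypothetical helper: build a password-reset token from the system CSPRNG.
function sketch_reset_token(int $bytes = 16): string
{
    // random_bytes() throws if no secure randomness source is available.
    return bin2hex(random_bytes($bytes));
}

echo sketch_reset_token(), PHP_EOL;         // hex string, two characters per byte
echo random_int(100000, 999999), PHP_EOL;   // e.g. a six-digit confirmation code
```

There is no seed to recover here, so the brute-force approach described above has nothing to latch onto.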


*: The Suhosin security patch changes the behavior of rand and mt_rand such that they always re-seed with every call. Suhosin is provided by a third party. Some Linux distributions include it in their official PHP packages by default, while others make it an option, and others ignore it entirely.

**: Depending on the platform and the underlying library calls being used, different sequences will be generated than documented here, but the results should still be repeatable unless the Suhosin patch is used.