Testing – How to Test Randomness

randomtestingunit testing

Consider a method to randomly shuffle elements in an array. How would you write a simple yet robust unit test to make sure that this is working?

I've come up with two ideas, both of which have noticeable flaws:

Shuffle the array, then make sure its order differs from before. This sounds good, but fails if the shuffle happens to shuffle in the same order. (Improbable, but possible.)
Shuffle the array with a constant seed, and check it against the predetermined output. This relies on the random function always returning the same values given the same seed. However, this is sometimes an invalid assumption.

Consider a second function which simulates dice rolls and returns a random number. How would you test this function? How would you test that the function…

never returns a number outside the given bounds?
returns numbers in a valid distribution? (Uniform for one die, normal for large numbers of dice.)

I'm looking for answers offering insight into testing not only these examples but random elements of code in general. Are unit tests even the right solution here? If not, what sort of tests are?

Just to ease everyone's mind I'm not writing my own random number generator.

Best Answer

I don't think unit tests are the right tool for testing randomness. A unit test should call a method and test the returned value (or object state) against an expected value. The problem with testing randomness is that there isn't an expected value for most of the things you'd like to test. You can test with a given seed, but that only tests repeatability. It doesn't give you any way to measure how random the distribution is, or if it's even random at all.

Fortunately, there are a lot of statistical tests you can run, such as the Diehard Battery of Tests of Randomness. See also:

How to unit test a pseudo random number generator?
- Steve Jessop recommends that you find a tested implementation of the same RNG algorithm that you're using and compare its output with selected seeds against your own implementation.
- Greg Hewgill recommends the ENT suite of statistical tests.
- John D. Cook refers readers to his CodeProject article Simple Random Number Generation, which includes an implementation of the Kolmogorov-Smirnov test mentioned in Donald Knuth's volume 2, Seminumerical Algorithms.
- Several people recommend testing that the distribution of the numbers generated is uniform, the Chi-squared test, and testing that the mean and standard deviation are within the expected range. (Note that testing the distribution alone is not enough. [1,2,3,4,5,6,7,8] is a uniform distribution, but it's certainly not random.)
Unit Testing with functions that return random results
- Brian Genisio points out that mocking your RNG is one option for making your tests repeatable, and provides C# sample code.
- Again, several more people point to using fixed seed values for repeatability and simple tests for uniform distribution, Chi-squared, etc.
Unit Testing Randomness is a wiki article that talks about many of the challenges already touched on when trying to test that which is, by its nature, not repeatable. One interesting bit that I gleaned from it was the following:

I've seen winzip used as a tool to measure the randomness of a file of values before (obviously, the smaller it can compress the file the less random it is).

Related Solutions

Unit Testing – Testing Random/Non-Deterministic Algorithms

What you actually want to test here, I assume, is that given a specific set of results from the randomiser, the rest of your method performs correctly.

If that's what you're looking for then mock out the randomiser, to make it deterministic within the realms of the test.

I generally have mock objects for all kinds of non-deterministic or unpredictable (at the time of writing the test) data, including GUID generators and DateTime.Now.

Edit, from comments: You have to mock the PRNG (that term escaped me last night) at the lowest level possible - ie. when it generates the array of bytes, not after you turn those into Int64s. Or even at both levels, so you can test your conversion to an array of Int64 works as intended and then test separately that your conversion to an array of DateTimes works as intended. As Jonathon said, you could just do that by giving it a set seed, or you can give it the array of bytes to return.

I prefer the latter because it won't break if the framework implementation of a PRNG changes. However, one advantage to giving it the seed is that if you find a case in production that didn't work as intended, you only need to have logged one number to be able to replicate it, as opposed to the whole array.

All this said, you must remember that it's called a Pseudo Random Number Generator for a reason. There may be some bias even at that level.

Functional Programming – How Functional Languages Handle Random Numbers

You can't create a pure function called random that will give a different result every time it is called. In fact, you can't even "call" pure functions. You apply them. So you aren't missing anything, but this doesn't mean that random numbers are off-limits in functional programming. Allow me to demonstrate, I'll use Haskell syntax throughout.

Coming from an imperative background, you may initially expect random to have a type like this:

random :: () -> Integer

But this has already been ruled out because random cannot be a pure function.

Consider the idea of a value. A value is an immutable thing. It never changes and every observation that you can make about it is consistent for all time.

Clearly, random can't produce a Integer value. Instead, it produces a Integer random variable. It's type might look like this:

random :: () -> Random Integer

Except that passing an argument is completely unnecessary, functions are pure, so one random () is as good as another random (). I'll give random, from here on, this type:

random :: Random Integer

Which is all well and fine, but not very useful. You may expect to be able to write expressions like random + 42, but you can't, because it won't typecheck. You can't do anything with random variables, yet.

This raises an interesting question. What functions should exist to manipulate random variables?

This function can't exist:

bad :: Random a -> a

in any useful way, because then you could write:

badRandom :: Integer
badRandom = bad random

Which introduces an inconsistency. badRandom is supposedly a value, but it is also a random number; a contradiction.

Maybe we should add this function:

randomAdd :: Integer -> Random Integer -> Random Integer

But this just a special case of a more general pattern. You should be able to apply any function to random thing in order to get other random things like so:

randomMap :: (a -> b) -> Random a -> Random b

Instead of writing random + 42, we can now write randomMap (+42) random.

If all you had was randomMap, you wouldn't be able to combine random variables together. You couldn't write this function for instance:

randomCombine :: Random a -> Random b -> Random (a, b)

You might try to write it like this:

randomCombine a b = randomMap (\a' -> randomMap (\b' -> (a', b')) b) a

But it has the wrong type. Instead of ending up with a Random (a, b), we end up with a Random (Random (a, b))

This can be fixed by adding another function:

randomJoin :: Random (Random a) -> Random a

But, for reasons that may eventually become clear, I'm not going to do that. Instead I'm going to add this:

randomBind :: Random a -> (a -> Random b) -> Random b

It's not immediately obvious that this actually solves the problem, but it does:

randomCombine a b = randomBind a (\a' -> randomMap (\b' -> (a', b')) b)

In fact, it's possible to write randomBind in terms of randomJoin and randomMap. It's also possible to write randomJoin in terms of randomBind. But, I'll leave doing this as an exercise.

We could simplify this a little. Allow me to define this function:

randomUnit :: a -> Random a

randomUnit turns a value into a random variable. This means that we can have random variables that aren't actually random. This was always the case though; we could have done randomMap (const 4) random before. The reason defining randomUnit is a good idea is that now we can define randomMap in terms of randomUnit and randomBind:

randomMap :: (a -> b) -> Random a -> Random b
randomMap f x = randomBind x (randomUnit . f)

Ok, now we are getting somewhere. We have random variables that we can manipulate. However:

It's not obvious how we might actually implement these functions,
It's quite cumbersome.

Implementation

I'll tackle pseudo random numbers. It is possible implement these functions for real random numbers, but this answer is already getting quite long.

Essentially, the way this is going to work is that we are going to pass a seed value around everywhere. Whenever we generate a new random value, we will produce a new seed. At the end, when we're done constructing a random variable, we will want to sample from it using this function:

runRandom :: Seed -> Random a -> a

I'm going to define the Random type like this:

data Random a = Random (Seed -> (Seed, a))

Then, we just need to provide implementations of randomUnit, randomBind, runRandom and random which is quite straight-forward:

randomUnit :: a -> Random a
randomUnit x = Random (\seed -> (seed, x))

randomBind :: Random a -> (a -> Random b) -> Random b
randomBind (Random f) g =
  Random (\seed ->
    let (seed', x) = f seed
        Random g' = g x in
          g' seed')

runRandom :: Seed -> Random a -> a
runRandom seed (Random f) = (snd . f) seed

For random, I'm going to assume there's already a function of the type:

psuedoRandom :: Seed -> (Seed, Integer)

In which case random is just Random psuedoRandom.

Making things less cumbersome

Haskell has syntactic sugar to make things like this nicer on the eyes. It's called do-notation and to use it all we have to do it create an instance of Monad for Random.

instance Monad Random where
  return = randomUnit
  (>>=) = randomBind

Done. randomCombine from before could now be written:

randomCombine :: Random a -> Random b -> Random (a, b)
randomCombine a b = do
  a' <- a
  b' <- b
  return (a', b')

If I was doing this for myself, I would even go one step further than this and create an instance of Applicative. (Don't worry if this makes no sense).

instance Functor Random where
  fmap = liftM

instance Applicative Random where
  pure = return
  (<*>) = ap

Then randomCombine could be written:

randomCombine :: Random a -> Random b -> Random (a, b)
randomCombine a b = (,) <$> a <*> b

Now that we have these instances, we can use >>= instead of randomBind, join instead of randomJoin, fmap instead of randomMap, return instead of randomUnit. We also get a whole load of functions for free.

Is it worth it? You could argue, that getting to this stage, where working with random numbers isn't completely horrendous was quite difficult and long-winded. What did we get in exchange for this effort?

The most immediate reward is that we can now see exactly which parts of our program are dependent on randomness and which parts are entirely deterministic. In my experience, forcing a strict separation like this simplifies things immensely.

We've assumed so far that we just want a single sample from each random variable that we generate, but if it turns out that in the future we'd actually like to see more of the distribution, this is trivial. You can just use runRandom lots of times on the same random variable with different seeds. This is, of course, possible in imperative languages, but in this case, we can be certain that we aren't going to perform unanticipated IO every time we sample a random variable and we don't have to be careful about initializing state.