Unit-testing of inherently random/non-deterministic algorithms

Tags: algorithms, random, unit-testing

My current project, succinctly, involves the creation of "constrainably-random events". I'm basically generating a schedule of inspections. Some of them are based on strict schedule constraints; you perform an inspection once a week on Friday at 10:00AM. Other inspections are "random"; there are basic configurable requirements such as "an inspection must occur 3 times per week", "the inspection must happen between the hours of 9AM-9PM", and "there should not be two inspections within the same 8-hour period", but within whatever constraints were configured for a particular set of inspections, the resulting dates and times should not be predictable.
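
To make that concrete, a configuration for one of the "random" inspection sets might look something like this (an illustrative sketch only; these are not my actual types):

    //Illustrative sketch only; not the real production types.
    public class RandomInspectionConstraints
    {
        public int OccurrencesPerWeek { get; set; }     //e.g. 3
        public TimeSpan EarliestTimeOfDay { get; set; } //e.g. 09:00
        public TimeSpan LatestTimeOfDay { get; set; }   //e.g. 21:00
        public TimeSpan MinimumSpacing { get; set; }    //e.g. 8 hours between inspections
    }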

Unit tests and TDD, IMO, have great value in this system: they let me build it incrementally while its full set of requirements is still incomplete, and make sure I'm not "over-engineering" it to do things I don't currently know I need. The strict schedules were a piece-o'-cake to TDD.

However, I'm finding it difficult to define what I'm really testing when I write tests for the random portion of the system. I can assert that all times produced by the scheduler fall within the constraints, but an algorithm could pass all such tests without the actual times being very "random". In fact, that's exactly what happened: I found an issue where the times, though not exactly predictable, fell into a small subset of the allowable date/time ranges. The algorithm still passed every assertion I felt I could reasonably make, and I could not design an automated test that would fail in that situation but pass when given "more random" results. I had to demonstrate the problem was solved by restructuring some existing tests to repeat themselves a number of times and visually checking that the generated times covered the full allowable range.
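
To make the gap concrete, here's the kind of assertion I mean (a simplified, hypothetical illustration; the dates and names are made up): even a scheduler that always returned the same late-evening slot on the same days would satisfy it.

    //Hypothetical illustration: these "window" assertions pass even for a
    //degenerate scheduler that always returns the same three slots.
    [Test]
    public void WindowAssertionsCannotDetectPoorRandomness()
    {
        var degenerateSchedule = new[]
        {
            new DateTime(2012, 6, 1, 20, 45, 0),
            new DateTime(2012, 6, 3, 20, 45, 0),
            new DateTime(2012, 6, 5, 20, 45, 0),
        };

        foreach (var time in degenerateSchedule)
        {
            Assert.GreaterOrEqual(time.TimeOfDay, TimeSpan.FromHours(9));  //not before 9AM
            Assert.LessOrEqual(time.TimeOfDay, TimeSpan.FromHours(21));    //not after 9PM
        }
    }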

Does anyone have any tips for designing tests that should expect non-deterministic behavior?


Thanks to all for the suggestions. The main opinion seems to be that I need a deterministic test in order to get deterministic, repeatable, assertable results. Makes sense.

I created a set of "sandbox" tests that contain candidate algorithms for the constraining process (the process by which a byte array that could represent any long becomes a long between a min and a max). I then run that code through a FOR loop that feeds the algorithm several known byte arrays (values from 1 to 10,000,000 just to start) and has the algorithm constrain each to a value between 1009 and 7919 (I'm using prime numbers to ensure an algorithm couldn't pass thanks to some serendipitous GCF between the input and output ranges). The resulting constrained values are counted and a histogram produced.

To "pass", all inputs must be reflected in the histogram (a sanity check to ensure we didn't "lose" any), and the difference between any two buckets in the histogram cannot be greater than 2 (it should really be <= 1, but stay tuned). The winning algorithm, if any, can be cut and pasted directly into production code, and a permanent test put in place for regression.

Here's the code:

    private void TestConstraintAlgorithm(int min, int max, Func<byte[], long, long, long> constraintAlgorithm)
    {
        var histogram = new int[max-min+1];
        for (int i = 1; i <= 10000000; i++)
        {
            //This is the stand-in for the PRNG; produces a known byte array
            var buffer = BitConverter.GetBytes((long)i);

            long result = constraintAlgorithm(buffer, min, max);

            histogram[result - min]++;
        }

        var minCount = -1;
        var maxCount = -1;
        var total = 0;
        for (int i = 0; i < histogram.Length; i++)
        {
            Console.WriteLine("{0}: {1}", i + min, histogram[i]);
            if (minCount == -1 || minCount > histogram[i])
                minCount = histogram[i];
            if (maxCount == -1 || maxCount < histogram[i])
                maxCount = histogram[i];
            total += histogram[i];
        }

        Assert.AreEqual(10000000, total);
        Assert.LessOrEqual(maxCount - minCount, 2);
    }

    [Test, Explicit("sandbox, does not test production code")]
    public void TestRandomizerDistributionMSBRejection()
    {
        TestConstraintAlgorithm(1009, 7919, ConstrainByMSBRejection);
    }

    private long ConstrainByMSBRejection(byte[] buffer, long min, long max)
    {
        //Strip the sign bit (if any) off the most significant byte, before converting to long
        buffer[buffer.Length-1] &= 0x7f;
        var orig = BitConverter.ToInt64(buffer, 0);
        var result = orig;
        //Apply a bitmask to the value, removing the MSB on each loop until it falls in the range.
        var mask = long.MaxValue;
        while (result > max - min)
        {
            mask >>= 1;
            result &= mask;
        }
        result += min;

        return result;
    }

    [Test, Explicit("sandbox, does not test production code")]
    public void TestRandomizerDistributionLSBRejection()
    {
        TestConstraintAlgorithm(1009, 7919, ConstrainByLSBRejection);
    }

    private long ConstrainByLSBRejection(byte[] buffer, long min, long max)
    {
        //Strip the sign bit (if any) off the most significant byte, before converting to long
        buffer[buffer.Length - 1] &= 0x7f;
        var orig = BitConverter.ToInt64(buffer, 0);
        var result = orig;

        //Bit-shift the number 1 place to the right until it falls within the range
        while (result > max - min)
            result >>= 1;

        result += min;
        return result;
    }

    [Test, Explicit("sandbox, does not test production code")]
    public void TestRandomizerDistributionModulus()
    {
        TestConstraintAlgorithm(1009, 7919, ConstrainByModulo);
    }

    private long ConstrainByModulo(byte[] buffer, long min, long max)
    {
        buffer[buffer.Length - 1] &= 0x7f;
        var result = BitConverter.ToInt64(buffer, 0);

        //Modulo divide the value by the range to produce a value that falls within it.
        result %= max - min + 1;

        result += min;
        return result;
    }

… and here are the results:

[Image: histograms of the bucket counts produced by the three candidate algorithms]

LSB rejection (bit-shifting the number right until it falls within the range) was TERRIBLE, for a very easy-to-explain reason: when you halve a number repeatedly until it's below a maximum, you quit as soon as it is, so for any non-trivial range the results are biased toward the upper half of that range (as was seen in the detailed results of the histogram). This was exactly the behavior I saw from the generated dates; all of the times were in the afternoon, on very specific days.
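
A quick way to see the bias outside of the full harness (an illustrative snippet, not part of the test suite): right-shift sequential inputs until they fit a small range and look at where they land.

    //Illustrative only: right-shifting until the value fits the range
    //piles nearly every input into the upper half of that range.
    var counts = new int[101];
    for (long i = 1; i <= 100000; i++)
    {
        long value = i;
        while (value > 100)
            value >>= 1;
        counts[value]++;
    }
    //Buckets 51-100 receive almost all of the 100,000 hits; buckets 1-50
    //receive one hit each (the inputs that were already in range).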

MSB rejection (removing the most significant bit of the number, one at a time, until it falls within the range) is better, but again, because each bit you chop off removes a very large chunk of the value, the results are not evenly distributed; you're unlikely to get numbers at the upper and lower ends, so you get a bias toward the middle third. That might benefit someone looking to "normalize" random data into a bell-ish curve, but a sum of two or more smaller random numbers (similar to throwing dice) would give you a more natural curve. For my purposes, it fails.
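
The dice analogy is easy to demonstrate (an illustrative snippet, not production code): summing a few small uniform values produces the bell-ish shape, with the extremes rare and the middle common.

    //Illustrative only: sums of three six-sided "dice" cluster around the
    //middle (10-11) and thin out toward 3 and 18.
    var rng = new Random(12345);   //fixed seed so the illustration is repeatable
    var counts = new int[19];      //sums of 3d6 range from 3 to 18
    for (int i = 0; i < 100000; i++)
    {
        int sum = rng.Next(1, 7) + rng.Next(1, 7) + rng.Next(1, 7);
        counts[sum]++;
    }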

The only one that passed this test was constraining by modulo division, which also turned out to be the fastest of the three. By its definition, modulo produces as even a distribution as possible given the available inputs.
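
For sequential inputs like the ones in the harness above, that evenness is easy to verify by hand (an illustrative snippet): the bucket counts can differ by at most 1.

    //Illustrative only: modulo spreads 10,000,000 sequential inputs across
    //6,911 buckets as evenly as possible.
    const int rangeSize = 7919 - 1009 + 1;   //6,911 buckets
    var counts = new int[rangeSize];
    for (long i = 1; i <= 10000000; i++)
        counts[i % rangeSize]++;
    //10,000,000 = 6,911 * 1,446 + 6,694, so 6,694 buckets hold 1,447 hits
    //and the remaining 217 hold 1,446; the spread is exactly 1.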

Best Answer

What you actually want to test here, I assume, is that given a specific set of results from the randomiser, the rest of your method performs correctly.

If that's what you're looking for, then mock out the randomiser to make it deterministic within the realms of the test.

I generally have mock objects for all kinds of non-deterministic or unpredictable (at the time of writing the test) data, including GUID generators and DateTime.Now.

Edit, from comments: You have to mock the PRNG (that term escaped me last night) at the lowest level possible, i.e. when it generates the array of bytes, not after you turn those bytes into Int64s. Or even at both levels, so you can test that your conversion to an array of Int64s works as intended, and then test separately that your conversion to an array of DateTimes works as intended. As Jonathon said, you could do that just by giving it a set seed, or you can give it the array of bytes to return.

I prefer the latter because it won't break if the framework's PRNG implementation changes. However, one advantage of giving it the seed is that if you find a case in production that didn't work as intended, you only need to have logged one number to be able to replicate it, rather than the whole array.
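
To sketch the shape of that (illustrative names only, not the asker's actual types), put the byte source behind an interface, inject it, and script it in tests:

    //(Assumes using System.Collections.Generic;)
    //Illustrative sketch: the scheduler depends on this instead of a concrete PRNG.
    public interface IRandomByteSource
    {
        byte[] GetBytes(int count);
    }

    //Production implementation wraps the real PRNG.
    public class CryptoByteSource : IRandomByteSource
    {
        public byte[] GetBytes(int count)
        {
            var buffer = new byte[count];
            using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
                rng.GetBytes(buffer);
            return buffer;
        }
    }

    //Test double that returns a scripted sequence of byte arrays, making the
    //code under test completely deterministic.
    public class ScriptedByteSource : IRandomByteSource
    {
        private readonly Queue<byte[]> _script;

        public ScriptedByteSource(IEnumerable<byte[]> script)
        {
            _script = new Queue<byte[]>(script);
        }

        public byte[] GetBytes(int count)
        {
            return _script.Dequeue();
        }
    }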

All this said, you must remember that it's called a Pseudo Random Number Generator for a reason. There may be some bias even at that level.
