TDD for batch processing: How to do it

tdd, testing

I like "red/green/refactor" for RoR, etc. just fine.

My day job involves batch processing very large files from third parties, using Python and other custom tools.

Churn on the attributes of these files is high, so there are a lot of fixes/enhancements applied pretty frequently.

There is no regression suite built on a known body of test data with expected results. The closest thing is running against the last batch with new test cases hand-coded in, making sure it does not blow up, and then applying spot checks and statistical tests to see whether the data still looks OK.

Question: How can I bring TDD principles into this kind of environment?

Best Answer

Just an FYI: Unit testing is not equivalent to TDD. TDD is a process of which unit testing is an element.

With that said, if you are looking to implement unit testing, there are a number of things you could do:

All new code/enhancements are tested

This way you don't have to go through and unit test everything that already exists, so the initial hump of implementing unit testing is much smaller.

Test individual pieces of data

Testing something that can contain large amounts of data can lead to many edge cases and gaps in the test coverage. Instead, consider the 0, 1, many option. Test a 'batch' with 0 elements, 1 element and many elements. In the case of 1 element, test the various permutations that the data for that element can be in.
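As a rough illustration of the 0, 1, many idea in Python, the sketch below tests a made-up batch step. The function `process_batch` and the "name,amount" record layout are assumptions for the example, not part of the asker's system.

```python
def process_batch(records):
    """Hypothetical batch step: parse 'name,amount' lines and total the amounts."""
    total = 0
    names = []
    for line in records:
        name, amount = line.split(",")
        names.append(name.strip())
        total += int(amount)  # int() tolerates surrounding whitespace
    return {"count": len(names), "names": names, "total": total}


def test_empty_batch():
    # Zero elements: the batch step should not blow up on empty input.
    assert process_batch([]) == {"count": 0, "names": [], "total": 0}


def test_single_record():
    # One element: the simplest well-formed case.
    assert process_batch(["alice,10"]) == {"count": 1, "names": ["alice"], "total": 10}


def test_single_record_permutations():
    # One element, in the different shapes the same record might arrive in.
    for line in ["alice,10", " alice ,10", "alice, 10", "alice,10\n"]:
        result = process_batch([line])
        assert result["names"] == ["alice"]
        assert result["total"] == 10


def test_many_records():
    # Many elements: totals and counts aggregate across records.
    result = process_batch(["alice,10", "bob,20", "carol,-5"])
    assert result["count"] == 3
    assert result["total"] == 25
```

The functions follow the `test_` naming convention, so a runner such as pytest would be expected to collect them, but they are plain functions and can also be called directly.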

From there, test the edge cases (upper bounds on the size of individual elements and on the number of elements in the batch). If you run the tests regularly and some of them are long-running (large batches?), most test runners allow categorization so that you can run those test cases separately (nightly?).
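One possible way to keep long-running cases out of the regular run, sketched with Python's standard `unittest` module. The environment variable name `RUN_SLOW_TESTS` and the function `count_records` are invented for this example.

```python
import os
import unittest

# Hypothetical switch: a nightly job would set RUN_SLOW_TESTS=1.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


def count_records(lines):
    """Hypothetical stand-in for the batch step under test: counts non-blank lines."""
    return sum(1 for line in lines if line.strip())


class FastBatchTests(unittest.TestCase):
    def test_small_batch(self):
        self.assertEqual(count_records(["a", "", "b"]), 2)


class SlowBatchTests(unittest.TestCase):
    @unittest.skipUnless(RUN_SLOW, "set RUN_SLOW_TESTS=1 to run large-batch tests")
    def test_large_batch(self):
        # Upper-bound case: a batch far larger than the everyday tests use.
        lines = ("row %d" % i for i in range(1_000_000))
        self.assertEqual(count_records(lines), 1_000_000)
```

Run with `python -m unittest`, the large-batch test is reported as skipped unless the variable is set. Other runners offer their own grouping mechanisms.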

That should give you a strong base.

Using actual data

Feeding in 'actual' previously used data, as you're doing now, isn't a bad idea. Just complement it with well-formed test data so that you immediately know the specific points of failure. When the process fails to handle actual data, you can inspect the results of the batch process, write a unit test that replicates the error, and then you're back in red/green/refactor with useful regression cases.
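A minimal sketch of that loop in Python. The failure itself is made up: assume a production batch contained a blank amount field and one with a thousands separator, and that a hypothetical parser `parse_amount` crashed on both. Each offending value is first captured as a failing test (red), then the parser is fixed (green).

```python
def parse_amount(field):
    """Hypothetical field parser, shown after the fix: blank means zero."""
    text = field.strip()
    if not text:
        return 0
    return int(text.replace(",", ""))


def test_blank_amount_regression():
    # Replicates the (invented) production failure: an empty amount field.
    assert parse_amount("") == 0
    assert parse_amount("   ") == 0


def test_thousands_separator_regression():
    # Replicates a second (invented) failure: '1,234' instead of '1234'.
    assert parse_amount("1,234") == 1234


def test_well_formed_amount_still_works():
    # Well-formed data pins down the normal behaviour next to the regressions.
    assert parse_amount("42") == 42
```

Each captured case stays in the suite, so the body of regression data grows out of the real failures over time.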

Related Solutions

TDD – How Small Should Baby Steps Be in Test-Driven Development?

Write the simplest code that makes the tests pass.

Neither of you did that, as far as I can see.

Baby Step 1.

Test: For the input "1,2" return sum of numbers which is 3

Make the test fail:

throw new NotImplementedException();

Make the test pass:

return 3;

Baby Step 2.

Test: For the input "1,2" return sum of numbers, which is 3

Test: For the input "4,5" return sum of numbers, which is 9

Second test fails, so make it pass:

var numbers = input.Split(',');
return int.Parse(numbers[0]) + int.Parse(numbers[1]);

(Way simpler than a list of if...return)

You can certainly argue Obvious Implementation in this case, but if you were talking about doing it strictly in baby steps then these are the correct steps, IMO.

The argument is that if you don't write the second test then some bright spark could come along later and "refactor" your code to read:

return input.Length; // Still satisfies the first test

And, without taking both steps, you have never made the second test go red (meaning that the test itself is suspect).
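The same argument, restated as a small Python sketch (the function names are invented for illustration). The first test alone cannot tell the hard-coded answer, the bad "refactor", and the real implementation apart. The second test can.

```python
def add_hardcoded(text):
    # Baby step 1: the simplest thing that passes the first test.
    return 3


def add_bad_refactor(text):
    # The "bright spark" version: len("1,2") happens to be 3.
    return len(text)


def add_parsed(text):
    # Baby step 2: forced into existence by the second test.
    first, second = text.split(",")
    return int(first) + int(second)


def first_test(add):
    return add("1,2") == 3


def second_test(add):
    return add("4,5") == 9


for candidate in (add_hardcoded, add_bad_refactor, add_parsed):
    print(candidate.__name__, first_test(candidate), second_test(candidate))
```

Only `add_parsed` satisfies both tests. The other two pass the first test and fail the second, which is the red state the second baby step is meant to produce.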

TDD Testing – How to Handle Many Permutations in Test-Driven Development

Taking a more practical approach to pdr's answer. TDD is all about software design rather than testing. You use unit tests to verify your work as you go along.

So on a unit test level you need to design the units so they can be tested in a completely deterministic fashion. You can do this by taking anything that makes the unit nondeterministic (such as a random number generator) and abstracting it away. Say we have a naïve example of a method deciding whether a move is good or not:

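class Decider {

  public boolean decide(float input, float risk) {

      // Math.random() returns a double, so cast it down to float.
      float inputRand = (float) Math.random();
      if (inputRand > input) {
         float riskRand = (float) Math.random();
         // Assumed intent: succeed only if the second draw also beats the risk.
         return riskRand > risk;
      }
      return false;

  }

}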

// The usage:
Decider d = new Decider();
d.decide(0.1337f, 0.1337f);

This method is very hard to test and the only thing you really can verify in unit tests is its bounds... but that requires a lot of tries to get to the bounds. So instead, let's abstract away the randomizing part by creating an interface and a concrete class that wraps the functionality:

public interface IRandom {

   public float random();

}

public class ConcreteRandom implements IRandom {

   public float random() {
      return (float) Math.random(); // Math.random() returns a double
   }

}

The Decider class now needs to use the concrete class through its abstraction, i.e. the Interface. This way of doing things is called dependency injection (the example below is an example of constructor injection, but you can do this with a setter as well):

class Decider {

  private final IRandom irandom;

  public Decider(IRandom irandom) { // constructor injection
      this.irandom = irandom;
  }

  public boolean decide(float input, float risk) {

      float inputRand = irandom.random();
      if (inputRand > input) {
         float riskRand = irandom.random();
         // Assumed intent: succeed only if the second draw also beats the risk.
         return riskRand > risk;
      }
      return false;

  }

}

// The usage:
Decider d = new Decider(new ConcreteRandom());
d.decide(0.1337f, 0.1337f);

You might ask yourself why this "code bloat" is necessary. Well, for starters, you can now mock the behavior of the random part of the algorithm, because the Decider now has a dependency that follows the IRandom "contract". You can use a mocking framework for this, but this example is simple enough to code yourself:

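import java.util.ArrayList;
import java.util.List;

class MockedRandom implements IRandom {

   public List<Float> floats = new ArrayList<Float>();
   int pos;

   // Queue up the next value that random() should hand out.
   public void addFloat(float f) {
     floats.add(f);
   }

   public float random() {
      float out = floats.get(pos);
      // Advance until the last queued value, then keep returning that one.
      if (pos < floats.size() - 1) {
         pos++;
      }
      return out;
   }

}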

The best part is that this can completely replace the "actual" concrete implementation. The code becomes easy to test like this:

// Fields, so that every test method can reach them.
private MockedRandom mRandom;
private Decider decider;

@Before
public void setUp() {
  mRandom = new MockedRandom();
  decider = new Decider(mRandom);
}

@Test
public void testDecisionWithLowInput_ShouldGiveFalse() {

  mRandom.addFloat(0f);

  assertFalse(decider.decide(0.1337f, 0.1337f));
}

@Test
public void testDecisionWithHighInputRandButLowRiskRand_ShouldGiveFalse() {

  mRandom.addFloat(1f);
  mRandom.addFloat(0f);

  assertFalse(decider.decide(0.1337f, 0.1337f));
}

@Test
public void testDecisionWithHighInputRandAndHighRiskRand_ShouldGiveTrue() {

  mRandom.addFloat(1f);
  mRandom.addFloat(1f);

  assertTrue(decider.decide(0.1337f, 0.1337f));
}

Hope this gives you ideas on how to design your application so that the permutations can be forced so you can test all the edge cases and whatnot.
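Since the original question concerns Python, here is a rough Python sketch of the same idea. It assumes, as the test names above suggest, that a decision is positive only when both random draws exceed their thresholds. The helper `scripted` is invented for the example. In Python, a callable can play the role of the IRandom interface.

```python
import random


class Decider:
    def __init__(self, rand=random.random):
        # The source of randomness is injected; production code uses the default.
        self._rand = rand

    def decide(self, input_threshold, risk):
        # Assumed rule: both draws must beat their thresholds.
        if self._rand() > input_threshold:
            return self._rand() > risk
        return False


def scripted(values):
    """Hypothetical test helper: returns a callable that replays fixed values."""
    iterator = iter(values)
    return lambda: next(iterator)


def test_low_input_draw_gives_false():
    assert Decider(scripted([0.0])).decide(0.1337, 0.1337) is False


def test_high_input_draw_low_risk_draw_gives_false():
    assert Decider(scripted([1.0, 0.0])).decide(0.1337, 0.1337) is False


def test_high_input_draw_high_risk_draw_gives_true():
    assert Decider(scripted([1.0, 1.0])).decide(0.1337, 0.1337) is True
```

Passing the callable through the constructor keeps production code unchanged (`Decider()` uses `random.random`) while letting each test force exactly the permutation it wants.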
