I'm implementing a metadata parser for image files of all the common formats, and I want to write tests for it. One trivial way to do so is to keep test image files of every format as resources for the tests and actually read them as input. This approach may work, but as far as I understand unit-test methodology, unit tests shouldn't perform I/O. Is this a good way to do it, or are there any alternatives?
Unit testing – How to unit-test a file parser
Tags: io, parsing, unit-testing
Related Solutions
How many tests per method?
Well, the theoretical and highly impractical maximum is the NPath complexity (assuming the tests all cover different paths through the code ;)). The minimum is ONE! Per public method, that is; we don't test implementation details, only the external behavior of a class (return values & calls to other objects).
You quote:
And the thought of testing each of your methods with its own test method (in a 1-1 relationship) will be laughable.
and then ask:
So if creating a test for each method is 'laughable', how/when do you choose what you write tests for?
But I think you misunderstood the author here:
The idea of having one test method per method in the class under test is what the author calls "laughable".
(For me at least) it's not about 'less', it's about 'more'.
So let me rephrase it the way I understood him:
And the thought of testing each of your methods with ONLY ONE METHOD (its own test method in a 1-1 relationship) will be laughable.
To quote your quote again:
When you realize that it's all about specifying behaviour and not writing tests, your point of view shifts.
When you practice TDD you don't think:
I have a method calculateX($a, $b); and it needs a test testCalculateX that tests EVERYTHING about the method.
What TDD tells you is to think about what your code SHOULD DO, like:
I need to calculate the bigger of two values (first test case!), but if $a is smaller than zero then it should produce an error (second test case!), and if $b is smaller than zero it should .... (third test case!) and so on.
You want to test behaviors, not just single methods without context.
That way you get a test suite that is documentation for your code and REALLY explains what it is expected to do, maybe even why :)
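To make that concrete, here is a minimal PHPUnit sketch of one-test-method-per-behavior; the Calculator class and the exception type are made up for illustration:

<?php
use PHPUnit\Framework\TestCase;

// Hypothetical Calculator::calculateX($a, $b) as described above.
final class CalculatorTest extends TestCase
{
    public function testCalculatesTheBiggerOfTwoValues(): void
    {
        $this->assertSame(14, (new Calculator())->calculateX(12, 14));
    }

    public function testNegativeFirstValueProducesAnError(): void
    {
        $this->expectException(InvalidArgumentException::class);
        (new Calculator())->calculateX(-1, 14);
    }

    public function testNegativeSecondValueProducesAnError(): void
    {
        $this->expectException(InvalidArgumentException::class);
        (new Calculator())->calculateX(12, -1);
    }
}

One method under test, three test methods, each documenting one expected behavior.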
How do you go about deciding which piece of your code you create unit tests for?
Well, everything that ends up in the repository or anywhere near production needs a test. I don't think the author of your quotes would disagree with that, as I tried to show above.
If you don't have a test for it, it gets much harder (more expensive) to change the code, especially if it's not you making the change.
TDD is a way to ensure that you have tests for EVERYTHING, but as long as you WRITE the tests it's fine. Usually writing them on the same day helps, since you are not going to do it later, are you? :)
Response to comments:
a decent amount of methods can't be tested within a particular context because they either depend or are dependent upon other methods
Well, there are three kinds of things those methods can call:
Public methods of other classes
We can mock out other classes so we have a defined state there. We are in control of the context, so that's not a problem.
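As a sketch (DataSource and Report are hypothetical names), PHPUnit's createMock() lets us pin the collaborator to a defined state:

<?php
use PHPUnit\Framework\TestCase;

final class ReportTest extends TestCase
{
    public function testRendersTheValuesFromItsDataSource(): void
    {
        // Mock the collaborator so its behavior is fully under our control.
        $source = $this->createMock(DataSource::class);
        $source->method('fetch')->willReturn([1, 2, 3]);

        $report = new Report($source);

        $this->assertSame('1, 2, 3', $report->render());
    }
}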
Protected or private methods on the same class
Anything that isn't part of the public API of a class doesn't get tested directly, usually.
You want to test behavior and not implementation, and whether a class does all its work in one big public method or in many smaller protected methods that get called is implementation. You want to be able to CHANGE those protected methods WITHOUT touching your tests, because your tests will break if your code changes change behavior! That's what your tests are there for: to tell you when you break something :)
Public methods on the same class
That doesn't happen very often, does it? And if it does, as in the following example, there are a few ways of handling it:
$stuff = new Stuff();
$stuff->setBla(12);
$stuff->setFoo(14);
$stuff->execute();
That the setters exist and are not part of the execute method signature is another topic ;)
What we can test here is whether execute() blows up when we set the wrong values. That setBla() throws an exception when you pass a string can be tested separately, but if we want to test that those two individually allowed values (12 & 14) don't work TOGETHER (for whatever reason), then that's one test case.
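A sketch of those two test cases; the concrete exception classes are assumptions:

<?php
use PHPUnit\Framework\TestCase;

final class StuffTest extends TestCase
{
    public function testSetBlaThrowsWhenGivenAString(): void
    {
        $this->expectException(InvalidArgumentException::class);
        (new Stuff())->setBla('not a number');
    }

    public function testTwelveAndFourteenDoNotWorkTogether(): void
    {
        $stuff = new Stuff();
        $stuff->setBla(12);   // allowed on its own
        $stuff->setFoo(14);   // allowed on its own

        $this->expectException(DomainException::class);
        $stuff->execute();    // ...but not in combination
    }
}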
If you want a "good" test suite you can, in PHP, maybe(!) add a @covers Stuff::execute annotation to make sure you only generate code coverage for this method; the other stuff that is just setup needs to be tested separately (again, if you want that).
So the point is: maybe you need to create some of the surrounding world first, but you should be able to write meaningful test cases that usually only span one or maybe two real functions (setters don't count here). The rest can be either mocked away or tested first and then relied upon (see @depends).
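For illustration, this is how the two annotations could be combined; the details of Stuff remain hypothetical:

<?php
use PHPUnit\Framework\TestCase;

final class StuffAnnotationTest extends TestCase
{
    public function testSettersAcceptTheAllowedValues(): Stuff
    {
        $stuff = new Stuff();
        $stuff->setBla(12);                     // must not throw
        $stuff->setFoo(14);
        $this->assertInstanceOf(Stuff::class, $stuff);
        return $stuff;                          // reused by the test below
    }

    /**
     * Record coverage only for execute(); the setter plumbing is
     * covered by its own test above.
     *
     * @covers Stuff::execute
     * @depends testSettersAcceptTheAllowedValues
     */
    public function testExecuteRejectsThisCombination(Stuff $stuff): void
    {
        $this->expectException(DomainException::class);
        $stuff->execute();
    }
}

Note that @depends hands the return value of the first test to the second, so the setup only runs once.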
Note: The question was migrated from SO and was initially about PHP/PHPUnit; that's why the sample code and references are from the PHP world. I think this is also applicable to other languages, as PHPUnit doesn't differ that much from other xUnit testing frameworks.
What I have done in the past is use Acceptance Test Driven Development. ETL code is often distributed across different stages/languages and technologies AND tightly coupled. Most ETL processes are dependent on the sequence of transformations in the pipeline.
The risk in using only unit tests for ETL is that they won't cover the integrations. In many ETLs the sequencing of the transformations matters as much as the individual transformations. If I am spending resources on creating an automated test suite, I would make sure it covers the sequencing as well.
I would focus on TDD for each unique transformation sequence, or at least include these tests in a larger test suite. If there are too many combinations, you may have to pick and choose which sequences to test. The idea is to validate the ETL pipeline for the data set(s) it will be used on, as well as making sure you have test coverage on all your code.
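A sketch of such a sequence test (Pipeline and the transformation steps are hypothetical names; the idea transfers to whatever stack the ETL actually runs on):

<?php
use PHPUnit\Framework\TestCase;

// Acceptance test where the ORDER of the steps is part of the
// behavior under test, not just the individual transformations.
final class EtlPipelineTest extends TestCase
{
    public function testTrimNormalizeDeduplicateSequence(): void
    {
        $pipeline = new Pipeline([
            new TrimWhitespace(),
            new NormalizeCase(),
            new Deduplicate(),
        ]);

        // Running the same steps in another order (e.g. deduplicating
        // before normalizing) would yield three rows, not two.
        $this->assertSame(
            ['foo', 'bar'],
            $pipeline->run(['  Foo', 'foo ', 'BAR'])
        );
    }
}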
Best Answer
What are you intending to test?
So the aim is to clean things up.
I would argue that refactoring legacy code isn't 100% compliant w/ TDD.
Bad code restricts testing.
More importantly - the intention to clean it up (the drive - the reason for changing the code) differs from the original intention for the code, which was to do whatever business-domain stuff it does.
step 1
I would start with sloppy integration test(s) that cover most of the functionality.
Feed the tests crude input - e.g. those 50 MB resource files.
Assert only on the polished output and ignore the internal stuff.
This is actually important - a higher level of test abstraction is what loosens the restrictions on the implementation.
That will give you a safety net so you can open up the code for refactoring w/o fear.
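For the parser from the original question, such a sloppy integration test might look like this; MetadataParser, the sample file and the expected values are all placeholders:

<?php
use PHPUnit\Framework\TestCase;

// Sloppy on purpose: real file in, polished output out, internals ignored.
final class MetadataParserIntegrationTest extends TestCase
{
    public function testExtractsDimensionsFromASampleJpeg(): void
    {
        $parser   = new MetadataParser();
        $metadata = $parser->parse(__DIR__ . '/resources/sample.jpg');

        $this->assertSame(800, $metadata->width());    // assumed sample values
        $this->assertSame(600, $metadata->height());
    }
}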
step 2
Once you have that - you are ready to actually go in & refactor.
Read the code. Start small. (good book)
Things like code formatting, removal of excess whitespace, removal of overly verbose variable prefixes.
Then move forward to structural changes - extract methods, interfaces, classes where needed.
And don't just divide & conquer - try combining stuff where it "makes sense" ™.
Only with a decent code structure will you be able to write unit tests for isolated units of functionality.
If the integration test you started with performs well enough - I wouldn't even bother trying to build a whole network of unit tests.
Either way - a proper code structure will lead you to a natural & easy-to-stub I/O seam.
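For example, once the parser accepts raw bytes instead of a file path, unit tests need no I/O at all; parseBytes() here is a hypothetical seam, not a given API:

<?php
use PHPUnit\Framework\TestCase;

final class MetadataParserUnitTest extends TestCase
{
    public function testRejectsATruncatedJpegHeader(): void
    {
        $parser = new MetadataParser();   // hypothetical class

        $this->expectException(InvalidArgumentException::class);
        $parser->parseBytes("\xFF\xD8");  // JPEG magic bytes, then nothing
    }
}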
Once the network of unit tests is strong enough - remove the integration test(s).
Or stub their input the same way as in the unit tests (which sort of devalues the integration tests).