Unit-testing – How to Unit-Test a parser of a file

Tags: io, parsing, unit testing

I'm implementing a metadata parser for image files of all formats, and I want to write tests for it. One trivial way to do so is to keep test image files of every format as resources for the tests and actually read them as input. This approach may work, but as far as I understand unit-test methodology, unit tests shouldn't perform I/O. Is this a good way to do it, or are there any alternatives?

Best Answer

I want to write tests for it.

What are you intending to test?

I want to use TDD. I'm refactoring a parser and want to test the 'parse()' method.

So the aim is to clean things up.

I would argue that refactoring legacy code isn't 100% compliant with TDD.

Bad code restricts testing.

More importantly, the intention to clean the code up (the drive, the reason for changing it) differs from the code's original intention, which was to do whatever business-domain stuff it does.


step 1

I would start with one or more sloppy integration tests that cover most of the functionality.

Feed the tests crude input, e.g. those 50 MB resource files.
Ask only for polished output and ignore the internals.

That's actually important: the higher the level of abstraction a test works at, the fewer restrictions it places on the implementation.

That will give you a safety net, so you can open the code up for refactoring without fear.
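As a rough illustration of step 1, a coarse test of this kind might look like the sketch below. Everything in it is hypothetical: `parse` stands in for the legacy parser (reduced here to reading PNG dimensions), `check_resources` is an invented helper, and `make_png_header` fabricates a tiny resource file only so the example is self-contained. In practice the inputs would be the real resource files on disk.

```python
import struct
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse(path):
    """Hypothetical stand-in for the legacy parser: returns PNG dimensions."""
    data = Path(path).read_bytes()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Width and height follow the 8-byte signature, the 4-byte chunk length
    # and the 4-byte chunk type ("IHDR"), i.e. they sit at bytes 16..24.
    width, height = struct.unpack(">II", data[16:24])
    return {"format": "png", "width": width, "height": height}


def make_png_header(width, height):
    """Builds just enough of a PNG for the stand-in parser (no CRC, no pixels)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr


def check_resources(resource_dir, expected):
    """Parses every listed file and compares only the final output."""
    failures = []
    for name, want in expected.items():
        got = parse(Path(resource_dir) / name)
        if got != want:
            failures.append((name, got, want))
    return failures


def run_demo():
    """Writes one fake resource file and runs the coarse check against it."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "small.png").write_bytes(make_png_header(640, 480))
        expected = {"small.png": {"format": "png", "width": 640, "height": 480}}
        return check_resources(tmp, expected)
```

The test knows nothing about how `parse` works inside, only which file goes in and which metadata comes out, so the implementation can be rearranged freely as long as the expected table still matches.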

step 2

Once you have that, you are ready to actually go in and refactor.

Read the code. Start small. (good book)

Begin with things like code formatting, removing excess whitespace, and removing overly verbose variable prefixes.

Then move on to structural changes: extract methods, interfaces, and classes where needed.
And don't just divide and conquer: try combining stuff where it "makes sense" ™.

Only with decent code structure will you be able to write unit tests for isolated units of functionality.

If the integration test you started with performs well enough, I wouldn't even bother trying to build a network of unit tests.

Either way, proper code structure will lead you to a natural, easy-to-stub I/O seam.
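One common shape for such a seam, sketched here under assumptions, is to have the parsing logic accept any binary file-like object and keep the file opening in a thin wrapper. The names `parse_stream` and `parse`, and the PNG-only logic, are illustrative and not taken from the parser in the question. A unit test can then pass an in-memory `io.BytesIO` instead of touching the disk.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_stream(stream):
    """Parses metadata from any binary file-like object (the seam)."""
    header = stream.read(24)
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    # Bytes 16..24 hold the big-endian width and height of the IHDR chunk.
    width, height = struct.unpack(">II", header[16:24])
    return {"format": "png", "width": width, "height": height}


def parse(path):
    """Thin wrapper: the only place that touches the file system."""
    with open(path, "rb") as stream:
        return parse_stream(stream)


def test_parse_stream_reads_dimensions():
    # A hand-built 2x3 PNG header held entirely in memory; no file involved.
    ihdr = struct.pack(">IIBBBBB", 2, 3, 8, 2, 0, 0, 0)
    fake_file = io.BytesIO(PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + ihdr)
    assert parse_stream(fake_file) == {"format": "png", "width": 2, "height": 3}


def test_parse_stream_rejects_garbage():
    try:
        parse_stream(io.BytesIO(b"not an image"))
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

With this split, the unit tests exercise `parse_stream` without any I/O, and only the coarse integration test needs the real files.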

Once the network of unit tests is strong enough, remove the integration tests.
Or stub their input the same way as in the unit tests (though that sort of devalues an integration test).