Unit Testing Parsers – Are Parsers a Special Case?


Context

While trying to get my parser classes under test, I noticed a common challenge in (unit-)testing them: they have only one public method, which takes a string as input and returns the parsed object as output. Internally they are split into several methods for readability, but those are all private. These classes are not huge: on average, a whole class is about 150 lines of code (including whitespace) with around 8 private methods.

According to this question, you shouldn't make methods public just for the sake of testing. My situation differs from that question, though, because calling the one public method does exercise all the private methods.

Issue

However, I struggle a bit with the idea of re-creating the whole input string, with only a slight change, for every if statement in those private methods. Also these tests seem like integration-, rather than unit-tests to me, simply because they test too many lines of code at once – I'm not sure about this though.

Possible options I thought of

I also see some options that could help with this. Take, as an example, a parser that expects an XML-formatted input string. I could use a hard-coded XML string, or I could build the XML dynamically with the language's own XML tools (e.g. XDocument in C#).

If we further think about a program that saves some XML-data in a file to read it later, that program has the corresponding "encoder" for that parser. I could create the XML with the encoder and decode it with the parser/decoder, effectively testing that they work together – which seems like all the program needs anyway, unless the spec says otherwise.

Question

Are parsers a special case in unit testing?
If yes, to what extent? Are there certain "dos and don'ts" to look out for that differ from "standard" testing?
Or am I maybe just writing my parsers in a very inconvenient way?

Best Answer

Parsers are not a special case; what you describe is a common situation. If the class in question is of reasonable size and adheres to separation of concerns, the single-responsibility principle and so on, then you should test it through its single public method. This just means the class has a narrow public surface, which is a good thing.

But you should not focus on "testing every if" in the class. That would couple the tests to the implementation, which we want to avoid. Instead, test each specific requirement the parser is supposed to satisfy.
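As an illustration of testing requirements through the single public method, here is a minimal Python sketch. The question mentions C# but gives no code. The <point x=".." y=".."/> format, the Point type and the PointParser class are all hypothetical. Each test names one requirement and calls only parse(). The private helpers are exercised indirectly, so renaming or merging them would not break any test.

```python
import unittest
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class PointParser:
    """Hypothetical parser: one public method, private helpers."""

    def parse(self, text: str) -> Point:
        root = self._read_root(text)
        return Point(self._read_coord(root, "x"), self._read_coord(root, "y"))

    def _read_root(self, text: str) -> ET.Element:
        # Private helper: parse the XML and check the root element name.
        root = ET.fromstring(text)
        if root.tag != "point":
            raise ValueError(f"expected <point>, got <{root.tag}>")
        return root

    def _read_coord(self, root: ET.Element, name: str) -> int:
        # Private helper: read one required integer attribute.
        raw = root.get(name)
        if raw is None:
            raise ValueError(f"missing attribute {name!r}")
        return int(raw)


class PointParserTest(unittest.TestCase):
    # One test per requirement; only the public parse() method is called.
    def test_reads_x_and_y_attributes(self):
        self.assertEqual(PointParser().parse('<point x="1" y="2"/>'), Point(1, 2))

    def test_rejects_wrong_root_element(self):
        with self.assertRaises(ValueError):
            PointParser().parse('<circle x="1" y="2"/>')

    def test_rejects_missing_coordinate(self):
        with self.assertRaises(ValueError):
            PointParser().parse('<point x="1"/>')
```

If saved as, say, point_parser_test.py, this runs with python -m unittest point_parser_test.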

> Also these tests seem like integration-, rather than unit-tests to me, simply because they test too many lines of code at once - I'm not sure about this though.

No, the distinction between a unit test and an integration test is whether the test touches separate subsystems. If everything is in a single class, it is definitely not an integration test.

> If we further think about a program that saves some XML-data in a file to read it later, that program has the corresponding "encoder" for that parser. I could create the XML with the encoder and decode it with the parser/decoder, effectively testing that they work together - which seems like all the program needs anyway, unless the spec says otherwise.

It makes sense to test a "round trip" to ensure the components work together. This is a valuable form of integration test. But it is important that you also unit-test the input/output of each component individually. Otherwise you risk both components having the same bug (not unlikely if they were written by the same person). Such a bug would cancel out in the round trip and go undetected by the integration test.
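A deliberately broken, hypothetical encoder/decoder pair in Python shows how a shared bug slips through a round trip. Assume the spec stores a point as <point x="..." y="..."/>. Both functions below swap the two attributes, so the round trip still succeeds. A test against a hardcoded sample written from the spec catches the decoder bug. The format and function names are illustrative only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


# Hypothetical spec: a point is stored as <point x="..." y="..."/>.
# Both functions share the same mistake: they swap the attribute names.
def encode(p: Point) -> str:
    el = ET.Element("point", {"x": str(p.y), "y": str(p.x)})
    return ET.tostring(el, encoding="unicode")


def decode(text: str) -> Point:
    root = ET.fromstring(text)
    return Point(x=int(root.get("y")), y=int(root.get("x")))


def roundtrip_ok(p: Point) -> bool:
    # Integration-style check: encoder and decoder only have to agree.
    return decode(encode(p)) == p


def decoder_meets_spec() -> bool:
    # Unit-style check against a hardcoded sample taken from the spec.
    return decode('<point x="1" y="2"/>') == Point(1, 2)


print(roundtrip_ok(Point(1, 2)))  # True: the two bugs cancel out
print(decoder_meets_spec())       # False: the decoder bug is exposed
```

An equivalent fixed-sample check on the encoder's output would expose the encoder's half of the bug in the same way.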

Your test input/output samples should be at the same abstraction level as the class you are testing. For example, if the class takes strings as input, you should test it with hardcoded strings. Otherwise your input generation becomes too complex, and you risk bugs in the input generation masking bugs in the class you are testing. Tests should be simple code.
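Here is a minimal Python sketch of string-level test data using a hypothetical parse_point function. Each case is a hardcoded string that differs from a baseline sample in one small, visible way. unittest's subTest reports each failing case separately. The rule that a missing y defaults to 0 is an assumed requirement, used only for illustration. This pattern also keeps the "slight change per case" from the question cheap without generating the XML programmatically.

```python
import unittest
import xml.etree.ElementTree as ET


def parse_point(text: str) -> tuple[int, int]:
    """Hypothetical parser under test: '<point x=".." y=".."/>' -> (x, y)."""
    root = ET.fromstring(text)
    if root.tag != "point":
        raise ValueError("expected <point>")
    # Assumed rule for this sketch: a missing coordinate defaults to 0.
    return int(root.get("x", "0")), int(root.get("y", "0"))


# Hardcoded inputs at the parser's own abstraction level (strings),
# each a small, readable variation on one baseline sample.
VALID_CASES = [
    ('<point x="1" y="2"/>', (1, 2)),    # baseline
    ('<point y="2" x="1"/>', (1, 2)),    # attribute order does not matter
    ('<point x="-3" y="0"/>', (-3, 0)),  # negative coordinate
    ('<point x="5"/>', (5, 0)),          # missing y defaults to 0 (assumed)
]

INVALID_CASES = [
    '<circle x="1" y="2"/>',   # wrong root element
    '<point x="one" y="2"/>',  # non-numeric coordinate
]


class ParsePointTest(unittest.TestCase):
    def test_valid_samples(self):
        for text, expected in VALID_CASES:
            with self.subTest(text=text):
                self.assertEqual(parse_point(text), expected)

    def test_invalid_samples(self):
        for text in INVALID_CASES:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_point(text)
```

Because every sample is a literal string, a reader can check each expected value by eye, and the test contains no logic of its own that could hide a parser bug.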