Unit Testing – How to Unit-Test a Class with Realistic Data

I have a class that encapsulates the results of a scientific measurement. I’m building in unit tests from the beginning, but I don’t have a lot of experience with unit testing and I’m not sure which behaviors I should be testing and how.

My class does three kinds of things:

  1. Reads measurement data from a file (or a string) into its instance variables
  2. Writes its measurement data out to a file or a string
  3. Performs calculations on its data (e.g. getting the average of a set of numbers)

My approach right now is to include a known-good example data file in my test directory. One test reads the data in from the file, passes it to my class, and makes sure that it meets some basic sanity checks. Another test passes the file’s filename to my class, lets the class read it, and runs the same tests. The remainder of the tests read the data in from the file, pass it to my class, and check that the results of the data-processing methods are correct, given what I know about that data set.

This seems pretty tangled, though. The tests that check (3) implicitly assume that the behaviors of (1) are correct, since it’s the functions in (1) that are being used to populate the class in the first place. And the tests of (1) could benefit from the extensive checks done by the tests for (3). Am I structuring my unit tests poorly, or is this just a natural result of the fact that I need to use a specific dataset in my tests?

Best Answer

What you're doing is integration testing because, as you can probably tell, your tests depend on other parts of your code. This is fine, but it's a useful term to know when you're looking at articles and examples online.

Some points to consider, and things to remember:

  • Arrange, Act, Assert. All tests have these 3 steps.
    • Need more steps? You're probably testing too much for 1 test
    • No Arrange? You have external dependencies, which almost always makes tests flaky and unreliable.
    • Lots of Arrange code usually means lots of dependencies, and system-state types of dependencies. That's a recipe for fragile code.
    • No Act? Then the test often just exercises the compiler. Leave that to compiler developers.
    • Lots of Act(s)? You're probably testing too much for 1 test again.
    • No Assert? Often the case when you want to test that "no errors occurred". No errors != working correctly.
    • Lots of Asserts? The code under test might be doing too much/touching too many systems. Try refactoring and testing the individual bits instead.
  • Tests should exist on their own. Avoid scenarios where a test of (3) depends on the code that does (1). Instead, look to replace the code that does (1) with test code. In your scenario: set the values manually to what you want them to be in the test's Arrange step, then perform the Action (the calculations), then Assert the result.
  • Happy-path tests are often a waste of time. Best-case scenarios almost always work. Your tests should be trying to force errors and edge cases.
    • Do you have a test that passes bad streams to be processed?
    • Do you have a test that passes null/non-existent filenames to be processed?
    • What happens if the numeric values to compute are not numeric, or when parsed are huge, or produce huge values when computed?
  • Surprisingly, writing tests rarely confirms that what you've done is right. Instead, tests give you insight into how usable, stable, and flexible your design is.
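
As a rough illustration of the Arrange/Act/Assert and "force errors" points above, here is a minimal Python sketch. The Measurement class, its constructor, and its average() method are hypothetical stand-ins for the class in the question, not its real API. The point is only that the calculation tests build the object directly from values, so they no longer depend on the file-reading code.

```python
import math
import unittest


class Measurement:
    """Hypothetical stand-in for the class under test."""

    def __init__(self, values):
        self.values = list(values)

    def average(self):
        # Assumed behavior: averaging no data is an error, not a silent 0.
        if not self.values:
            raise ValueError("no values to average")
        return sum(self.values) / len(self.values)


class AverageTests(unittest.TestCase):
    def test_average_of_known_values(self):
        # Arrange: set the data directly; no file parsing is involved.
        m = Measurement([1.0, 2.0, 6.0])
        # Act: run only the calculation under test.
        result = m.average()
        # Assert: compare against a value worked out by hand.
        self.assertTrue(math.isclose(result, 3.0))

    def test_average_of_empty_data_raises(self):
        # Forced error: an edge case instead of the happy path.
        m = Measurement([])
        with self.assertRaises(ValueError):
            m.average()
```

Saved as a test module, this can be run with a test runner such as python -m unittest. The file-reading behavior from (1) would then get its own tests, fed with small strings written inside the test rather than a shared data file.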

edit> This has already been accepted, but I wanted to add something that I learned long ago via Pragmatic Unit Testing:

Unit Testing with your Right BICEP

  • Are the results Right?
  • Boundary conditions are CORRECT
    • Conform to expected format
    • Ordered correctly
    • Range is correct
    • References to external dependencies are safe
    • Existence (null etc.)
    • Cardinality
    • Timing (things happen in the right order, with the correct timing)
  • Inverse relationships
  • Cross-check results with other means
  • Errors are forced
  • Performance characteristics are within acceptable bounds
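
Two of these items map neatly onto the class in the question, so here is a small Python sketch of them. Everything in it is assumed for illustration: the Measurement class, the from_string() and to_string() names, and the one-number-per-line text format are hypothetical. The inverse-relationship test checks that writing and then reading gives back the same data, and the cross-check test compares average() with an independent implementation from the standard library.

```python
import math
import statistics


class Measurement:
    """Hypothetical stand-in; text format is one numeric value per line."""

    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def from_string(cls, text):
        # Skip blank lines and parse every remaining line as a float.
        return cls(float(line) for line in text.splitlines() if line.strip())

    def to_string(self):
        # repr() of a float can be parsed back to the same float.
        return "\n".join(repr(v) for v in self.values)

    def average(self):
        return sum(self.values) / len(self.values)


def test_round_trip_preserves_values():
    # Inverse relationship: reading back what was written gives the same data.
    original = Measurement([0.1, 2.5, -3.0])
    restored = Measurement.from_string(original.to_string())
    assert restored.values == original.values


def test_average_matches_independent_implementation():
    # Cross-check: compare against a separate, trusted implementation.
    values = [0.1, 2.5, -3.0, 4.25]
    assert math.isclose(Measurement(values).average(), statistics.mean(values))
```

The round-trip test does exercise reading and writing together, which is acceptable here because the relationship between the two is exactly what it is meant to verify.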