What are the pros and cons of automated Unit Tests vs automated Integration tests

automated-testsintegration-testingunit testing

Recently we have been adding automated tests to our existing java applications.

What we have

The majority of these tests are integration tests, which may cover a stack of calls like:-

HTTP post into a servlet
The servlet validates the request and calls the business layer
The business layer does a bunch of stuff via hibernate etc and updates some database tables
The servlet generates some XML, runs this through XSLT to produce response HTML.

We then verify that the servlet responded with the correct XML and that the correct rows exist in the database (our development Oracle instance). These rows are then deleted.

We also have a few smaller unit tests which check single method calls.

These tests are all run as part of our nightly (or adhoc) builds.

The Question

This seems good because we are checking the boundaries of our system: servlet request/response on one end and database on the other. If these work, then we are free to refactor or mess with anything inbetween and have some confidence that the servlet under test continues to work.

What problems are we likely to run into with this approach?

I can't see how adding a bunch more unit tests on individual classes would help. Wouldn't that make it harder to refactor as it's much more likely we will need to throw away and re-write tests?

Best Answer

Unit tests localize failures more tightly. Integration-level tests more closely correspond to user requirements and so are better predictor of delivery success. Neither of them is much good unless built and maintained, but both of them are very valuable if properly used.

(more...)

The thing with units tests is that no integration level test can exercise all the code as much as a good set of unit tests can. Yes, that can mean that you have to refactor the tests somewhat, but in general your tests shouldn't depend on the internals so much. So, lets say for example that you have a single function to get a power of two. You describe it (as a formal methods guy, I'd claim you specify it)

long pow2(int p); // returns 2^p for 0 <= p <= 30

Your test and your spec look essentially the same (this is sort of pseudo-xUnit for illustration):

assertEqual(1073741824,pow2(30);
assertEqual(1, pow2(0));
assertException(domainError, pow2(-1));
assertException(domainError, pow2(31));

Now your implementation can be a for loop with a multiple, and you can come along later and change that to a shift.

If you change the implementation so that, say, it's returning 16 bits (remember that sizeof(long) is only guaranteed to be no less than sizeof(short)) then this tests will fail quickly. An integration-level test should probably fail, but not certainly, and it's just as likely as not to fail somewhere far downstream of the computation of pow2(28).

The point is that they really test for diferent situations. If you could build sufficiently details and extensive integration tests, you might be able to get the same level of coverage and degree of fine-grained testing, but it's probably hard to do at best, and the exponential state-space explosion will defeat you. By partitioning the state space using unit tests, the number of tests you need grows much less than exponentially.

Related Solutions

The difference between integration and unit tests

The key difference, to me, is that integration tests reveal if a feature is working or is broken, since they stress the code in a scenario close to reality. They invoke one or more software methods or features and test if they act as expected.

On the opposite, a Unit test testing a single method relies on the (often wrong) assumption that the rest of the software is correctly working, because it explicitly mocks every dependency.

Hence, when a unit test for a method implementing some feature is green, it does not mean the feature is working.

Say you have a method somewhere like this:

public SomeResults DoSomething(someInput) {
  var someResult = [Do your job with someInput];
  Log.TrackTheFactYouDidYourJob();
  return someResults;
}

DoSomething is very important to your customer: it's a feature, the only thing that matters. That's why you usually write a Cucumber specification asserting it: you wish to verify and communicate the feature is working or not.

Feature: To be able to do something
  In order to do something
  As someone
  I want the system to do this thing

Scenario: A sample one
  Given this situation
  When I do something
  Then what I get is what I was expecting for

No doubt: if the test passes, you can assert you are delivering a working feature. This is what you can call Business Value.

If you want to write a unit test for DoSomething you should pretend (using some mocks) that the rest of the classes and methods are working (that is: that, all dependencies the method is using are correctly working) and assert your method is working.

In practice, you do something like:

public SomeResults DoSomething(someInput) {
  var someResult = [Do your job with someInput];
  FakeAlwaysWorkingLog.TrackTheFactYouDidYourJob(); // Using a mock Log
  return someResults;
}

You can do this with Dependency Injection, or some Factory Method or any Mock Framework or just extending the class under test.

Suppose there's a bug in Log.DoSomething(). Fortunately, the Gherkin spec will find it and your end-to-end tests will fail.

The feature won't work, because Log is broken, not because [Do your job with someInput] is not doing its job. And, by the way, [Do your job with someInput] is the sole responsibility for that method.

Also, suppose Log is used in 100 other features, in 100 other methods of 100 other classes.

Yep, 100 features will fail. But, fortunately, 100 end-to-end tests are failing as well and revealing the problem. And, yes: they are telling the truth.

It's very useful information: I know I have a broken product. It's also very confusing information: it tells me nothing about where the problem is. It communicates me the symptom, not the root cause.

Yet, DoSomething's unit test is green, because it's using a fake Log, built to never break. And, yes: it's clearly lying. It's communicating a broken feature is working. How can it be useful?

(If DoSomething()'s unit test fails, be sure: [Do your job with someInput] has some bugs.)

Suppose this is a system with a broken class:

A single bug will break several features, and several integration tests will fail.

A single bug will break several features, and several integration tests will fail

On the other hand, the same bug will break just one unit test.

The same bug will break just one unit test

Now, compare the two scenarios.

The same bug will break just one unit test.

All your features using the broken Log are red
All your unit tests are green, only the unit test for Log is red

Actually, unit tests for all modules using a broken feature are green because, by using mocks, they removed dependencies. In other words, they run in an ideal, completely fictional world. And this is the only way to isolate bugs and seek them. Unit testing means mocking. If you aren't mocking, you aren't unit testing.

The difference

Integration tests tell what's not working. But they are of no use in guessing where the problem could be.

Unit tests are the sole tests that tell you where exactly the bug is. To draw this information, they must run the method in a mocked environment, where all other dependencies are supposed to correctly work.

That's why I think that your sentence "Or is it just a unit test that spans 2 classes" is somehow displaced. A unit test should never span 2 classes.

This reply is basically a summary of what I wrote here: Unit tests lie, that's why I love them.

Python – Where do the Python unit tests go

For a file module.py, the unit test should normally be called test_module.py, following Pythonic naming conventions.

There are several commonly accepted places to put test_module.py:

In the same directory as module.py.
In ../tests/test_module.py (at the same level as the code directory).
In tests/test_module.py (one level under the code directory).

I prefer #1 for its simplicity of finding the tests and importing them. Whatever build system you're using can easily be configured to run files starting with test_. Actually, the default unittest pattern used for test discovery is test*.py.

Best Answer

Related Solutions

The difference between integration and unit tests

The difference

Python – Where do the Python unit tests go

Related Topic