Unit-testing – Are (database) integration tests bad?

integration-tests, repository-pattern, testing, unit-testing

Some people maintain that integration tests are all kinds of bad and wrong – that everything must be unit-tested, which means you have to mock dependencies – an option which, for various reasons, I'm not always fond of.

I find that, in some cases, a unit-test simply doesn't prove anything.

Let's take the following (trivial, naive) repository implementation (in PHP) as an example:

class ProductRepository
{
    private $db;

    public function __construct(ConnectionInterface $db) {
        $this->db = $db;
    }

    public function findByKeyword($keyword) {
        // this might have a query builder, keyword processing, etc. - this is
        // a totally naive example just to illustrate the DB dependency, mkay?

        // the keyword is wrapped in wildcards so LIKE matches it anywhere in the name
        return $this->db->fetch("SELECT * FROM products p"
            . " WHERE p.name LIKE :keyword",
            ['keyword' => '%' . $keyword . '%']);
    }
}

Let's say I want to prove in a test that this repository can actually find products matching various given keywords.

Short of integration testing with a real connection object, how can I know that this is actually generating real queries – and that those queries actually do what I think they do?

If I have to mock the connection object in a unit-test, I can only prove things like "it generates the expected query" – but that doesn't mean it's actually going to work… that is, maybe it's generating the query I expected, but maybe that query doesn't do what I think it does.
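
For instance, the most such a test could assert – a minimal sketch, assuming PHPUnit and that ConnectionInterface declares the fetch() method used above – is something like this:

use PHPUnit\Framework\TestCase;

class ProductRepositoryTest extends TestCase
{
    public function testFindByKeywordGeneratesExpectedQuery()
    {
        $db = $this->createMock(ConnectionInterface::class);

        // pins down the SQL string and parameters the repository produces...
        $db->expects($this->once())
            ->method('fetch')
            ->with(
                $this->stringContains('LIKE :keyword'),
                ['keyword' => '%shoe%']
            )
            ->willReturn([['id' => 1, 'name' => 'red shoe']]);

        $repository = new ProductRepository($db);

        // ...but a green run only proves the expected SQL was sent, not that
        // this SQL returns the right products against a real schema
        $this->assertSame(
            [['id' => 1, 'name' => 'red shoe']],
            $repository->findByKeyword('shoe')
        );
    }
}

This passes as long as the repository sends the string I expected – even if that string is nonsense against the real schema.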

In other words, I feel like a test that makes assertions about the generated query is essentially without value, because it's testing how the findByKeyword() method was implemented – and that doesn't prove that it actually works.

This problem isn't limited to repositories or database integration – it seems to apply in a lot of cases, where making assertions about the use of a mock (test-double) only proves how things are implemented, not whether they're going to actually work.

How do you deal with situations like these?

Are integration tests really "bad" in a case like this?

I get the point that it's better to test one thing, and I also understand why integration testing leads to myriad code paths, not all of which can be tested – but in the case of a service (such as a repository) whose only purpose is to interact with another component, how can you really test anything without integration testing?

Best Answer

Write the smallest useful test you can. For this particular case, an in-memory database might help with that.

It is generally true that everything that can be unit-tested should be unit-tested, and you're right that unit tests will take you only so far and no further – particularly when writing simple wrappers around complex external services.

A common way of thinking about testing is as a testing pyramid. It's a concept frequently connected with Agile, and many have written about it, including Martin Fowler (who attributes it to Mike Cohn in Succeeding with Agile), Alistair Scott, and the Google Testing Blog.

        /\                           --------------
       /  \        UI / End-to-End    \          /
      /----\                           \--------/
     /      \     Integration/System    \      /
    /--------\                           \----/
   /          \          Unit             \  /
  --------------                           \/
  Pyramid (good)                   Ice cream cone (bad)

The notion is that fast-running, resilient unit tests are the foundation of the testing process. There should be more focused unit tests than system/integration tests, and more system/integration tests than end-to-end tests. As you get closer to the top, tests tend to take more time and resources to run, tend to be more brittle and flaky, and are less specific in identifying which system or file is broken; naturally, it's preferable to avoid being "top-heavy".

To that point, integration tests aren't bad, but heavy reliance on them may indicate that you haven't designed your individual components to be easy to test. Remember, the goal here is to test that your unit performs to its spec while involving a minimum of other breakable systems. You may want to try an in-memory database (which I count as a unit-test-friendly test double alongside mocks) for heavy edge-case testing, for instance, and then write a couple of integration tests with the real database engine to establish that the main cases work when the system is assembled.
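
As a sketch of how that might look here – assuming a fetch($sql, array $params) signature on the ConnectionInterface from the question, which isn't spelled out – a thin, hypothetical PDO-backed adapter would let the same repository run against SQLite's in-memory engine in tests and a real engine in production:

class PdoConnection implements ConnectionInterface
{
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
    }

    // assumes ConnectionInterface declares fetch($sql, array $params = [])
    public function fetch($sql, array $params = []) {
        $statement = $this->pdo->prepare($sql);
        $statement->execute($params);

        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}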

As you noted, it's possible for tests to be too narrow: you mentioned that the mocks you write simply test how something is implemented, not whether it works. That's something of an antipattern: A test that is a perfect mirror of its implementation isn't really testing anything at all. Instead, test that every class or method behaves according to its own spec, at whatever level of abstraction or realism that requires.

In that sense your method's spec might be one of the following:

  1. Issue some arbitrary SQL or RPC and return the results exactly (mock-friendly, but doesn't actually test the query you care about)
  2. Issue exactly this SQL query or RPC and return the results exactly (mock-friendly, but brittle, and assumes the SQL is correct without testing it)
  3. Issue an SQL command to a similar database engine and check that it returns the right results (in-memory-database-friendly; probably the best solution on balance – see the sketch after this list)
  4. Issue an SQL command to a staging copy of your exact DB engine and check that it returns the right results (probably a good integration test, but may be prone to infrastructure flakiness or difficult-to-pinpoint errors)
  5. Issue an SQL command to your real production DB engine and check that it returns the right results (may be useful to check deployed behavior, same issues as #4 plus the dangers of modifying production data or overwhelming your server)
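
For instance, a minimal sketch of option 3 – assuming PHPUnit and the hypothetical PdoConnection adapter above – would seed an in-memory SQLite database and assert on actual query results rather than on the SQL string:

use PHPUnit\Framework\TestCase;

class ProductRepositoryIntegrationTest extends TestCase
{
    public function testFindByKeywordReturnsMatchingProducts()
    {
        // build a throwaway schema in SQLite's in-memory engine
        $pdo = new PDO('sqlite::memory:');
        $pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)');
        $pdo->exec("INSERT INTO products (name) VALUES ('red shoe'), ('blue hat')");

        $repository = new ProductRepository(new PdoConnection($pdo));

        // assert on behavior: does the query actually find matching rows?
        $results = $repository->findByKeyword('shoe');

        $this->assertCount(1, $results);
        $this->assertSame('red shoe', $results[0]['name']);
    }
}

Swapping the DSN for your staging engine turns the same test body into option 4.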

Use your judgment: Pick the quickest and most resilient solution that will fail when you need it to and give you confidence that your solution is correct.
