Legacy Code – Introducing an External BDD Integration Test Harness


Here's the scenario.

  • There's a large, organically grown application, written in a language and manner that make it difficult to test. It works but is hard to maintain.
  • Specifications are woolly, so even if the code were amenable, it's hard to know what things should be doing without actually exercising them.
  • You want to get it into a test harness so you can start to refactor and create proper unit tests.

In these cases it makes sense to build an external or superficial 'test everything' harness so you can start more aggressive refactoring, which in turn enables you to write proper unit tests. This seems to be a fairly established practice for incrementally introducing tests.

"The main thing that distinguishes legacy code from non-legacy code is a lack of
comprehensive tests." – Michael C. Feathers (Working Effectively with Legacy Code)

My question is this:

Does it make sense to build a suite of behaviour-driven tests at this point and execute them via some kind of automated button clicker? I'm thinking of something like (but not limited to) Cucumber, where human-readable tests are matched to executable code/scripts that prove the defined behaviour is being met.

A clear advantage of this is that a human-readable specification of sorts could be developed alongside the tests, and such specifications are much easier to confirm as sane and correct with the users and developers of the system. They are essentially integration tests, but they perform the function that Feathers recommends.
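
For concreteness, such tools do exist: Cucumber is the best known, and in the Python world behave plays the same role, binding each plain-language step to an automation function that can drive the UI through something like Selenium. A minimal sketch, assuming behave and Selenium, with a hypothetical login page and element IDs:

    # Step definitions for a behave scenario such as:
    #   Given I am logged in as an administrator
    #   When I click the "Export" button
    #   Then a report download should start
    # The URL and element IDs below are placeholders for the legacy UI.
    from behave import given, when, then
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    @given('I am logged in as an administrator')
    def step_login(context):
        context.browser = webdriver.Firefox()
        context.browser.get('http://legacy-app.local/login')
        context.browser.find_element(By.ID, 'username').send_keys('admin')
        context.browser.find_element(By.ID, 'password').send_keys('secret')
        context.browser.find_element(By.ID, 'login-button').click()

    @when('I click the "Export" button')
    def step_click_export(context):
        context.browser.find_element(By.ID, 'export-button').click()

    @then('a report download should start')
    def step_check_download(context):
        # Verification is app-specific; here we look for a confirmation
        # element rather than inspecting the filesystem.
        assert context.browser.find_element(By.ID, 'download-started')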

Edited to add: I'm not asking for a specific tool recommendation; the actual tool is irrelevant (although it would be helpful to know whether such tools actually exist). I'm after a sanity check and recommendations around external BDD integration testing.

Best Answer

It's perfectly possible to do this. I've written a blog post with a few guidelines, but these additional ones might help you too:

  • Think about the capabilities of the system. For instance, an accounting system might have the ability to read bank feeds, raise invoices, email those invoices, etc.

  • Group the scenarios in terms of those capabilities. Look at what kinds of contexts (the givens) produce what kinds of outcomes (the thens). Have some conversations with the business people about these, and pick up their language as far as possible. The capabilities themselves will drive what you write for the events (the whens).

    For instance, you might find a couple of scenarios for raising invoices where it says something like:

    Given an organisation to bill is outside the US
    When we send the invoice
    Then international bank details should be included.

    Given an organisation to bill is within the US
    When we send the invoice
    Then it should include only US bank details.

    These will then tie into the automation that does the more detailed steps: actually creating organisations with addresses in different countries, sending the invoices, and verifying that those invoices carry the correct bank details (see the step-definition sketch at the end of this bullet). There will be far more of these automation steps than there are of the higher-level ones. This is commonly referred to as declarative vs. imperative language, and it will help you to work out which scenarios are the most important to cover, and which are functionally equivalent.

    Notice that the difference between the scenarios is called out fairly cleanly here, which it wouldn't be if there were multiple UI steps hiding that difference. The difference between the scenarios is what illustrates the behaviour.
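
    As a sketch of how that wiring might look, here are step definitions for the US scenario using Python's behave (one Cucumber-style tool; the InvoicingApp driver and its methods are hypothetical stand-ins for whatever automation layer you build around the legacy system):

        from behave import given, when, then

        from invoicing_harness import InvoicingApp  # hypothetical test driver

        @given('an organisation to bill is within the US')
        def step_us_organisation(context):
            context.app = InvoicingApp()
            # One declarative given can hide several imperative actions:
            # creating the organisation, setting a US billing address, etc.
            context.org = context.app.create_organisation(country='US')

        @when('we send the invoice')
        def step_send_invoice(context):
            context.invoice = context.app.send_invoice(context.org)

        @then('it should include only US bank details')
        def step_check_bank_details(context):
            details = context.invoice.bank_details()
            assert details.is_us_only(), 'expected only US bank details'

    The step text stays declarative; the imperative detail lives in the driver, which is what keeps the scenarios readable and makes functionally equivalent variants easy to spot.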

  • You are likely to find bugs. It's up to you whether you want to write scenarios around what the system should do instead. It's highly likely, though, that by now there are human workarounds for those bugs, so I wouldn't worry too much about this behaviour. If the application is in the wild and producing value, it's good enough. Make sure you get scenarios around the core capabilities written first.

  • Whenever you have to fix a bug, write some unit tests around it; this will force you to redesign the code (a minimal sketch follows this bullet). Regression bugs are usually caused by poor design, and adding yet more scenarios will just make the code harder to change rather than giving you any more confidence in it. This is what Michael Feathers is primarily referring to in the quote above. See also the test pyramid: as your system is refactored, the number of unit and integration tests should rapidly outstrip the number at the UI level.
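
    As a sketch of what that first unit test might look like, assuming a hypothetical tax_rate_for function in the legacy billing module (the rates themselves are placeholders):

        # pytest regression tests written while fixing a bug.
        # billing.tax_rate_for is hypothetical; getting it under test at all
        # is usually what forces the redesign mentioned above.
        import pytest

        from billing import tax_rate_for

        def test_us_organisation_gets_domestic_rate():
            assert tax_rate_for(country='US') == 0.0

        def test_non_us_organisation_gets_international_rate():
            # The bug being fixed: non-US organisations were billed
            # at the domestic rate.
            assert tax_rate_for(country='DE') == pytest.approx(0.19)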

  • You can use the different capabilities of the system to guide you in finding the seams that Michael Feathers talks about in his book, which will help you to refactor (a small example follows).
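
    By way of illustration, one of the simplest seams is a parameter with a default. Suppose a hypothetical send_invoice routine originally constructed its mailer inline; exposing the mailer as an optional argument lets a test substitute a fake without touching production call sites:

        # A parameter seam, in the sense Feathers uses the term. SmtpMailer
        # and send_invoice are hypothetical; the point is the optional
        # argument, which changes behaviour without editing any callers.
        class SmtpMailer:
            def send(self, to_address, body):
                ...  # talks to the real SMTP server

        def send_invoice(invoice, mailer=None):
            mailer = mailer or SmtpMailer()  # production default preserved
            mailer.send(invoice.billing_email, invoice.render())

        # In a test, inject a fake through the seam:
        class FakeMailer:
            def __init__(self):
                self.sent = []

            def send(self, to_address, body):
                self.sent.append((to_address, body))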

  • Note that I don't use the word 'test' very often. Business people tend to talk more comfortably about the behaviour of the system when you frame things as examples or scenarios in which things happen, rather than as tests. This harks back to the origins of BDD.

Good luck!