What levels to test on with BDD/Cucumber

bdd cucumber

I'm currently getting into BDD/Cucumber and I'm wondering: at which levels is it best used?

There are the test levels (from the testing pyramid):

  • User Interface
  • Integration
  • Unit

You can apply tests on various levels and fields in the application architecture:

  • Presentation layer
    • Rules (how are business rules reflected in the UI?)
    • Workflow (how does the user get where he wants?)
    • Technical (how does the UI aid the user in getting where he wants?)
  • Service layer
    • Rules
    • Workflow (e.g. session handling)
    • Technical
  • Data access layer
    • Service gateways
    • Persistence

My current impression is:

Cucumber/BDD doesn't restrict WHAT you test with it, but it is best used for business rules and workflow. To test those you would mostly want unit and integration tests. UI tests can also be run with BDD; there are use cases out there. My impression, though, is that business rules are better not tested through the UI. There you can stick to testing workflow (e.g. view transitions) and the technical level (e.g. data transfer and interactions between view and presentation logic or services); otherwise you would have to test business logic at both the UI and the integration/unit level.

So why do I have this question here? There are contradictions in what I have read so far:

  • I thought Cucumber/BDD is really just for testing FEATURES (so basically business logic?) and not for stuff like "when I click here, and I enter X, and I click OK" (workflow)?
  • I have also read a lot that BDD is "of course" meant for end2end tests (my impression is that it doesn't have to be)
  • On the other hand, I have had quite bad experiences with end2end tests (as a consequence I think you should use them only for a very small number of UI test cases, basically sanity/smoke tests; see the Google Testing Blog article on that topic)

So you can see this is about the basic question of how a test suite is organized and structured with BDD. What is your way of doing it? It would be a great guideline for my own implementation. What do you use BDD/Cucumber for, and how? (Testing levels, architecture levels)

Best Answer

If you can answer the question, "Can you give me an example of how X should behave?" then you can do BDD.

So, if you want to do BDD at a unit level, think of some examples of how that class or unit of code behaves, then describe that example using code. You don't have to use Cucumber for that. I find it's enough to just put the Given / When / Then in comments, like this.

If the audience for the scenarios or examples is non-technical, it might be worth using a natural-language tool like Cucumber. This is typically done end-to-end through the UI, but doesn't have to be. Aslak Hellesøy, the creator of Cucumber, and various others like Konstantin Kudryashov have worked with various different flavours of stack* (seen here with Josh Chisholm), trading off confidence for speed.

You can also just write a DSL or use a DSL framework like Serenity BDD, rather than using natural-language tooling.
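As a rough illustration of what "just write a DSL" could look like, here is a minimal sketch in Kotlin. Everything in it (`Scenario`, `scenario`, `whenever`, `Basket`) is a hypothetical name made up for this example, not part of Cucumber, Serenity BDD or any other library; `whenever` is used only because `when` is a Kotlin keyword.

```kotlin
// Hypothetical mini-DSL: each step records its description, then runs its body.
class Scenario(val title: String) {
    val log = mutableListOf<String>()

    fun given(description: String, body: () -> Unit) = step("Given", description, body)
    fun whenever(description: String, body: () -> Unit) = step("When", description, body)
    fun then(description: String, body: () -> Unit) = step("Then", description, body)

    private fun step(keyword: String, description: String, body: () -> Unit) {
        log += "$keyword $description"
        body()
    }
}

// Entry point of the DSL: builds a Scenario and runs the steps declared in the block.
fun scenario(title: String, steps: Scenario.() -> Unit): Scenario =
    Scenario(title).apply(steps)

// Hypothetical domain class, here only to give the scenario something to exercise.
class Basket {
    private val items = mutableListOf<Int>()
    fun add(priceInPence: Int) { items += priceInPence }
    fun total(): Int = items.sum()
}

fun basketScenario(): Scenario {
    val basket = Basket()
    return scenario("Basket totals its items") {
        given("a basket containing a 250p item") { basket.add(250) }
        whenever("a 100p item is added") { basket.add(100) }
        then("the total should be 350p") { check(basket.total() == 350) }
    }
}

fun main() {
    val result = basketScenario()
    println(result.title)
    result.log.forEach(::println)
}
```

The recorded step descriptions can double as readable output for a technical audience, which is roughly the trade-off against a natural-language tool.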

Testing - for instance, integration or exploratory testing - is a whole other kettle of fish. I commonly see an anti-pattern in which people have perfectly fine testing practices but get confused because it doesn't seem to fit with BDD. BDD isn't really about testing. At its heart, it's a collaborative analysis practice, which happens to produce tests as a nice by-product.

I find it more useful to think of the scenarios as "living documentation". If you're keeping your codebase clean and well-designed then scenarios or unit tests will hardly ever pick up bugs. They'll help people work out what the code should do, though, which means coders are less likely to write bugs in the first place.

*(The video is of "sub-second TDD"; BDD started as a replacement for TDD with "behaviour" a more useful word than "test", since TDD isn't really about testing either.)

Typically I'll have some examples of how the system provides value to the customer or user, written using something like Cucumber, together with class-level unit tests that provide examples of how each bit of code behaves.

For instance, I might have a couple of Cucumber scenarios showing how validation helps the user fill in the form, but the exhaustive rules will be in unit tests.
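To make that split concrete, here is a small Kotlin sketch of the unit-level half: an exhaustive table of examples for one validation rule. The rule itself (`validateUsername`, 3 to 12 letters or digits) is an assumption invented for illustration; a Cucumber scenario would typically show only one or two of these cases from the user's point of view.

```kotlin
// Hypothetical validation rule: 3 to 12 characters, letters and digits only.
fun validateUsername(name: String): String? = when {
    name.length < 3 -> "too short"
    name.length > 12 -> "too long"
    !name.all { it.isLetterOrDigit() } -> "invalid characters"
    else -> null // null means the input is valid
}

// Exhaustive examples live at the unit level as a table: input to expected error.
val usernameExamples: List<Pair<String, String?>> = listOf(
    "ab" to "too short",
    "abc" to null,
    "abcdefghijkl" to null,
    "abcdefghijklm" to "too long",
    "bob smith" to "invalid characters",
    "bob_1" to "invalid characters",
    "bob1" to null
)

// Returns the inputs whose actual result differs from the expected one.
fun failingExamples(): List<String> =
    usernameExamples
        .filter { (input, expected) -> validateUsername(input) != expected }
        .map { (input, _) -> input }

fun main() {
    // Then every example should behave as described.
    check(failingExamples().isEmpty()) { "Unexpected results for: ${failingExamples()}" }
    println("All ${usernameExamples.size} username examples pass")
}
```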

If you find that you've got too many scenarios, push some down into unit tests. Also consider whether there are areas of code which have stabilized and could be extracted out into libraries or services, with their respective scenarios running over that library/service's API, thus getting them out of your build.

And, yes, we try to avoid button clicks etc. in our scenarios. This helps to keep the code maintainable (also see the Page Object Pattern) and also provides options as to how the problem will be solved. Clicking buttons etc. describes a chosen solution, while scenarios should really be illustrating the problem.
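A minimal Kotlin sketch of that idea follows. The names (`RegistrationPage`, `FakeRegistrationPage`, `RegistrationSteps`) are hypothetical, and the fake page is an in-memory stand-in so the sketch runs without a browser; a real page object would wrap whatever browser-automation library you use.

```kotlin
// Hypothetical page object: the only place that knows about fields and buttons.
interface RegistrationPage {
    fun enterEmail(email: String)
    fun clickSubmit()
    fun confirmationMessage(): String?
}

// In-memory stand-in for a real page, so the example needs no browser.
class FakeRegistrationPage : RegistrationPage {
    private var email = ""
    private var message: String? = null

    override fun enterEmail(email: String) { this.email = email }
    override fun clickSubmit() {
        message = if ("@" in email) "Welcome, $email" else null
    }
    override fun confirmationMessage(): String? = message
}

// Steps are phrased in terms of the problem ("the visitor registers"),
// not the chosen solution ("type here, click there").
class RegistrationSteps(private val page: RegistrationPage) {
    fun visitorRegistersWith(email: String) {
        page.enterEmail(email)
        page.clickSubmit()
    }

    fun visitorIsWelcomed(): Boolean = page.confirmationMessage() != null
}

fun main() {
    val steps = RegistrationSteps(FakeRegistrationPage())
    // When a visitor registers with a valid email
    steps.visitorRegistersWith("ada@example.com")
    // Then they should be welcomed
    check(steps.visitorIsWelcomed())
    println("Visitor was welcomed")
}
```

If the form later becomes a wizard or an API call, only the page object changes; the step wording and the scenarios that use it stay the same.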

A note on Aslak's Honeycomb pattern:

Aslak describes a pattern in which user interfaces sit outside the honeycomb (web browsers, logs, apps, etc.) with the domain layer sitting inside the inner honeycomb. Glue in the outer honeycomb allows the UIs to interact with the domain layer. He describes it:

"Honeycomb" is a software architecture pattern that makes it possible to run acceptance tests in many different modes with different tradeoffs: Speed or confidence. One layer or just some.

(Note that the honeycomb shape with its multiple sides is reflective of BDD's "outside-in", in which we recognize that there are multiple user interfaces.)

In the honeycomb pattern, we use the same tools for full-stack scenarios going through the UI (the browser, for instance) and through the domain... and, in fact, we use the same scenarios! We just choose which ones to run through the full stack, and which to run through lower layers.

Note that there are actually more layers than just the domain layer; you could for instance choose to run with an in-memory database but the full web browser, or a real database but directly linked to APIs, etc.

So we'd use, e.g., Cucumber for all of them.
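One way to picture "the same scenario, run through different layers" is the Kotlin sketch below. It is not Aslak's implementation; `BankDriver`, `DomainDriver` and `TextInterfaceDriver` are hypothetical names, and the "outer layer" is simulated with a tiny text-command parser rather than a real browser or HTTP stack.

```kotlin
// Hypothetical domain layer.
class Account(var balance: Int) {
    fun withdraw(amount: Int): Boolean {
        if (amount > balance) return false
        balance -= amount
        return true
    }
}

// The scenario talks to the system only through this interface.
interface BankDriver {
    fun openAccountWith(balance: Int)
    fun withdraw(amount: Int): Boolean
    fun balance(): Int
}

// Fast mode: glue that calls the domain layer directly.
class DomainDriver : BankDriver {
    private lateinit var account: Account
    override fun openAccountWith(balance: Int) { account = Account(balance) }
    override fun withdraw(amount: Int) = account.withdraw(amount)
    override fun balance() = account.balance
}

// Slower, higher-confidence mode: goes through a simulated outer layer that
// parses command strings, much as a UI or API layer would parse requests.
class TextInterfaceDriver : BankDriver {
    private val domain = DomainDriver()

    private fun send(command: String): String {
        val parts = command.split(" ")
        return when (parts[0]) {
            "OPEN" -> { domain.openAccountWith(parts[1].toInt()); "OK" }
            "WITHDRAW" -> if (domain.withdraw(parts[1].toInt())) "OK" else "REFUSED"
            "BALANCE" -> domain.balance().toString()
            else -> "UNKNOWN"
        }
    }

    override fun openAccountWith(balance: Int) { send("OPEN $balance") }
    override fun withdraw(amount: Int) = send("WITHDRAW $amount") == "OK"
    override fun balance() = send("BALANCE").toInt()
}

// One scenario, runnable against any driver.
fun overdrawnWithdrawalIsRefused(driver: BankDriver): Boolean {
    driver.openAccountWith(100)                 // Given an account with a balance of 100
    val accepted = driver.withdraw(150)         // When the holder tries to withdraw 150
    return !accepted && driver.balance() == 100 // Then it is refused and the balance is unchanged
}

fun main() {
    val drivers = listOf("domain" to DomainDriver(), "text interface" to TextInterfaceDriver())
    for ((name, driver) in drivers) {
        check(overdrawnWithdrawalIsRefused(driver))
        println("$name: scenario passed")
    }
}
```

With Cucumber, the step definitions would play the role of the scenario function, and the choice of driver would be made by configuration for each run.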

This is different to doing BDD at a class-level (as an equivalent to TDD).

Within the domain layer itself, there will be classes with individual responsibilities (single responsibility principle).

By thinking of how that class should behave, and providing examples of that behaviour in action, we help to drive out good design and show the value of that class, providing living documentation and keeping the code clean... which is exactly what TDD does! Except I didn't use the word "test" there.

BDD at a class level is just TDD, without the word "test".

There are lots of tools which help to do this, like RSpec and MSpec. Alternatively, I might just think of my code this way, and use a TDD framework anyway. I sometimes write the "Given, When, Then" in comments (this one's in Kotlin):

import java.time.LocalDateTime
import org.junit.Assert
import org.junit.Test

// DateFormatParser is the class being described; it is defined elsewhere.
@Test
fun `should handle multiple different date formats`() {

    // Given a number of different types of date format
    val input = listOf(listOf("2017-04-03 08:25"), listOf("01 Jan 2017 01:27 PM"), listOf("8/9/2016 8:15 AM"))

    // When we parse the format of the dates
    val formatters = input.map { DateFormatParser()(it) }
    val results = input.zip(formatters) { i, f -> i.map { LocalDateTime.parse(it, f) } }

    // Then it should be able to resolve those dates.
    val expectedResults = listOf(
            LocalDateTime.of(2017, 4, 3, 8, 25),
            LocalDateTime.of(2017, 1, 1, 13, 27),
            LocalDateTime.of(2016, 9, 8, 8, 15))

    Assert.assertEquals(expectedResults, results.flatten())
}

JBehave (the first BDD framework) actually started as a replacement for JUnit 3.X with this way of thinking in mind. Java's annotations didn't exist back then, so JUnit looked for methods starting with "test..." instead. JBehave did exactly the same thing, but looked for methods starting with "should...".

You can read more about the history of BDD and JBehave in Dan North's introduction and in my potted history post; you can also see there the evolution from class-level BDD, to full-stack BDD, to the natural language tools like Cucumber that we now have!

So, it's possible to do BDD at a class level too. Typically we'd use different tools for it, though they can be the same tools; I sometimes create a nice little DSL to run my scenarios with (this one's in NUnit).

Unless it's by coincidence, though (e.g. a class just happens to be solely responsible for some interesting behaviour seen at the UI level), the full-stack scenarios and the class-level examples (unit tests) will be different.

The main reason for this is that steps are often reused across different scenarios at a full-stack level, but when you get down to classes, they should really only have one responsibility each, with the rest mocked out.
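A class-level example with "the rest mocked out" might look like the Kotlin sketch below. `Clock`, `Greeter` and `FixedClock` are hypothetical names chosen for illustration, and the stub is hand-rolled here rather than produced by a mocking library.

```kotlin
// Hypothetical collaborator, expressed as an interface so it can be replaced.
interface Clock {
    fun hour(): Int
}

// Class being described: its single responsibility is choosing a greeting.
class Greeter(private val clock: Clock) {
    fun greet(name: String): String =
        if (clock.hour() < 12) "Good morning, $name" else "Good afternoon, $name"
}

// Hand-rolled stub standing in for what a mocking library would provide.
class FixedClock(private val fixedHour: Int) : Clock {
    override fun hour() = fixedHour
}

fun shouldGreetWithGoodMorningBeforeNoon(): Boolean {
    // Given it is nine in the morning
    val greeter = Greeter(FixedClock(9))

    // When we greet Ada
    val greeting = greeter.greet("Ada")

    // Then the greeting should suit the morning
    return greeting == "Good morning, Ada"
}

fun main() {
    check(shouldGreetWithGoodMorningBeforeNoon())
    println("Greeter example passed")
}
```

The Given / When / Then here belong to this one class and its stubbed collaborator, so there is little to share with other examples, unlike the reusable steps of a full-stack scenario.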