Quality Gates – Concept and Application in Software Testing

code-quality, functional-testing, programming-practices, testing, unit-testing

We are using SonarQube for code quality testing. It assesses the quality of the code, not the function of the code. It has the concept of quality gates: you can, for instance, set a 90% quality gate, meaning that anything scoring above 90% is considered a pass.

Some folks here like this idea and have decided to apply it to functional and unit tests. After running our functional and unit tests, we check what percentage passed and promote the code to the next environment if a high enough percentage of tests pass. For the code to be promoted to production, the pass percentage must be 100%.
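To make the scheme concrete, here is a rough sketch of the rule as I understand it; the stage names and thresholds are invented for illustration:

```python
# Illustrative sketch of the promotion rule described above; the stage names
# and thresholds are made up for the example.
PROMOTION_THRESHOLDS = {
    "test": 0.90,        # 90% of tests must pass to reach the test environment
    "staging": 0.95,     # 95% to reach staging
    "production": 1.00,  # production requires every test to pass
}


def may_promote(target_stage: str, passed: int, total: int) -> bool:
    """Return True if the pass ratio meets the threshold for the target stage."""
    if total == 0:
        return False
    return passed / total >= PROMOTION_THRESHOLDS[target_stage]


print(may_promote("staging", 96, 100))     # True: 96% clears the 95% gate
print(may_promote("production", 99, 100))  # False: anything short of 100% blocks release
```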

To me, the tests themselves are the quality gate. Tests should never fail. If tests are failing, a risk has been introduced into the application and it must be fixed right away.

I'm struggling to see a valid argument for requiring that only a certain percentage of functional and unit tests pass as the code travels through our different environments en route to production. Can anyone provide one?

Best Answer

A test suite should only pass if all tests pass. Otherwise, the tests become worthless: which failure is important, and which failure can be ignored? The result would be that, after a while, all test failures get ignored. Bad.

There is one exception to this: a test suite may contain tests that are known to fail, as the necessary functionality has yet to be implemented or the bug has yet to be fixed. Such tests are valuable because they clearly document a bug. But because their failure would not be a regression, their failure should not fail the whole test suite (on the contrary, if they start to pass that would indicate your test suite isn't up to date with your code). Ideally, your test framework has a concept of such “TODO tests”.
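As one concrete example (not something from the question), pytest exposes this idea as an `xfail` marker. The function and test below are purely illustrative:

```python
import pytest


def reverse_words(sentence: str) -> str:
    """Planned helper; the real implementation does not exist yet."""
    raise NotImplementedError


@pytest.mark.xfail(reason="reverse_words is not implemented yet", strict=True)
def test_reverse_words_swaps_order():
    # Documents the desired behaviour today without failing the suite.
    # With strict=True, an unexpected pass is reported as a failure, forcing
    # the marker to be removed once the feature lands, which keeps the suite
    # in sync with the code.
    assert reverse_words("hello world") == "world hello"
```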

Quality metrics are a different beast. If a quality metric crosses a threshold, that indicates that something is probably but not necessarily ripe for a refactoring. But some “violations” may be OK in the context of that code. As long as certain code regions can be excluded from specific analysis tools, gating on that quality metric is OK. Obviously, any explicit exclusion would be a red flag in a code review and subject to extra scrutiny, but keeping such an escape hatch open for exceptional circumstances is important.
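SonarQube itself offers such escape hatches: as far as I know, a `NOSONAR` marker comment suppresses issues on that line, and the `sonar.exclusions` analysis property can exclude whole files from analysis. A purely illustrative example:

```python
import hashlib


def legacy_checksum(data: bytes) -> str:
    # MD5 is normally flagged as a weak hash, but here it only has to match
    # checksums produced by an existing legacy system; it is not used for
    # security. The suppression itself should be justified in code review.
    return hashlib.md5(data).hexdigest()  # NOSONAR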

In particular, the idea of requiring increasing quality as an artefact travels through the release pipeline is not necessarily good. Where does the necessary quality increment come from? From the devs who improve the code and re-submit a new artefact into the pipeline. Since the quality level needed to traverse the whole pipeline is known beforehand, submitting any artefact below that level is a waste of time. So why are you doing it? Likely because the stages in the pipeline provide feedback on your program that is useful before the main release. To get this feedback, you have to submit the code even when you have no intention of it making it through the pipeline. Again, false negatives are bad. Such a workflow is unsuitable for a pipeline model; the feedback should be available independently.

That does not mean you should give up on quality gating. But if your target is a 100% metric for a release, the current quality metric becomes a progress indicator for your project, like a burn-down chart for technical debt.