Unit Testing – Is Measuring Method Performance by Timeout Effective?

Tags: performance, project-management, requirements, unit-testing

In a project with non-functional requirements that specify the maximum execution time for a specific action, QA must check the performance of this action on a dedicated machine, using precise hardware under a precise load, with both the hardware and the load specified in the requirements.

On the other hand, some erroneous changes to the source code may severely impact performance. Noticing this negative impact early, before the source code reaches source control and is verified by the QA department, would save both the time the QA department loses reporting the issue and the time the developer loses fixing it several commits later.

To do this, is it a good idea:

  • To use unit tests to get an idea of the time spent executing the same action² n times,

  • To use a per-test timeout through the [TestMethod, Timeout(200)] attribute in C#? (A minimal sketch of what I mean is shown right after this list.)
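
Here is the sketch I have in mind, using MSTest; ExecuteAction and the iteration count are placeholders for the real action and the real n:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ActionPerformanceSmokeTest
{
    // Placeholder for the real action under test (a few milliseconds per call).
    private static void ExecuteAction() { /* ... */ }

    // The whole test method, including the loop, must finish within 200 ms;
    // otherwise MSTest aborts it and reports a failure.
    [TestMethod, Timeout(200)]
    public void Action_Repeated100Times_FinishesWithinTimeout()
    {
        for (var i = 0; i < 100; i++)
        {
            ExecuteAction();
        }
    }
}
```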

I expect several problems with this approach:

  • Conceptually, unit tests are not really meant for that: they are expected to test a small piece of code, nothing more; they are neither a check of a functional requirement, nor an integration test, nor a performance test.

  • Does the unit test timeout in Visual Studio really measure what is expected to be measured, taking into account that initialization and cleanup are nonexistent for those tests or are too short to affect the results? (An alternative that measures only the action itself is sketched right after this list.)

  • Measuring performance this way is ugly. Running a benchmark on any machine¹, independently of the hardware, load, etc., is like running a benchmark which shows that one database product is always faster than another. On the other hand, I don't expect those unit tests to give a definitive result, nor to be something used by the QA department. Those unit tests will be used just to give a general idea of the expected performance, and essentially to alert the developer that his last modification broke something and severely affected performance.

  • Test-driven development (TDD) is impossible for those tests: how would the test fail in the first place, before any code is implemented?

  • Too many performance tests will increase the time required to run the whole test suite, so this approach is limited to short actions only.
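
Regarding the second point above, one alternative I considered is not to rely on the timeout at all, but to measure only the action itself with a Stopwatch and assert on the average time per call, keeping initialization out of the measurement. This is only a sketch; SomeService and the 0.5 ms budget are made up:

```csharp
using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ActionAverageRuntimeTest
{
    // Hypothetical class hosting the action under test.
    private sealed class SomeService
    {
        public void DoAction() { /* the short action being measured */ }
    }

    private SomeService _service;

    // Initialization happens here and is deliberately excluded from the measurement.
    [TestInitialize]
    public void SetUp() => _service = new SomeService();

    [TestMethod]
    public void Action_AverageRuntime_StaysBelowBudget()
    {
        const int iterations = 1000;
        const double budgetMs = 0.5;   // made-up per-call budget

        _service.DoAction();           // warm-up, so JIT compilation is not measured

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            _service.DoAction();
        }
        stopwatch.Stop();

        var averageMs = stopwatch.Elapsed.TotalMilliseconds / iterations;
        Assert.IsTrue(
            averageMs <= budgetMs,
            $"Average call took {averageMs:F3} ms; the budget is {budgetMs} ms.");
    }
}
```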

Taking those problems into account, I still find it interesting to use such unit tests if they are combined with the real performance metrics gathered by the QA department.

Am I wrong? Are there other problems which make it totally unacceptable to use unit tests for this?

If I'm wrong, what is the correct way to alert the developer that a change to the source code has severely affected performance, before that code reaches source control and is verified by the QA department?


¹ Actually, the unit tests are expected to run only on developer PCs with comparable hardware performance, which reduces the gap between the fastest machines, which would never be able to fail the performance test, and the slowest machines, which would never be able to pass it.

² By action, I mean a rather short piece of code which takes a few milliseconds to run.

Best Answer

We are using this approach as well, i.e. we have tests that measure runtime under some defined load scenario on a given machine. It may be important to point out that we do not include these in the normal unit tests: unit tests are basically executed by each developer on a developer machine before committing the changes, and, as explained below, that doesn't make any sense for performance tests (at least in our case). Instead, we run performance tests as part of the integration tests.

You correctly pointed out that this should not rule out verification. We do not consider our test to be a test of the non-functional requirement; instead, we consider it a mere potential-problem indicator.

I am not sure about your product, but in our case, if performance is insufficient, a lot of work is required to "fix" that. So the turn-around time when we leave this entirely to QA is horrible. Additionally, the performance fixes tend to have severe impacts on a large part of the code base, which renders the previous QA work void. All in all, a very inefficient and unsatisfying workflow.

That being said, here are some points to your respective issues:

  • Conceptually: it is true that this is not what unit tests are about. But as long as everyone is aware that the test is not supposed to verify anything QA should do, it's fine.

  • Visual Studio: can't say anything about that, as we do not use the unit test framework from VS.

  • Machine: It depends on the product. If your product is something developed for end users with custom individual desktop machines, then it is in fact more realistic to execute the tests on different developers' machines. In our case, we deliver the product for a machine with a given spec, and we execute these performance tests only on such a machine. Indeed, there is not much point in measuring performance on your dual-core developer machine when the client will ultimately run the product on 16 cores or more.

  • TDD: While an initial failure is typical, it's not a must. In fact, writing these tests early makes them serve more as regression tests than as traditional unit tests. That the test succeeds early on is no problem. But you do get the advantage that whenever a developer adds functionality that slows things down, because he or she was not aware of the non-functional performance requirement, this test will spot it. That happens a lot, and it is awesome feedback. Imagine it in your daily work: you write code, you commit it, you go to lunch, and when you're back the build system tells you that this code is too slow when executed in a heavy-load environment. That's nice enough for me to accept that the test does not fail initially.

  • Run-time: As mentioned, we do not run these tests on developer machines, but rather as part of the build system, as a kind of integration test. (A sketch of how such tests can be tagged and filtered is shown below.)
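
As an illustration only (we use a different test framework, so take the MSTest syntax here as an assumption on my part): the idea is that performance tests carry a category, the normal developer run excludes that category, and the build system runs only that category on the dedicated machine.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LoadScenarioPerformanceTests   // hypothetical test class
{
    // The category lets the build system select these tests,
    // while the ordinary developer run skips them.
    [TestMethod, TestCategory("Performance"), Timeout(60000)]
    public void HeavyLoadScenario_CompletesWithinBudget()
    {
        // ... set up the defined load scenario and assert on its runtime ...
    }
}
```

With MSTest, developers would then run `dotnet test --filter "TestCategory!=Performance"`, while the build server runs `dotnet test --filter "TestCategory=Performance"` on the dedicated machine.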
