Unit Testing – Is Measuring Method Performance by Timeout Effective?

Tags: performance, project-management, requirements, unit-testing

In a project with non-functional requirements that specify the maximum execution time for a specific action, QA must check the performance of this action on a dedicated machine, using precise hardware under a precise load, with both the hardware and the load specified in the requirements.

On the other hand, some erroneous changes to the source code may severely impact performance. Noticing this negative impact early, before the source code reaches source control and is verified by the QA department, would save both the time the QA department loses reporting the issue and the time the developer loses fixing it several commits later.

To do this, is it a good idea:

  • To use unit tests to get an idea of the time spent executing the same action² n times,

  • To use a per-test timeout through the [TestMethod, Timeout(200)] attribute in C#? (A minimal sketch of what I mean is shown right after this list.)
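
Here is the sketch I have in mind, using MSTest; ExecuteAction and the iteration count are placeholders for the real action and the real n:

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ActionPerformanceSmokeTest
{
    // Placeholder for the real action under test (a few milliseconds per call).
    private static void ExecuteAction() { /* ... */ }

    // The whole test method, including the loop, must finish within 200 ms;
    // otherwise MSTest aborts it and reports a failure.
    [TestMethod, Timeout(200)]
    public void Action_Repeated100Times_FinishesWithinTimeout()
    {
        for (var i = 0; i < 100; i++)
        {
            ExecuteAction();
        }
    }
}
```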

I expect several problems with this approach:

  • Conceptually, unit tests are not really meant for that: they are expected to test a small piece of code, nothing more; they are neither a check of a functional requirement, nor an integration test, nor a performance test.

  • Does the unit test timeout in Visual Studio really measure what is expected to be measured, taking into account that initialization and cleanup are nonexistent for those tests or are too short to affect the results? (An alternative that measures only the action itself is sketched right after this list.)

  • Measuring performance this way is ugly. Running a benchmark on any machine¹, independently of the hardware, load, etc., is like running a benchmark which shows that one database product is always faster than another. On the other hand, I don't expect those unit tests to give a definitive result, nor to be something used by the QA department. Those unit tests will be used just to give a general idea of the expected performance, and essentially to alert the developer that his last modification broke something and severely affected performance.

  • Test-driven development (TDD) is impossible for those tests: how would the test fail in the first place, before any code is implemented?

  • Too many performance tests will increase the time required to run the whole test suite, so this approach is limited to short actions only.
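
Regarding the second point above, one alternative I considered is not to rely on the timeout at all, but to measure only the action itself with a Stopwatch and assert on the average time per call, keeping initialization out of the measurement. This is only a sketch; SomeService and the 0.5 ms budget are made up:

```csharp
using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class ActionAverageRuntimeTest
{
    // Hypothetical class hosting the action under test.
    private sealed class SomeService
    {
        public void DoAction() { /* the short action being measured */ }
    }

    private SomeService _service;

    // Initialization happens here and is deliberately excluded from the measurement.
    [TestInitialize]
    public void SetUp() => _service = new SomeService();

    [TestMethod]
    public void Action_AverageRuntime_StaysBelowBudget()
    {
        const int iterations = 1000;
        const double budgetMs = 0.5;   // made-up per-call budget

        _service.DoAction();           // warm-up, so JIT compilation is not measured

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            _service.DoAction();
        }
        stopwatch.Stop();

        var averageMs = stopwatch.Elapsed.TotalMilliseconds / iterations;
        Assert.IsTrue(
            averageMs <= budgetMs,
            $"Average call took {averageMs:F3} ms; the budget is {budgetMs} ms.");
    }
}
```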

Taking those problems into account, I still find it interesting to use such unit tests if they are combined with the real performance metrics gathered by the QA department.

Am I wrong? Are there other problems which make it totally unacceptable to use unit tests for this?

If I'm wrong, what is the correct way to alert the developer that a change to the source code has severely affected performance, before that code reaches source control and is verified by the QA department?


¹ Actually, the unit tests are expected to run only on developer PCs with comparable hardware performance, which reduces the gap between the fastest machines, which would never be able to fail the performance test, and the slowest machines, which would never be able to pass it.

² By action, I mean a rather short piece of code which takes a few milliseconds to run.

Best Answer

We are using this approach as well, i.e. we have tests that measure runtime under some defined load scenario on a given machine. It may be important to point out that we do not include these in the normal unit tests: unit tests are basically executed by each developer on a developer machine before committing the changes, and, as explained below, that doesn't make any sense for performance tests (at least in our case). Instead, we run performance tests as part of the integration tests.

You correctly pointed out that this should not rule out verification. We do not consider our test to be a test of the non-functional requirement; instead, we consider it a mere potential-problem indicator.

I am not sure about your product, but in our case, if performance is insufficient, a lot of work is required to "fix" that. So the turn-around time when we leave this entirely to QA is horrible. Additionally, the performance fixes tend to have severe impacts on a large part of the code base, which renders the previous QA work void. All in all, a very inefficient and unsatisfying workflow.

That being said, here are some points to your respective issues:

  • Conceptually: it is true that this is not what unit tests are about. But as long as everyone is aware that the test is not supposed to verify anything QA should do, it's fine.

  • Visual Studio: can't say anything about that, as we do not use the unit test framework from VS.

  • Machine: It depends on the product. If your product is something developed for end users with custom individual desktop machines, then it is in fact more realistic to execute the tests on different developers' machines. In our case, we deliver the product for a machine with a given spec, and we execute these performance tests only on such a machine. Indeed, there is not much point in measuring performance on your dual-core developer machine when the client will ultimately run the product on 16 cores or more.

  • TDD: While an initial failure is typical, it's not a must. In fact, writing these tests early makes them serve more as regression tests than as traditional unit tests. That the test succeeds early on is no problem. But you do get the advantage that whenever a developer adds functionality that slows things down, because he or she was not aware of the non-functional performance requirement, this test will spot it. That happens a lot, and it is awesome feedback. Imagine it in your daily work: you write code, you commit it, you go to lunch, and when you're back the build system tells you that this code is too slow when executed in a heavy-load environment. That's nice enough for me to accept that the test does not fail initially.

  • Run-time: As mentioned, we do not run these tests on developer machines, but rather as part of the build system, as a kind of integration test. (A sketch of how such tests can be tagged and filtered is shown below.)
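
As an illustration only (we use a different test framework, so take the MSTest syntax here as an assumption on my part): the idea is that performance tests carry a category, the normal developer run excludes that category, and the build system runs only that category on the dedicated machine.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LoadScenarioPerformanceTests   // hypothetical test class
{
    // The category lets the build system select these tests,
    // while the ordinary developer run skips them.
    [TestMethod, TestCategory("Performance"), Timeout(60000)]
    public void HeavyLoadScenario_CompletesWithinBudget()
    {
        // ... set up the defined load scenario and assert on its runtime ...
    }
}
```

With MSTest, developers would then run `dotnet test --filter "TestCategory!=Performance"`, while the build server runs `dotnet test --filter "TestCategory=Performance"` on the dedicated machine.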
