Unit Testing – Understanding Cyclomatic Complexity

cyclomatic-complexityunit testing

I've recently come across Cyclomatic Complexity and I'd like to try to understand it better.

What are some practical coding examples of the different factors that go into calculating the complexity? Specifically, for the Wikipedia equation of M = E − N + 2P, I want to better understand what each of the following terms means:

  • E = the number of edges of the graph
  • N = the number of nodes of the graph
  • P = the number of connected components

I suspect that either E or N may be the number of decision points (if, else if, for, foreach, etc) in a block of code, but I'm not quite sure which is which or what the other signifies. I'm also guessing that P refers to function calls and class instantiations, but there isn't a clear definition given that I can see. If someone could shed a little more light with some clear code examples of each, it would help.

As a follow-up, does the Cyclomatic Complexity directly correlate to the number of unit tests needed for 100% path coverage? As an example, does a method with a complexity of 4 indicate that 4 unit tests are needed to cover that method?

Finally, do regular expressions affect Cyclomatic Complexity, and if so, how?

Best Answer

Regarding the formula: nodes represent states, edges represent state changes. In every program, statements bring changes in the program state. Each consecutive statement is represented by an edge, and the state of the program after (or before...) the execution of the statement is the node.

If you have a branching statement (if for example) - then you have two nodes coming out, because the state can change in two ways.

Another way to calculate the Cyclomatic Complexity Number (CCN) is to calculate how many "regions" in the execution graph you have (where "independent region" is a circle that doesn't contain other circles). In this case the CCN will be the number of independent regions plus 1 (which would be exactly the same number as the previous formula gives you).

The CCN is used for branching coverage, or path coverage, which is the same. The CCN equals to the number of different branching paths theoretically possible in a single threaded application (that may include branches like "if x < 2 and x > 5 then", but that should be caught by a good compiler as an unreachable code). You have to have at least that number of different test cases (can be more since some test cases might be repeating paths covered by previous ones, but not less assuming each case covers a single path). If you cannot cover a path with any possible test case - you found unreachable code (although you'll need to actually prove to yourself why it is unreachable, probably some nested x < 2 and x > 5 lurking somewhere).

As to regular expressions - of course they affect, as any other piece of code. However, the CCN of the regex construct is probably too high to cover in a single unit test, and you can assume that the regex engine has been tested, and ignore the expressions' branching potential for your testing needs (unless you're testing your regex engine, of course).