Code Metrics – Experiments Correlating Code Metrics to Bug Density

code-metrics code-quality

I'm wondering if someone has done experiments correlating code metrics (SLOC, cyclomatic complexity, etc.) with bug density in object-oriented applications.

I'm not looking only for experiments that prove a correlation or only for experiments that disprove one; I'm interested in both. I'm not trying to find a silver bullet: I believe the bug density of a project might correlate with one or more metrics for a given project or team, and that the correlation can change over the lifetime of the project or team.

My goal is to

  1. Measure all interesting metrics for 2-3 months (we already have quite a few from Sonar).
  2. Find one metric that correlates with the number of new bugs (see the sketch after this list).
  3. Do a root-cause analysis to check why this happens (e.g. do we lack a certain design skill?).
  4. Improve the skill and measure the change for a couple of iterations.
  5. Rinse and repeat from 2.
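For step 2, a simple rank correlation between per-module metrics and new-bug counts can serve as a first pass. Below is a minimal sketch, assuming a hypothetical CSV export with one row per module; the file name and column names (`metrics_and_bugs.csv`, `new_bugs`, etc.) are stand-ins for whatever your Sonar export actually produces.

```python
# Rank metrics by strength of correlation with per-module new-bug counts.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical CSV: one row per module, one column per metric, plus new_bugs.
df = pd.read_csv("metrics_and_bugs.csv")

metric_columns = ["sloc", "cyclomatic_complexity", "coupling", "lcom"]
results = []
for metric in metric_columns:
    rho, p_value = spearmanr(df[metric], df["new_bugs"])
    results.append((metric, rho, p_value))

# Spearman rather than Pearson: bug counts are rarely normally distributed,
# and a rank correlation is more robust to outlier modules.
for metric, rho, p in sorted(results, key=lambda r: -abs(r[1])):
    print(f"{metric:25s} rho={rho:+.2f}  p={p:.3f}")
```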

If you don't have first-hand experience with this but remember seeing a paper or blog post on the subject, I would appreciate it if you could share it.


So far I've found the following links with some information about this subject:

Best Answer

Whenever I hear of attempts to associate some code-based metric with software defects, the first thing I think of is McCabe's cyclomatic complexity. Various studies have found a correlation between high cyclomatic complexity and the number of defects. However, other studies that compared modules of similar size (in terms of lines of code) found that there might not be a correlation.

To me, both the number of lines in a module and its cyclomatic complexity can serve as good indicators of possible defects, or at least of a greater likelihood that defects will be injected when the module is modified. A module (especially at the class or method level) with high cyclomatic complexity is harder to understand because there are a large number of independent paths through the code. A module (again, especially at the class or method level) with a large number of lines is also hard to understand, since more lines means more things happening. Many static analysis tools support computing both source lines of code (against specified rules) and cyclomatic complexity, so capturing these two metrics is low-hanging fruit.
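To make the "independent paths" point concrete, here is a rough sketch of how a cyclomatic complexity counter works: start at one and add one for each branching construct. Real tools (Sonar, radon, and others) handle many more language constructs; this sketch only covers the common ones.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    tree = ast.parse(source)
    complexity = 1  # one path through straight-line code
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b' adds one short-circuit branch per extra operand
            complexity += len(node.values) - 1
    return complexity

print(cyclomatic_complexity("""
def classify(x):
    if x < 0 and x != -1:
        return "negative"
    for i in range(x):
        if i % 2:
            return "odd seen"
    return "done"
"""))  # prints 5: base 1 + if + and + for + inner if
```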

The Halstead complexity measures might also be interesting. Unfortunately, their validity appears to be somewhat debated, so I wouldn't necessarily rely on them. One of Halstead's measures is an estimate of delivered defects based on effort or volume (a relationship between program length, in terms of total operators and operands, and program vocabulary, in terms of distinct operators and operands).
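For reference, here is a sketch of the core Halstead computations, including the defect estimate; the counts in the example are made up, and the B = V / 3000 rule of thumb at the end is exactly the part whose validity is debated.

```python
import math

def halstead(n1, n2, N1, N2):
    """n1/n2: distinct operators/operands; N1/N2: total occurrences."""
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)       # V = N * log2(n)
    difficulty = (n1 / 2) * (N2 / n2)             # D
    effort = difficulty * volume                  # E = D * V
    estimated_bugs = volume / 3000                # Halstead's rule of thumb
    return volume, difficulty, effort, estimated_bugs

# Illustrative counts only, not from a real program.
V, D, E, B = halstead(n1=20, n2=35, N1=110, N2=95)
print(f"volume={V:.0f} difficulty={D:.1f} effort={E:.0f} est. defects={B:.2f}")
```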

There is also a group of metrics known as the CK Metrics. The first definition of this metrics suite appears to be in a paper titled A Metrics Suite for Object Oriented Design by Chidamber and Kemerer. They define Weighted Methods Per Class, Depth of Inheritance Tree, Number of Children, Coupling Between Object Classes, Response for a Class, and Lack of Cohesion in Methods. Their paper provides the computational methods as well as a description of how to analyze each one.
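As a toy illustration of two of these, Depth of Inheritance Tree (DIT) and Number of Children (NOC) can be computed directly over a small class hierarchy; real tools derive them from source code, and the classes here are placeholders.

```python
class A: pass
class B(A): pass
class C(A): pass
class D(B): pass

def dit(cls) -> int:
    # Depth of Inheritance Tree: longest path from the class up to the root.
    parents = [b for b in cls.__bases__ if b is not object]
    return 0 if not parents else 1 + max(dit(p) for p in parents)

def noc(cls) -> int:
    # Number of Children: count of immediate subclasses.
    return len(cls.__subclasses__())

for cls in (A, B, C, D):
    print(f"{cls.__name__}: DIT={dit(cls)} NOC={noc(cls)}")
# A: DIT=0 NOC=2, B: DIT=1 NOC=1, C: DIT=1 NOC=0, D: DIT=2 NOC=0
```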

In terms of academic literature analyzing these metrics, you might be interested in Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects, authored by Ramanath Subramanyam and M.S. Krishnan. They analyzed three of the six CK metrics (weighted methods per class, coupling between object classes, and depth of inheritance tree). Glancing through the paper, it appears that they found these to be potentially valid metrics, but ones that must be interpreted with caution, as "improving" one can lead to other changes that raise the probability of defects.

Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults, authored by Yuming Zhou and Hareton Leung, also examines the CK metrics. Their approach was to determine whether defects can be predicted from these metrics. They found that many of the CK metrics (except for depth of inheritance tree and number of children) had some level of statistical significance in predicting areas where defects could be located.
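A minimal sketch of that kind of prediction setup, assuming a hypothetical per-class dataset with the six CK metrics and a faulty/not-faulty label, might look like this (the file name and column names are invented):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("ck_metrics_with_faults.csv")  # hypothetical export
features = ["wmc", "dit", "noc", "cbo", "rfc", "lcom"]  # the six CK metrics

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["faulty"], test_size=0.3, random_state=0)

# Standardize so coefficient magnitudes are comparable across metrics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("holdout accuracy:", model.score(X_test, y_test))
coefs = model.named_steps["logisticregression"].coef_[0]
for name, coef in zip(features, coefs):
    print(f"{name}: {coef:+.3f}")  # sign and size hint at each metric's pull
```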

If you have an IEEE membership, I would recommend searching the IEEE Transactions on Software Engineering for more academic publications, and IEEE Software for more applied, real-world reports. The ACM might also have relevant publications in its digital library.
