How to know if software is good or bad based upon empirical metrics

code-quality, metrics

I'm currently being asked to look at a project that finished core development five months ago but still has a high level of defects. What happens is that for roughly every 10 defects we fix, we raise at least 4, and in some cases 8, new defects.
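To put that ratio in perspective, here is a rough back-of-the-envelope sketch (Python; only the fix/raise numbers come from the situation above, while the starting backlog of 100 open defects and the simplistic model are assumptions made purely for illustration):

    # Rough illustration of how a 40-80% defect re-injection rate slows burn-down.
    # Only the fix/raise numbers come from the situation above; the starting backlog
    # of 100 and the naive model itself are assumptions made purely for illustration.

    def cycles_to_clear(open_defects: int, fixed_per_cycle: int, raised_per_cycle: int) -> int:
        """Count fixing cycles until the backlog reaches zero (naive model)."""
        if raised_per_cycle >= fixed_per_cycle:
            raise ValueError("backlog never shrinks at this re-injection rate")
        cycles = 0
        while open_defects > 0:
            open_defects -= fixed_per_cycle    # defects closed this cycle
            open_defects += raised_per_cycle   # new defects raised by those fixes
            open_defects = max(open_defects, 0)
            cycles += 1
        return cycles

    print(cycles_to_clear(100, 10, 4))   # net -6 per cycle -> 17 cycles
    print(cycles_to_clear(100, 10, 8))   # net -2 per cycle -> 50 cycles

Even in the better case the backlog only shrinks by a net 6 defects per cycle of 10 fixes, which is why months of pure defect fixing can still leave critical defects open.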

I believe coding practice at the vendor is poor, and there is general agreement on this. However, I am wondering whether there is also a structural issue with the software itself. Defect density is a useful measure, but my bigger concern is that if the core software is badly written, all the vendor is doing is shifting the problem around.

In infrastructure it is easier to tell whether something is poorly built; what measurements can you use for software besides defects per LOC?

The product has been in its defect-fixing phase for four months and still has not resolved enough critical defects. We are not adding new functionality, just fixing regression issues.

This indicates a development quality problem, which is not contested. However, if the product itself is fundamentally flawed, that is a different problem. My concern is that the core code base is badly written and has limited documentation, so all the external developers are doing is shifting the problem from A to B. Once the internal development teams take over, I am concerned that they will have to fundamentally rewrite the code to get it functional.

So when you accept a product from a third party and are asked to support it, what acceptance criteria would you use to define standards?

Besides getting our lead developer to do a peer review of the code for each release, I am not sure what else can be done.

Best Answer

You don't.

Software quality is really hard to measure objectively. Hard enough that there is no ready-made solution. In this answer I'll refrain from dwelling on whether there can be a solution at all, and simply point out why defining one would be really hard.

Reasoning by status quo

As Kilian Foth pointed out, if there were a simple measure for "good" software, we'd all be using it and everyone would demand it.

There are projects in which managers decided to enforce certain metrics. Sometimes it worked, sometimes it didn't. I am not aware of any significant correlations. Safety-critical software in particular (think airplanes, cars, etc.) comes with a lot of metrics requirements that are supposed to "ensure" quality - I am not aware of any studies showing that these requirements actually result in higher quality, and I have personal experience to the contrary.

Reasoning by counter-intelligence

Also hinted at by Kilian already, and more generally phrased as "every metric can and will be played".

What does it mean to play a metric? It's a fun game for developers: you ensure the metric values look really good, while doing really shitty stuff.

Let's say you measure defects per LOC. How am I going to play that? Easy - just add more code! Write 100 lines of pointless code that amount to a no-op and suddenly you have fewer defects per LOC. Best of all: you actually decreased the software quality that way, because there is now more code to read and maintain.
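To make the mechanism concrete, here is a caricature of that move (the function and the before/after numbers are invented for illustration): pad the code base with dead statements and the LOC denominator grows while the defect count stays the same.

    # Caricature of gaming defects-per-LOC: pad the denominator with dead code.
    # Nothing below does useful work, but every line still counts as a "line of code".

    def _metric_padding():        # hypothetical no-op, never meant to be useful
        filler = 0
        filler += 0               # repeat lines like these a hundred times over...
        filler *= 1               # ...each one is another "line of code" to the tool
        return filler             # the program's observable behaviour is unchanged

    # Before padding: 50 defects / 10,000 LOC = 0.00500 defects per LOC
    # After padding:  50 defects / 10,100 LOC ~ 0.00495 defects per LOC - "better"!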

Tool shortcomings are abused, definitions are stretched to their limits, completely new ways are invented... Basically, developers are really smart people, and if you have just one developer on your team who has fun playing metrics, then your metrics will be questionable.

This is not to say that metrics are always bad - but the attitude of the team towards these metrics is crucial. In particular, this implies that metrics are not going to work well in any subcontractor/third-party vendor relationship.

Reasoning by wrong targeting

What you want to measure is software quality. What you do measure is one or more metrics.

There is a gap between what you measure and what you believe it will tell you, and that gap can be huge.

It happens all the time in all sorts of businesses all around us. Ever seen decisions based on KPIs (key performance indicators)? It's just the same problem - you want a company to do well, but you measure something else.

Reasoning by quantifiability

Metrics can be measured - which is the only reason we deal with them at all. Software quality, however, extends way beyond these measurable quantities and has a lot to it that is very tough to quantify: How readable is the source code? How extensible is the design? How hard is it for new team members to get onboarded? And so on.

Judging software quality only by metrics and turning a blind eye to the parts of quality that you can't quantify is certainly not going to work out well.

edit:

Summary

Let me point out that the above is all about objectively judging whether software is good or bad based on metrics. This means it says nothing about whether and when you should apply metrics.

In fact, this is a unidirectional implication: bad metrics imply bad code. Unidirectional means that bad code does not guarantee bad metrics, nor do good metrics guarantee good code. On the other hand, this in itself means that you can apply metrics to judge a piece of software - as long as you keep the direction of the implication in mind.

You measure software A and the metrics turn out really bad: then you can be certain that the quality of the code is bad. You measure software B and the metrics are OK: then you have no clue whatsoever about the code quality. Don't be fooled into thinking "metrics good = code good" when it's really just "code good => metrics good".
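Written out as a tiny decision rule (a sketch of the logic, not a real tool), the only safe inference from "code good => metrics good" is its contrapositive:

    # "Good code => good metrics" only licenses the contrapositive:
    # bad metrics => bad code. Good metrics license no conclusion at all.

    def what_the_metrics_tell_you(metrics_look_good: bool) -> str:
        if not metrics_look_good:
            return "quality problem found - the code is bad"
        return "no conclusion - judge the code by other means (reviews, tests, ...)"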

In essence, you can use metrics to find quality problems, but not quality itself.