Code Quality – Best Practices for Inline Code Comments


We are refactoring a 20-year-old legacy codebase (PL/SQL, Java), and I'm having a discussion with my colleague about the format of comments in the code.

There is no default format for comments, but in most cases people write something like this:

// date (year, year-month, yyyy-mm-dd, dd/mm/yyyy), (author id, author name, author nickname) and comment

The format I want to propose for both future and past comments is:

// {yyyy-mm-dd}, unique_author_company_id, comment

My colleague says that we only need the comment, and must reformat all past and future comments to this format:

// comment

My arguments:

For maintenance reasons, it's important to know when and by whom a change was made (even though this information is in the SCM).
The code is living, and for that reason has a history.
Without the change dates, it's impossible to know when a change was introduced without opening the SCM tool and searching through the long object history.
The author is very important: a change by author X is more credible than a change by author Y.
Agility: there is no need to open and navigate through the SCM tool.
People would be more afraid to change something that someone did 15 years ago than something that was recently created or changed.
etc.

My colleague's arguments:

The history is in the SCM
Developers should not have to see the history of the code directly in the code
Packages get up to 15k lines long, and unstructured comments make them harder to understand

What do you think is the best approach? Or do you have a better approach to solve this problem?

Best Answer

General Comments

I am a great believer that comments are for why (not how). When you start adding comments about how, you run into the problem that nothing enforces that those comments are maintained along with the code. The why usually does not change (although the explanation of the why may be enhanced somewhat over time).

In the same way, date/author info gains you nothing in terms of why the code was written this way; just like the how, it can degenerate over time because no tool enforces it. Also, the same information is already stored in the source control system (so you are duplicating effort, but in a less reliable way).

Going through the arguments:

For maintenance reasons, it's important to know when and by whom a change was made (even though this information is in the SCM).

Why? Neither of these things strikes me as important to maintaining the code. If you need to talk to the author, it is relatively simple to find this information in source control.

The code is living, and for that reason has a history.

History is stored in source control.
Also, do you trust that the comment was really written by that person? Comments tend to degrade over time, so this kind of history becomes unreliable. Source control systems, on the other hand, maintain a very accurate history, and you can see exactly when comments were added or removed.

Without the change dates, it's impossible to know when a change was introduced without opening the SCM tool and searching through the long object history.

Only if you trust the data in the comment.
One of the problems with this kind of thing is that the comments drift out of sync with the code. Back to the correct tool for the job: the source control system records this correctly without any intervention from the user. If your source control system is a pain, then maybe you need to learn how to use it more appropriately (this functionality is usually easy to use) or, if it does not support it, find a better source control system.

The author is very important: a change by author X is more credible than a change by author Y.

All authors (apart from yourself) are equally credible.

Agility: there is no need to open and navigate through the SCM tool.

If your source control tool is that burdensome, you are either using it incorrectly or (more likely) using the wrong set of tools to access the source control system.

People would be more afraid to change something that someone did 15 years ago than something that was recently made...

If code has lasted 15 years, it is likely to be more solid than code that has only lasted 6 months without needing review. Stable code tends to stay stable; buggy code tends to get more complex over time (because the reason it is buggy is that the problem is not as simple as first thought).

Even more reason to use source control to get information.

The history is in the SCM

Yes. Best reason yet.

Developers should not have to see the history of the code directly in the code

If I really need this information I will look it up in source control.
Otherwise it is not relevant.

Packages get up to 15k lines long, and unstructured comments make them harder to understand

Comments should be a description of why you are doing something anyway.
Comments should NOT be describing how the code works (unless the algorithm is not obvious).
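To make the why/how distinction concrete, here is a minimal Java sketch. The class `InvoiceMath`, the method `roundAmount`, and the billing requirement behind it are hypothetical, invented only for illustration; the point is which of the three comments actually earns its place next to the code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class InvoiceMath {
    private InvoiceMath() {}

    // "How" comment (restates the code and rots when the code changes):
    //   round the amount to 2 decimals using HALF_EVEN
    // Changelog comment (duplicates source control, says nothing about intent):
    //   2010-12-09, jdoe, changed rounding
    // "Why" comment (the one worth keeping; the requirement is hypothetical):
    // Amounts use half-to-even ("banker's") rounding because the billing
    // contract requires it; HALF_UP would bias summed totals upward on ties.
    static BigDecimal roundAmount(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

The first two comments could be deleted without losing anything that the code or the SCM does not already record; only the third tells the next maintainer something they could not recover from the code itself.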

Case 1 - Tasks

If you use an IDE like Eclipse, NetBeans, or Visual Studio (or have some other way of doing text searches on your codebase), your team may use specific "comment tags" or "task tags", in which case this can be useful.

From time to time, when reviewing code, I add something like the following:

// TOREVIEW: [2010-12-09 haylem] marking this for review because blablabla

or:

// FIXME: [2010-12-09 haylem] marking this for review because blablabla

For this, I use different custom task tags that I can see in Eclipse's Tasks view, because having something in the commit logs is good but not enough when an executive asks you in a review meeting why bugfix XY was completely forgotten and slipped through. So on urgent matters or really questionable pieces of code, this serves as an additional reminder (but I usually keep the comment short and check the commit logs, because THAT's what the reminder is for, so I don't clutter the code too much).
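Outside an IDE, the same kind of task list can be approximated with a plain text scan. The following is a minimal Java sketch, assuming the exact `// TAG: [yyyy-mm-dd author] text` layout used in the examples above; the class name `TaskTagScanner` and the `Task` record are hypothetical, and this is not how Eclipse's task view is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TaskTagScanner {
    // One entry per tagged comment found (line numbers are 1-based).
    record Task(int line, String tag, String date, String author, String text) {}

    // Assumed layout, e.g. "// FIXME: [2010-12-09 haylem] marking this for review because ..."
    private static final Pattern TASK = Pattern.compile(
        "//\\s*(TOREVIEW|FIXME):\\s*\\[(\\d{4}-\\d{2}-\\d{2})\\s+(\\S+)\\]\\s*(.*)");

    static List<Task> scan(List<String> lines) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = TASK.matcher(lines.get(i));
            if (m.find()) {
                tasks.add(new Task(i + 1, m.group(1), m.group(2), m.group(3), m.group(4)));
            }
        }
        return tasks;
    }
}
```

Feeding it the lines of a source file (for example via `Files.readAllLines`) yields a list that can be sorted by date or author, which is roughly what the task view offers inside the IDE.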

Case 2 - 3rd-Party Libs' Patches

If my product needs to package a third-party piece of code as source (or as a library rebuilt from source) because it had to be patched for some reason, we document the patch in a separate document where we list those "caveats" for future reference, and the source code usually contains a comment similar to:

// [PATCH_START:product_name]
//  ... real code here ...
// [PATCH_END:product_name]
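Because these markers have a fixed shape, they are easy to check mechanically, for example before upgrading the third-party code. Here is a minimal Java sketch, assuming markers are never nested and always appear on their own comment lines as shown; `PatchMarkerChecker` and `Region` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatchMarkerChecker {
    // A patched region: product name plus 1-based start/end marker lines.
    record Region(String product, int startLine, int endLine) {}

    private static final Pattern MARKER =
        Pattern.compile("//\\s*\\[PATCH_(START|END):([^\\]]+)\\]");

    // Returns the patched regions; throws if markers are nested, unmatched or unclosed.
    static List<Region> regions(List<String> lines) {
        List<Region> result = new ArrayList<>();
        String openProduct = null;
        int openLine = -1;
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = MARKER.matcher(lines.get(i));
            if (!m.find()) continue;
            int lineNo = i + 1;
            String product = m.group(2);
            if (m.group(1).equals("START")) {
                if (openProduct != null) {
                    throw new IllegalStateException("nested PATCH_START at line " + lineNo);
                }
                openProduct = product;
                openLine = lineNo;
            } else {
                if (openProduct == null || !openProduct.equals(product)) {
                    throw new IllegalStateException("unmatched PATCH_END at line " + lineNo);
                }
                result.add(new Region(product, openLine, lineNo));
                openProduct = null;
            }
        }
        if (openProduct != null) {
            throw new IllegalStateException("PATCH_START at line " + openLine + " is never closed");
        }
        return result;
    }
}
```

The resulting list can be compared against the separate caveats document, so a patch that exists in the code but not in the document (or the reverse) stands out.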

Case 3 - Non-Obvious Fixes

This one is a bit more controversial and closer to what you are proposing.

In the product I work on at the moment, we sometimes (definitely not a common thing) have a comment like:

// BUGFIX: [2010-12-09 haylem] fix for BUG_ID-XYZ

We only do this if the bugfix is non-obvious and the code reads abnormally. This can be the case for browser quirks, for instance, or obscure CSS fixes that you need to implement only because there's a documented bug in a product. So in general we link it to our internal issue repository, which then contains the detailed reasoning behind the bugfix and pointers to the documentation of the external product's bug (say, a security advisory for a well-known Internet Explorer 6 defect, or something like that).

But as mentioned, it's quite rare. And thanks to the task tags, we can regularly run through these and check if these weird fixes still make sense or can be phased out (for instance, if we dropped support for the buggy product causing the bug in the first place).
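That periodic review can be partly automated. Below is a minimal Java sketch, assuming the `// BUGFIX: [yyyy-mm-dd author] fix for ISSUE-ID` layout shown above and a set of issue IDs that someone has already identified as candidates for phasing out (for example, issues tied to a product version that is no longer supported). `BugfixReview`, `candidatesForRemoval` and the IDs in the test are hypothetical; in practice the set would come from your own issue tracker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BugfixReview {
    // Assumed layout, e.g. "// BUGFIX: [2010-12-09 haylem] fix for BUG_ID-XYZ"
    private static final Pattern BUGFIX = Pattern.compile(
        "//\\s*BUGFIX:\\s*\\[\\d{4}-\\d{2}-\\d{2}\\s+\\S+\\]\\s*fix for\\s+(\\S+)");

    // Returns "line N: ID" for each BUGFIX comment whose issue ID is in phaseOutCandidates.
    static List<String> candidatesForRemoval(List<String> lines, Set<String> phaseOutCandidates) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = BUGFIX.matcher(lines.get(i));
            if (m.find() && phaseOutCandidates.contains(m.group(1))) {
                hits.add("line " + (i + 1) + ": " + m.group(1));
            }
        }
        return hits;
    }
}
```

The output is only a list of places to look at; whether a workaround can really be removed still needs a human decision and the reasoning stored in the issue tracker.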

This just in: A real-life example

In some cases, it's better than nothing :)

I just came across a huge statistical computation class in my codebase, where the header comment was in the form of a changelog with the usual yadda yadda: reviewer, date, bug ID.

At first I thought of scrapping it, but I noticed that the bug IDs matched neither the convention of our current issue tracker nor that of the tracker used before I joined the company. So I read through the code to get an understanding of what the class was doing (not being a statistician) and tried to dig up these defect reports. As it happens, they were fairly important and would have made life quite horrible for the next person to edit the file without knowing about them, as they dealt with minor precision issues and special cases based on very specific requirements issued by the originating customer back then. Bottom line: if these had not been in there, I wouldn't have known. If they hadn't been in there AND I had had a better understanding of the class, I would have noticed that some computations were off and broken them by "fixing" them.

Sometimes it's hard to keep track of very old requirements like these. In the end, I still removed the header, but only after sneaking in a block comment before each incriminating function explaining why these "weird" computations are the way they are (they implement specific customer requests).
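As a rough illustration of that kind of block comment, here is a minimal Java sketch. The computation (population variance where a reader might expect sample variance) and the customer requirement behind it are hypothetical stand-ins, not the actual code from the class described above; the idea is that the comment guards against exactly the well-meaning "fix" described earlier.

```java
final class LegacyStats {
    private LegacyStats() {}

    /*
     * WHY this looks "wrong": the variance below divides by n (population
     * variance), not n - 1 (sample variance). This is intentional.
     * Hypothetical example: the originating customer's requirement specifies
     * population variance for these reports; cite the real issue ID here.
     * Do not "fix" it to n - 1 without checking that requirement first.
     */
    static double variance(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - mean) * (v - mean);
        return sumSq / values.length; // divide by n on purpose, see comment above
    }
}
```

Unlike a changelog header, this comment sits next to the code it explains and says nothing about who changed it or when; that part stays in source control.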

So in that case I still considered this bad practice, but boy was I happy the original dev at least put them in! It would have been better to comment the code clearly instead, but I guess that was better than nothing.

Industry Standards for Structuring Code and Comments

There are no standards that exist across all industries that govern code structure. I doubt that any industry has industry-wide coding standards that deal with comments, file structure, and project organization.

The closest thing that you might find would be MISRA C, a set of guidelines produced by the Motor Industry Software Reliability Association to ensure safe, portable, and reliable code in the embedded systems that run in cars. However, as I don't work in the automotive industry, I've never had to deal with it.

Typically, code structure and guidelines are defined at the organizational or project level, not across organizations or within industries. If you want examples of how to write good, clean, readable code, I suggest looking at Steve McConnell's Code Complete, Andy Hunt and Dave Thomas's The Pragmatic Programmer, Diomidis Spinellis's Code Quality: The Open Source Perspective, and Robert "Uncle Bob" Martin's Clean Code: A Handbook of Agile Software Craftsmanship. These are the canonical books on how to write the best code you can, regardless of language or platform. Many languages also have style guidelines, such as Oracle's Code Conventions for the Java Programming Language, which can serve as a foundation for an organization's or project's code style rules.