Documentation, Coding Style – Commenting Regular Expressions Effectively

coding-style, comments, documentation

Are there any common practices for commenting regular expressions: inline comments referring to different parts of the regex, or a general comment for the whole expression?

Best Answer

In my view, a good practice is to state concisely in comments what the general idea of the regular expression is. This saves other developers (or sometimes yourself) the hassle of copy-pasting the regex into a tool like RegExr just to understand what it does.
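As an illustration of combining both approaches, here is a minimal Java sketch (the date pattern and the `IsoDateRegex` class are hypothetical). A short general comment states what the regex is for, and the `Pattern.COMMENTS` flag lets you annotate each part inline, since in that mode whitespace in the pattern is ignored and `#` starts a comment running to the end of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IsoDateRegex {
    /*
     * General idea: match an ISO-8601 calendar date such as "2010-12-09"
     * and capture year, month and day. It does NOT validate the day
     * against the month, so "2010-02-31" is accepted.
     */
    private static final Pattern ISO_DATE = Pattern.compile(
          "^(\\d{4})                # year: four digits\n"
        + "-                        # literal separator\n"
        + "(0[1-9]|1[0-2])          # month: 01 to 12\n"
        + "-                        # literal separator\n"
        + "(0[1-9]|[12]\\d|3[01])   # day: 01 to 31\n"
        + "$",
        Pattern.COMMENTS);

    /* true if the whole string is an ISO-style date */
    static boolean isIsoDate(String s) {
        return ISO_DATE.matcher(s).matches();
    }

    /* extracts the year (group 1); throws if s is not a date */
    static int year(String s) {
        Matcher m = ISO_DATE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an ISO date: " + s);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2010-12-09")); // true
        System.out.println(isIsoDate("2010-13-09")); // false: month 13
    }
}
```

The same mode can also be switched on from inside the pattern with the embedded flag `(?x)`, which is handy when the regex lives in a config file rather than in code.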

Case 1 - Tasks

If you use an IDE like Eclipse, NetBeans or Visual Studio (or have some other way of running text searches on your codebase), maybe your team uses some specific "comment tags" or "task tags", in which case this can be useful.

I would from time to time, when reviewing code, add something like the following:

// TOREVIEW: [2010-12-09 haylem] marking this for review because blablabla

or:

// FIXME: [2010-12-09 haylem] marking this for review because blablabla

I use different custom task tags for this that I can see in Eclipse's task view, because having something in the commit logs is a good thing but not enough when an executive asks you in a review meeting why bugfix XY was completely forgotten and slipped through. So on urgent matters or really questionable pieces of code, this serves as an additional reminder. (Usually I keep the comment short and rely on the commit logs for the details, because pointing to them is what the reminder is for, so I don't clutter the code too much.)
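If your tooling can't display such tags, a regex search does the job. Below is a rough Java sketch (Java 16+, for the record type) of a scanner for tags in the `// TAG: [date author] note` layout used above. The tag list and the `TaskTagScanner` class are assumptions; adjust them to whatever your team actually configures in the IDE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaskTagScanner {
    /*
     * Hypothetical tag set. Captures: 1 = tag, 2 = date (YYYY-MM-DD),
     * 3 = author, 4 = free-text note.
     */
    private static final Pattern TASK = Pattern.compile(
        "//\\s*(TOREVIEW|FIXME|BUGFIX):\\s*\\[(\\d{4}-\\d{2}-\\d{2})\\s+(\\w+)\\]\\s*(.*)");

    record Task(int line, String tag, String date, String author, String note) {}

    /* returns one Task per tagged line, with 1-based line numbers */
    static List<Task> scan(List<String> sourceLines) {
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            Matcher m = TASK.matcher(sourceLines.get(i));
            if (m.find()) {
                tasks.add(new Task(i + 1, m.group(1), m.group(2), m.group(3), m.group(4)));
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "int x = 1;",
            "// TOREVIEW: [2010-12-09 haylem] marking this for review because blablabla",
            "// FIXME: [2010-12-09 haylem] marking this for review because blablabla");
        for (Task t : scan(src)) {
            System.out.println(t.line() + " " + t.tag() + " " + t.author() + ": " + t.note());
        }
    }
}
```

In practice you would feed it the lines of each source file; the point is only that a consistent tag layout makes the reminders trivially searchable.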

Case 2 - 3rd-Party Libs' Patches

If my product needs to package a third-party piece of code as source (or as a library, but rebuilt from source) because it had to be patched for some reason, we document the patch in a separate document where we list those "caveats" for future reference, and the source code will usually contain a comment similar to:

// [PATCH_START:product_name]
//  ... real code here ...
// [PATCH_END:product_name]
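Because the markers are plain text, they can also be checked mechanically, for example before upgrading the third-party code. Here is a hypothetical Java sketch (the `PatchRegions` class is made up for illustration) that lists the patched regions for one product name and fails on unbalanced markers; it does not support nested patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatchRegions {
    /* matches "// [PATCH_START:name]" and "// [PATCH_END:name]" */
    private static final Pattern MARKER =
        Pattern.compile("//\\s*\\[PATCH_(START|END):(\\w+)\\]");

    /* returns {startLine, endLine} pairs (1-based); throws if markers are unbalanced */
    static List<int[]> find(List<String> lines, String product) {
        List<int[]> regions = new ArrayList<>();
        int open = -1; /* line of the currently open PATCH_START, or -1 */
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = MARKER.matcher(lines.get(i));
            if (!m.find() || !m.group(2).equals(product)) {
                continue;
            }
            if (m.group(1).equals("START")) {
                if (open != -1) {
                    throw new IllegalStateException("nested PATCH_START at line " + (i + 1));
                }
                open = i + 1;
            } else {
                if (open == -1) {
                    throw new IllegalStateException("PATCH_END without START at line " + (i + 1));
                }
                regions.add(new int[] {open, i + 1});
                open = -1;
            }
        }
        if (open != -1) {
            throw new IllegalStateException("unclosed PATCH_START at line " + open);
        }
        return regions;
    }
}
```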

Case 3 - Non-Obvious Fixes

This one is a bit more controversial and closer to what your senior dev is asking for.

In the product I work on at the moment, we sometimes (definitely not a common thing) have a comment like:

// BUGFIX: [2010-12-09 haylem] fix for BUG_ID-XYZ

We only do this if the bugfix is non-obvious and the code reads abnormally. This can be the case for browser quirks, for instance, or obscure CSS fixes that you need to implement only because of a documented bug in a product. So in general we'd link it to our internal issue repository, which then contains the detailed reasoning behind the bugfix and pointers to the documentation of the external product's bug (say, a security advisory for a well-known Internet Explorer 6 defect, or something like that).

But as mentioned, it's quite rare. And thanks to the task tags, we can regularly run through these and check if these weird fixes still make sense or can be phased out (for instance, if we dropped support for the buggy product causing the bug in the first place).

This just in: A real-life example

In some cases, it's better than nothing :)

I just came across a huge statistical computation class in my codebase, where the header comment was in the form of a changelog with the usual yadda yadda: reviewer, date, bug ID.

At first I thought of scrapping it, but I noticed that the bug IDs matched neither the convention of our current issue tracker nor that of the tracker used before I joined the company. So I read through the code to understand what the class was doing (not being a statistician), and also tried to dig up these defect reports. As it happens, they were fairly important: they dealt with minor precision issues and special cases based on very specific requirements issued by the originating customer back then, and the next person to edit the file without knowing about them would have had a horrible time. Bottom line: if these comments had not been there, I wouldn't have known. If they hadn't been there AND I had had a better understanding of the class, I would have noticed that some computations were off and broken them by "fixing" them.

Sometimes it's hard to keep track of very old requirements like these. In the end I still removed the header, but only after sneaking in a block comment before each offending function, describing why these "weird" computations are the way they are: they are specific requests.

So in that case I still considered these a bad practice, but boy was I happy the original dev did at least put them in! It would have been better to comment the code clearly instead, but I guess that was better than nothing.

Coding Style – Recommendations for Commenting Code Effectively

Some of the statements below are quite personal, though with some justification, and are meant to be this way.

Comment Types

For the brief version... I use comments for:

trailing comments explaining fields in data structures (apart from those, I don't really use single-line comments)
exceptional or purpose-oriented multi-line comments above blocks
public user and/or developer documentation generated from source

Read below for the details and (possibly obscure) reasons.

Trailing Comments

Depending on the language, I use either single-line or multi-line comments. Why does it depend? It's just a standardization issue. When I write C code, I favor old-fashioned ANSI C89 code by default, so I prefer to always have /* comments */.

Therefore I would have this in C most of the time, and sometimes (depends on the style of the codebase) for languages with a C-like syntax:

typedef struct STRUCT_NAME {
    int fieldA;                /* aligned trailing comment */
    int fieldBWithLongerName;  /* aligned trailing comment */
} TYPE_NAME;

Emacs is nice and does that for me with M-;.

If the language supports single-line comments and is not C-based, I am more inclined to use single-line comments. Otherwise, I'm afraid the /* */ habit has stuck, which isn't necessarily bad, as it forces me to be concise.

Multi-Line Comments

I disagree with your precept that using single-line comments for this is more visually appealing. I use this:

/*
 * this is a multi-line comment, which needs to be used
 * for explanations, and preferably be OUTSIDE a
 * function's or class's body and provide information to developers
 * that would not belong in generated API documentation.
 */

Or this (though I don't use it that often anymore, except in a personal codebase or mostly for copyright notices; this is historical for me and comes from my educational background. Unfortunately, most IDEs screw it up when auto-formatting):

/*
** this is another multi-line comment, which needs to be used
** for explanations, and preferably be OUTSIDE a
** function's or class's body and provide information to developers
** that would not belong in generated API documentation.
*/

If need really be, I comment inline using the trailing-comment style mentioned earlier, when it makes sense in a trailing position: on a very special return case, for instance, to document a switch's case statements (rare, as I don't use switch often), or to document the branches of an if ... else control flow. Otherwise, a comment block outside the scope, outlining the steps of the function/method/block, usually makes more sense to me.
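A small Java sketch of what I mean (the `dayKind` method is hypothetical): the overview goes in a block comment above the method, and only the special cases get a short trailing remark.

```java
class TrailingBranchComments {
    /*
     * Maps an ISO-8601 day-of-week number (1 = Monday ... 7 = Sunday)
     * to a label. Anything outside 1..7 is reported as "invalid"
     * rather than throwing.
     */
    static String dayKind(int isoDay) {
        switch (isoDay) {
            case 6:
            case 7:
                return "weekend";        /* Saturday, Sunday */
            case 1: case 2: case 3: case 4: case 5:
                return "weekday";
            default:
                return "invalid";        /* special return case: out of range */
        }
    }
}
```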

I use these very exceptionally, except when coding in a language without support for documentation comments (see below), in which case they become more prevalent. In the general case, though, they are really just for documenting things meant for other developers: internal comments that need to stand out. For instance, to document a mandatory empty block like a "forced" catch block:

try {
    /* you'd have real code here, not this comment */
} catch (AwaitedException e) {
    /*
     * Nothing to do here. We default to a previously set value.
     */
}

Which is already ugly to me, but I would tolerate it in some circumstances.
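For reference, a runnable Java variant of that pattern, assuming a hypothetical `parsePortOrDefault` helper: the default is assigned before the `try`, so the empty `catch` is intentional and the comment says so.

```java
class ForcedCatch {
    static int parsePortOrDefault(String raw) {
        int port = 8080;                 /* hypothetical default */
        try {
            port = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            /*
             * Nothing to do here. We default to the previously set value.
             */
        }
        return port;
    }
}
```

Note that a `null` argument still throws a `NullPointerException`; only malformed numbers fall back to the default.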

Documentation Comments

Javadoc et al.

I usually use them on methods and classes to document the versions introducing (or changing) a feature, especially for a public API, and to provide some examples (with clear input and output cases, and special cases). Though in some cases a unit test might be better for documenting these, unit tests are not necessarily human-readable (no matter what DSL-thingy you use).
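A sketch of the kind of Javadoc I mean, with version information and input/output examples including a special case (the `TextUtils` class and the `1.2` version are placeholders):

```java
class TextUtils {
    /**
     * Collapses every run of whitespace in {@code s} into a single space
     * and trims both ends.
     *
     * <p>Examples:
     * <pre>
     * collapseWhitespace("  a \t b\n")  returns "a b"
     * collapseWhitespace("")            returns ""
     * collapseWhitespace(null)          throws NullPointerException
     * </pre>
     *
     * @param s the text to normalize; must not be {@code null}
     * @return the normalized text
     * @since 1.2
     */
    static String collapseWhitespace(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }
}
```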

They bug me a bit for documenting fields/properties, as I prefer trailing comments for this and not all documentation generators support trailing documentation comments. Doxygen does, for instance, but Javadoc doesn't, which means you need a comment above each of your fields. I can survive that, though, as Java lines are relatively long most of the time anyway, so a trailing comment would creep me out equally by extending the line beyond my tolerance threshold. If Javadoc ever improved on that, I'd be a lot happier.
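To illustrate, here is a hypothetical Java class documented the Javadoc way, with one comment above each field. The plain block comment inside it shows Doxygen's trailing `///<` form (as used in C and C++ code), which Javadoc would not attach to the field.

```java
class ServerConfig {
    /** Host name or IP address of the server. */
    final String host;

    /** TCP port; {@code 0} means "use the default port". */
    final int port;

    /*
     * With Doxygen, the same documentation could trail the member instead:
     *     int port;  ///< TCP port; 0 means "use the default port"
     */

    ServerConfig(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Returns {@link #port}, or {@code defaultPort} when {@link #port} is 0. */
    int effectivePort(int defaultPort) {
        return port == 0 ? defaultPort : port;
    }
}
```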

Commented-Out Code

I use single-line comments for one reason only in C-like languages (except when compiling for strict C, where I really don't use them): to comment out stuff while coding. Most IDEs have a toggle for single-line comments (aligned on the indent, or on column 0), and that fits this use case for me. Using the toggle for multi-line comments (or selecting in the middle of lines, in some IDEs) makes it harder to switch easily between commenting and uncommenting.

But as I'm against commented-out code in the SCM, that's usually very short-lived, because I delete commented-out chunks before committing. (Read my answer to this question on "edited-by in line comments and SCMs".)

Comment Styles

I usually tend to write:

complete sentences with correct grammar (including punctuation) for documentation comments, as they are supposed to be read later on in an API doc or even as part of a generated manual.
well-formatted but more lax on punctuation/caps for multi-line comment blocks
trailing blocks without punctuation (because of space and usually because the comment is a brief one, that reads more like a parenthesised statement)

A note on Literate Programming

You might want to look into Literate Programming, as introduced in this paper by Donald Knuth.

The literate programming paradigm, [...] represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay[...].

Literate programming tools are used to obtain two representations from a literate source file: one suitable for further compilation or execution by a computer, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source.
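To make the "tangle" step concrete, here is a toy Java sketch assuming noweb-style markup: `<<name>>=` opens a code chunk, a line that is `@` or starts with `@ ` returns to prose, and `<<name>>` alone on a line references another chunk. It is not any real tool's implementation, it has no cycle detection, and weaving is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniTangle {
    private static final Pattern DEF = Pattern.compile("^<<(.+)>>=\\s*$");
    private static final Pattern REF = Pattern.compile("^(\\s*)<<(.+)>>\\s*$");

    /* collects code chunks by name; prose and "@" lines are dropped */
    static Map<String, List<String>> chunks(List<String> literate) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        List<String> current = null; /* null while reading prose */
        for (String line : literate) {
            Matcher def = DEF.matcher(line);
            if (def.matches()) {
                current = out.computeIfAbsent(def.group(1), k -> new ArrayList<>());
            } else if (line.equals("@") || line.startsWith("@ ")) {
                current = null;
            } else if (current != null) {
                current.add(line);
            }
        }
        return out;
    }

    /* expands chunk references recursively, starting from root */
    static List<String> tangle(Map<String, List<String>> chunks, String root) {
        List<String> result = new ArrayList<>();
        for (String line : chunks.getOrDefault(root, List.of())) {
            Matcher ref = REF.matcher(line);
            if (ref.matches()) {
                for (String inner : tangle(chunks, ref.group(2))) {
                    result.add(ref.group(1) + inner); /* keep the reference's indentation */
                }
            } else {
                result.add(line);
            }
        }
        return result;
    }
}
```

The prose can be written in whatever order explains the program best, while `tangle` reassembles the code in the order the compiler needs.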

As a side note and example: the underscore.js JavaScript framework, notwithstanding its non-compliance with my commenting style, is a pretty good example of a well-documented codebase and a well-formed annotated source (though maybe not the best to use as an API reference).

These are personal conventions. Yes, I might be weird (and you might be too). That's OK, as long as you follow and comply with your team's code conventions when working with peers, or at least do not radically attack their preferences, and cohabit nicely. It's part of your style, and you should find the fine line between developing a coding style that defines you as a coder (or as a follower of a school of thought or organization with which you have a connection) and respecting a group's conventions for consistency.