Programming Languages – Why Most Languages Do Not Nest Block Comments

comments, language-agnostic, syntax

A few do, but not any of the popular ones as far as I know. Is there something bad about nesting comments?

I plan to have block comments nest in the (small) language I'm working on, but I would like to know if this is a bad idea.

Best Answer

One thing nobody's mentioned yet, so I'll mention it: The desire to nest comments often indicates that the programmer is Doing It Wrong.

First, let's agree that the only time "nesting" or "not nesting" is visible to the programmer is when the programmer writes something structurally like this:

do_something();
/* comment /* nested comment */ more comment */
do_something_else();

Now, when does such a thing come up in practice? Certainly the programmer isn't going to be writing nested comments that literally look like the above snippet! No, in practice when we nest comments (or wish we could nest them), it's because we want to write something like this:

do_something();  /* do a thing */
/* [ajo] 2017-12-03 this turned out to be unnecessary
do_something_else(); /* do another thing */
*/

And this is BAD. This is not a pattern we (as language designers) want to encourage! The correct way to write the above snippet is:

do_something();  /* do a thing */

That "wrong" code, that false start or whatever it was, doesn't belong in the codebase. It belongs, at best, in the source-control history. Ideally, you'd never even write the wrong code to begin with, right? And if the wrong code was serving a purpose there, by warning maintainers not to reinstate it for some reason, well, that's probably a job for a well-written and intentional code comment. Trying to express "don't do X" by just leaving in some old code that does X, but commented out, is not the most readable or effective way to keep people from doing X.

This all boils down to a simple rule of thumb that you may have heard before: Don't comment out code. (Searching for this phrase will turn up a lot of opinions in agreement.)

Before you ask: yes, languages such as C and C++ already give the programmer another tool to "comment out" large blocks of code: #if 0 (C# spells it #if false). But this is just a particular application of the preprocessor, which is a large and useful tool in its own right. It would actually be extremely difficult and special-casey for a language to support conditional compilation with #if and yet not support #if 0.
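To make the #if 0 point concrete, here is a minimal C sketch (the function name compute is purely illustrative). The excluded region contains ordinary block comments and even its own #if/#endif pair, and neither causes trouble: comments are removed before directives are processed, and conditional groups nest.

```c
// Illustrative only: compute() is a hypothetical function.
int compute(int x)
{
    int result = x * 2;   /* do a thing */
#if 0
    result += 1;          /* do another thing: skipped along with its comment */
#if 1
    result += 100;        /* a nested conditional group, also skipped */
#endif
#endif
    return result;        // the #if 0 region contributed nothing
}
```

One caveat follows from the same translation order: an unterminated /* inside the #if 0 region would still swallow the closing #endif, because comment removal happens before the preprocessor looks for directives.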

So, we've established that nested comments are relevant only when the programmer is commenting out code; and we've established (via consensus of a lot of experienced programmers) that commenting out code is a Bad Thing.

To complete the syllogism, we must accept that language designers have an interest in promoting Good Things and discouraging Bad Things (assuming that all else is equal).

In the case of nested comments, all else is equal — you can safely ignore the low-voted answers that claim that parsing nested /* would somehow be "difficult" for the parser. (Nested /* are no harder than nested (, which just about every parser in the world already needs to handle.)
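The claim about parsing is easy to check. Below is a minimal C sketch using a hypothetical helper, delims_balanced, that deliberately ignores string literals and escapes: a single depth counter tracks nesting, and exactly the same function handles parentheses and /* ... */ pairs.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: reports whether every occurrence of `opener` in src is
// matched by a later `closer`. One depth counter handles nesting, exactly as
// for parentheses. String literals and escapes are ignored for brevity, and
// both delimiters are assumed to be non-empty.
bool delims_balanced(const char *src, const char *opener, const char *closer)
{
    size_t olen = strlen(opener);
    size_t clen = strlen(closer);
    long depth = 0;

    for (size_t i = 0; src[i] != '\0';) {
        if (strncmp(src + i, opener, olen) == 0) {
            depth++;              // one level deeper
            i += olen;
        } else if (strncmp(src + i, closer, clen) == 0) {
            if (--depth < 0)
                return false;     // a closer with no matching opener
            i += clen;
        } else {
            i++;                  // ordinary character
        }
    }
    return depth == 0;            // every opener was eventually closed
}
```

A real lexer would fold this into its comment-skipping state, but the bookkeeping is the same single integer it already keeps for parentheses.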

So, all else being equal, should a language designer make it easy to nest comments (i.e., to comment out code), or difficult? Recall that commenting out code is a Bad Thing.

Q.E.D.

Footnote. Notice that if you don't allow nested comments, then

hello /* foo*/bar.txt */ world

is a misleading "comment" — it's equivalent to

hello bar.txt */ world

(which is likely a syntax error). But if you do allow nested comments, then

hello /* foo/*.txt */ world

is a misleading "comment" — it's equivalent to

hello

but leaves the comment open all the way to the end of the file (which again is almost certainly a syntax error). So neither way is particularly less prone to unintentional syntax errors. The only difference is in how they handle the intentional antipattern of commented-out code.
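Both behaviors in this footnote can be reproduced with a small C sketch. The function strip_comments is hypothetical and handles only block comments (no string literals, no line comments); its nested flag switches between C-style "first close wins" and a depth counter.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies src into out with block comments removed.
// nested == 0: a comment ends at the first */, as in C.
// nested != 0: every /* inside a comment opens another level.
// Returns 0 on success, or -1 if a comment is still open at end of input.
int strip_comments(const char *src, int nested, char *out, size_t outsize)
{
    size_t i = 0, o = 0;
    int depth = 0;

    if (outsize == 0)
        return -1;                      // no room even for the terminator
    while (src[i] != '\0') {
        if ((depth == 0 || nested) && strncmp(src + i, "/*", 2) == 0) {
            depth++;                    // open a comment (or a nested one)
            i += 2;
        } else if (depth > 0 && strncmp(src + i, "*/", 2) == 0) {
            depth--;                    // close the innermost open comment
            i += 2;
        } else {
            if (depth == 0 && o + 1 < outsize)
                out[o++] = src[i];      // text outside comments is kept
            i++;
        }
    }
    out[o] = '\0';
    return depth == 0 ? 0 : -1;
}
```

With the footnote's inputs, non-nesting mode turns the first line into hello bar.txt */ world, while nesting mode returns -1 on the second line because one comment is still open when the input ends.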

Related Solutions

Why Programming Languages Use Curly Braces Instead of Square Braces

Two of the major influences on C were the Algol family of languages (Algol 60 and Algol 68) and BCPL (from which, via B, C takes its name).

BCPL was the first curly bracket programming language, and the curly brackets survived the syntactical changes and have become a common means of denoting program source code statements. In practice, on limited keyboards of the day, source programs often used the sequences $( and $) in place of the symbols { and }. The single-line '//' comments of BCPL, which were not taken up in C, reappeared in C++, and later in C99.

From http://www.princeton.edu/~achaney/tmve/wiki100k/docs/BCPL.html

BCPL introduced and implemented several innovations which became quite common elements in the design of later languages. Thus, it was the first curly bracket programming language (one using { } as block delimiters), and it was the first language to use // to mark inline comments.

From http://progopedia.com/language/bcpl/

Within BCPL, one often sees curly braces, but not always; this was due to the limitations of keyboards at the time. The character sequences $( and $) were lexically equivalent to { and }. C kept this idea of alternative spellings, though with a different set for the curly braces: the trigraphs ??< and ??> (and later the digraphs <% and %>).

The use of curly braces was further refined in B (which preceded C).

From Users' Reference to B by Ken Thompson:

/* The following function will print a non-negative number, n, to
  the base b, where 2<=b<=10,  This routine uses the fact that
  in the ASCII character set, the digits 0 to 9 have sequential
  code values.  */

printn(n,b) {
        extern putchar;
        auto a;

        if(a=n/b) /* assignment, not test for equality */
                printn(a, b); /* recursive */
        putchar(n%b + '0');
}
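For comparison, here is a sketch of the same routine in modern C, showing how little the brace syntax changed from B. It is an adaptation, not Thompson's code: instead of calling putchar it writes into a caller-supplied buffer so the result can be checked, and it assumes 2 <= b <= 10 and a buffer large enough for the digits.

```c
// Adapted sketch of Thompson's printn: writes the digits of n in base b into
// out (instead of calling putchar) and returns a pointer to the terminating
// '\0'. Assumes 2 <= b <= 10 and that out has room for all digits plus '\0'.
char *printn(unsigned n, unsigned b, char *out)
{
    unsigned a;

    if ((a = n / b) != 0)            // assignment, then test the result
        out = printn(a, b, out);     // recursive: higher-order digits first
    *out++ = (char)('0' + n % b);    // '0'..'9' are contiguous in C
    *out = '\0';
    return out;
}
```

Unlike the B version's comment suggests, C itself guarantees that the digits '0' to '9' have consecutive codes in any character set, so the + '0' trick no longer depends on ASCII.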

There are indications that curly braces were proposed as shorthand for Algol's begin and end:

I remember that you also included them in the 256-character card code that you published in CACM, because I found it interesting that you proposed that they could be used in place of the Algol 'begin' and 'end' keywords, which is exactly how they were later used in the C language.

From http://www.bobbemer.com/BRACES.HTM

The use of square brackets (as a suggested replacement in the question) goes back even further. As mentioned, the Algol family influenced C. Within Algol 60 and 68 (C was written in 1972 and BCPL in 1966), the square bracket was used to designate an index into an array or matrix.

BEGIN
  FILE F(KIND=REMOTE);
  EBCDIC ARRAY E[0:11];
  REPLACE E BY "HELLO WORLD!";
  WRITE(F, *, E);
END.

As programmers were already familiar with square brackets for arrays in Algol and BCPL, and curly braces for blocks in BCPL, there was little need or desire to change this when making another language.

The updated question includes an addendum about the productivity of curly-brace usage and mentions Python. Some other resources do study this, though the answer boils down to "It's anecdotal, and what you are used to is what you are most productive with." Because programming skill and familiarity with different languages vary so widely, these factors are difficult to account for.

Much of any gain would depend on the IDE (or lack of one) being used. In vi-based editors, placing the cursor on an opening or closing bracket and pressing % moves the cursor to the matching one. This was very efficient with C-based languages back in the old days; less so now.

A better comparison would be between {} and begin/end, which were the options of the day (horizontal space was precious). Many languages in the Algol and Wirth tradition were based on a begin and end style: Algol (mentioned above), Pascal (which many are familiar with), and the Modula family.

I have difficulty finding any studies that isolate this specific language feature; the best I can do is show that curly-brace languages are much more popular than begin/end languages and that the construct is common. As mentioned in the Bob Bemer link above, the curly brace was proposed as a shorthand to make programming easier.

From Why Pascal is Not My Favorite Programming Language

C and Ratfor programmers find 'begin' and 'end' bulky compared to { and }.

Which is about all that can be said: it's familiarity and preference.

JavaScript – Why don’t languages use the words “and” and “or” instead of “&&” and “||”

In short

It's historical reasons.

The long history

Many older languages created between the 1950s and the end of the 1960s used the logical operators that you like, such as not, or and and:

Fortran II, 1961, introduced logical operators between dots with .NOT. .AND. .OR.
BASIC, 1964 (although I'm not sure that it had these operators in the very first version)
Simula 67, which used them as keywords in place of the more concise, mathematically inspired ¬ for not, ∧ (and = intersection) and ∨ (or = union)
Algol 68, 1968, which used them as a portable alternative to ¬, ∧ and ∨
Pascal, 1970

Their modern descendants (e.g. Ada) have kept this keyword style.

You can already see in this first list, however, a quest for concise expressions in many languages. But the character sets of those years were not portable, and many of the special characters did not survive the later standardisation of the ASCII character set.

Other languages also used the concise approach but chose characters that were luckier in the standardisation process (for example, consider the rationale that led to the inclusion of | in the ASCII character set):

PL/I, 1964, used & for and, | for or and ¬ for not
BCPL, 1967, used & for and, | for or and ~ for not. It also offered keyword alternatives, but those were not so appealing: LOGOR, LOGAND and LOGNOT
Finally came C, 1972, which had the incredible growth that we know. C was inspired by BCPL (indirectly, via B). It is not surprising that its designer, Dennis Ritchie, took over the & and |. But as C is systems-oriented, these were taken as bitwise operators. C also needed short-circuit operators for conditional expressions, which skip the rest of the expression once it is already known to be true or false (the purpose was to allow concise error-checking conditions). For these logical operators, the symbols were simply doubled: && and ||
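The difference between the doubled and single operators can be shown with a short C sketch; the helper names and the calls counter are illustrative only. With &&, evaluation stops once the left operand is false, while & always evaluates both operands, which is why a guard like p != NULL && *p > 0 is safe.

```c
#include <stddef.h>

// Illustrative only: counts how often the right-hand operand is evaluated.
int calls = 0;

static int right_operand(int v)
{
    calls++;
    return v;
}

// && short-circuits: if left is 0, right_operand() is never called.
int logical_and(int left)
{
    return left && right_operand(1);
}

// & is bitwise: both operands are always evaluated.
int bitwise_and(int left)
{
    return left & right_operand(1);
}

// The kind of concise error check short-circuiting was meant for:
// *p is only read when p is not NULL.
int points_to_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```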

Then came C++, inspired by C; then Java, inspired by C++; then JavaScript, inspired by Java... and this is why so many languages nowadays have opted for the well-known || and &&.

P.S.: Note that if JavaScript had adopted and/or rather than &&/||, it would probably also have adopted begin .. end rather than { .. }, making it a lot more verbose overall than we are used to ;-)

P.S.2: Note that psshill points out in a comment that C++ funnily supports and, or, bitand, bitor and a few other alternative tokens. But hardly anyone uses them. Interestingly, these are not a recent language feature: Stroustrup explains in his book "The Design and Evolution of C++" that these keywords were introduced by the C++ ISO committee in November 1993 because, in that pre-Unicode world, national variants of ISO 646 reused the ASCII codes of [, ], {, } and | for European characters, which made C++ very awkward to use on terminals with those encodings. Strangely, though, there is no begin ... end to replace { ... }; instead there are <% and %>. I guess the alternative keywords never really gained traction because, around the same period, ISO 8859 encodings came into use with all ASCII characters available. Usage and habits certainly did the rest: Stroustrup reports highly controversial discussions around alternative tokens.
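The same alternative spellings exist in C, where they arrived with the 1995 amendment to the standard: <iso646.h> defines and, or, bitand and friends as macros, and the digraphs <% %> and <: :> stand for { } and [ ]. A small sketch follows (the function names are illustrative); in C++ no header is needed, since there these are built-in alternative tokens.

```c
#include <iso646.h>   // C95: and, or, not, bitand, bitor, ... as macros

// Digraphs (also C95): <% %> mean { } and <: :> mean [ ].
int in_range(int x, int lo, int hi)
<%
    return x >= lo and x <= hi;            // 'and' expands to &&
%>

int has_either_flag(unsigned mask)
<%
    static const unsigned flags<:2:> = <% 0x1u, 0x4u %>;
    // 'bitand' expands to &, 'or' expands to ||
    return (mask bitand flags<:0:>) or (mask bitand flags<:1:>);
%>
```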