Programming Languages – Why Prefer Indentation Over Explicit Markers for Blocks?

language-agnostic, language-design, programming-languages

I am learning Haskell, and I was looking for an auto-indentation tool. I didn't look very far before learning that in Haskell (as in Python), indentation signifies a block. As a result, I'm guessing that it's impossible to create an auto-formatting tool as strong as those for languages in the C family, which use explicit markers such as { } (curly braces) or begin/end keywords.

I do not mind a language enforcing indentation for readability, but I cannot see the benefit of that over enforcing indentation and also having some explicit marker, so that automated tools can understand what belongs in which block.

If indentation is preferred for marking blocks so that code looks better, then I still don't understand the advantage. Given that tabs and spaces are rendered differently in different editors and different fonts (monospace fonts, for example, look tidier), it's infeasible to expect the programmer to present the code decently. A tool that takes the current text editor into account would be much better suited to formatting the code correctly.

Why would a language designer choose indentation over explicit block markers?

Best Answer

Guido van Rossum

From an interview with Guido van Rossum, available in full text on books.google.com (emphasis mine):

The choice of indentation for grouping was not a novel concept in Python; I inherited this from ABC, but it also occurred in occam, an older language. I don't know if the ABC authors got the idea from occam, or invented it independently, or if there was a common ancestor. Of course, I could have chosen not to follow ABC’s lead, as I did in other areas (e.g., ABC used uppercase for language keywords and procedure names, an idea I did not copy), but I had come to like the feature quite a bit while using ABC, as it seemed to do away with a certain type of pointless debate common amongst C users at the time, about where to place the curly braces.

Van Rossum was heavily inspired by ABC, and even though he did not copy all of it, he kept indentation-based grouping because it helped avoid religious wars over where to put the braces.

I also was well aware that readable code uses indentation voluntarily anyway to indicate grouping, and I had come across subtle bugs in code where the indentation disagreed with the syntactic grouping using curly braces—the programmer and any reviewers had assumed that the indentation matched the grouping and therefore not noticed the bug. Again, a long debugging session taught a valuable lesson.

Van Rossum also witnessed bugs caused by inconsistencies between grouping and indentation, and apparently thought that relying on indentation alone to structure the code would make it safer from programming errors¹.
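To make "relying on indentation alone" concrete (this illustration is not from the interview): Python's tokenizer, exposed through the standard `tokenize` module, turns leading whitespace into explicit INDENT and DEDENT tokens, so the parser still receives block markers; they are simply derived from the indentation instead of being typed by the programmer. A minimal sketch, where `block_markers` is a hypothetical helper name:

```python
import io
import token
import tokenize


def block_markers(source):
    """Hypothetical helper: list the INDENT/DEDENT tokens that Python's
    tokenizer derives from the leading whitespace of `source`."""
    markers = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.INDENT:
            markers.append("INDENT")
        elif tok.type == token.DEDENT:
            markers.append("DEDENT")
    return markers


src = (
    "if test(x):\n"
    "    foo(x)\n"
    "    bar(x)\n"
    "baz(x)\n"
)
print(block_markers(src))  # ['INDENT', 'DEDENT']
```

Because the markers come from the whitespace itself, there is no second signal that could disagree with it, which is the property described above. The flip side is that a tool cannot re-indent Python code without potentially changing what it means.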

Donald E. Knuth & Peter J. Landin

In the referenced interview, Guido mentions Don Knuth's idea of using indentation. This is detailed in "The Knuth Indentation Quote rediscovered", which quotes "Structured Programming with goto Statements". Knuth also references Peter John Landin's "The next 700 programming languages" (see the Discussion section about indentation). Landin designed ISWIM, which appears to be the first language to use indentation instead of begin/end blocks. Those papers are more about the feasibility of using indentation to structure programs than actual arguments in favor of doing so.


1. I think that this is in fact an argument in favor of having both grouping constructs and auto-formatting, in order to catch and recover from programming errors, which are bound to happen. If you screw up your indentation in Python, the person who debugs your code will have to guess which is correct:

if test(x):
    foo(x)
    bar(x)

Should bar always be called, or only if the test succeeds?

Grouping constructs add a level of redundancy that helps you spot a mistake when you auto-indent your code. In C, the equivalent code can be auto-indented as follows:

if (test(x))
  foo(x);
bar(x);

If I intended bar to be at the same level as foo, then auto-indenting based on the code structure lets me see that something is wrong, which can be fixed by adding braces around foo and bar.

In "Python: Myths about Indentation", there is a supposedly bad example from C:

/*  Warning:  bogus C code!  */

if (some condition)
        if (another condition)
                do_something(fancy);
else
        this_sucks(badluck);

That's the same case as above: in Emacs, I highlight the whole block or function, press Tab, and all the code is reindented. The discrepancy between the human indentation and the code structure tells me something is off (that, and the preceding comment!).
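The redundancy argument can be illustrated with a toy checker, written in Python to match the other examples here. It is not how Emacs or any real formatter works; it is a hypothetical sketch that recomputes each line's indentation from brace depth alone and flags lines where the written indentation disagrees. It deliberately ignores braceless if/else (so it would not catch the dangling else above), braces inside strings or comments, and tabs.

```python
def brace_indentation_mismatches(source, width=4):
    """Toy checker (hypothetical): recompute each line's indentation from
    brace depth and return (line_number, text) for lines that disagree.

    Assumes braces only, no braces in strings/comments, `width` spaces per level.
    """
    mismatches = []
    depth = 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no structure
        # A line starting with '}' belongs to the enclosing level.
        expected_depth = depth - 1 if stripped.startswith("}") else depth
        expected = " " * (width * max(expected_depth, 0))
        actual = line[: len(line) - len(line.lstrip())]
        if actual != expected:
            mismatches.append((lineno, stripped))
        depth += stripped.count("{") - stripped.count("}")
    return mismatches


c_code = """\
if (test(x)) {
    foo(x);
}
    bar(x);
"""
print(brace_indentation_mismatches(c_code))  # [(4, 'bar(x);')]
```

Here the human indentation says bar belongs with foo, while the braces say it does not; because the structure is stated twice, a tool can point at the disagreement instead of silently picking one.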

Besides, in C, intermediate code with broken indentation simply does not make it into the master branch: the style checks in place would make GCC/Jenkins scream at me. I recently had a problem in Python similar to the one described above, with a statement off by one level of indentation. Sometimes in C I write code that goes past a closing brace, but then I hit Tab and the code indents "wrongly": that's one more chance to spot the bug.
