How should compilers report errors and warnings

Tags: compiler, errors, programming-languages

I don't plan on writing a compiler in the near future; still, I'm quite interested in compiler technology, and in how this stuff could be made better.

Starting with compiled languages, most compilers have two diagnostic levels: warnings and errors. Warnings usually flag non-fatal stuff you should fix, while errors usually mean it's impossible to produce machine code (or bytecode) from the input.

This is a pretty weak definition, though. In some languages, such as Java, certain warnings are simply impossible to get rid of without the @SuppressWarnings annotation. Java also treats certain non-fatal problems as errors (for instance, unreachable code is a compile error in Java, for a reason I'd like to know).

C# doesn't have these problems, but it has a few of its own. Compilation seems to happen in several passes, and a failing pass prevents the later passes from running. Because of that, the error count you get when your build fails is often grossly underestimated: one run might say you have two errors, but once you fix them you may get 26 new ones.
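A rough Python sketch of why gated passes undercount errors, assuming a hypothetical compiler in which each pass only runs if the previous one reported nothing. The passes `syntax_pass` and `semantic_pass` and their toy rules are invented for illustration; this is not how the C# compiler is actually structured.

```python
from typing import Callable, List

# Each hypothetical pass takes the source text and returns error messages.
Pass = Callable[[str], List[str]]


def syntax_pass(source: str) -> List[str]:
    # Toy rule: a line ending in "!" is a syntax error.
    return [f"line {i}: syntax error"
            for i, line in enumerate(source.splitlines(), start=1)
            if line.endswith("!")]


def semantic_pass(source: str) -> List[str]:
    # Toy rule: any use of "undefined_var" is an unknown name.
    return [f"line {i}: unknown name 'undefined_var'"
            for i, line in enumerate(source.splitlines(), start=1)
            if "undefined_var" in line]


def run_pipeline(source: str, passes: List[Pass]) -> List[str]:
    """Run passes in order and stop at the first one that reports errors.

    Anything only a later pass could detect is never reported, so the
    error count shown to the user undercounts the real total.
    """
    for compiler_pass in passes:
        errors = compiler_pass(source)
        if errors:
            return errors  # later passes never run
    return []


passes = [syntax_pass, semantic_pass]
broken = "x = 1!\ny = undefined_var\nz = 2!\nw = undefined_var"
print(run_pipeline(broken, passes))  # only the two syntax errors

fixed = broken.replace("!", "")
print(run_pipeline(fixed, passes))   # two "new" errors appear
```

A compiler that recovered after the syntax errors and kept going could report all four problems in one run; the trade-off is that recovery can also produce spurious follow-on errors.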

Digging into C and C++ simply reveals a bad combination of Java's and C#'s diagnostic weaknesses (though it might be more accurate to say that Java and C# each went their own way and kept half the problems). Some warnings really ought to be errors (for instance, when not all code paths return a value), yet they are still warnings because, I suppose, when the standard was written, compiler technology wasn't good enough to make these kinds of checks mandatory. In the same vein, compilers often check for more than the standard requires, but still report the additional findings at the same "standard" warning level. And compilers often won't report all the errors they could find right away; it might take several compiles to get rid of all of them. Not to mention the cryptic errors C++ compilers like to spit out, where a single mistake can cause dozens of error messages.

Add to that the fact that many build systems can be configured to report a failure when the compiler emits warnings, and we get a strange mix: not all errors are fatal, but some warnings should be; not all warnings are deserved, but some are explicitly suppressed without any further mention of their existence; and sometimes all warnings become errors.

Non-compiled languages still have their share of crappy error reporting. A misspelled name in Python won't be reported until that code is actually run, and you never really get more than one error at a time, because the script stops executing at the first one it meets.
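The Python half of this complaint is easy to reproduce. A minimal sketch, where `describe`, `two_typos` and the helper `first_error` are names invented for illustration: the misspelled name sits in a branch that only fails when it executes, and a function with two typos only ever reports the first.

```python
from typing import Callable, Optional


def describe(n: int) -> str:
    if n >= 0:
        return "non-negative"
    # Typo: "mesage" is never defined, but Python only notices
    # when this branch actually executes.
    return mesage  # noqa: F821


def two_typos() -> None:
    first = undefined_one   # noqa: F821  (raised here; execution stops)
    second = undefined_two  # noqa: F821  (never reached, never reported)


def first_error(func: Callable[..., object], *args: object) -> Optional[str]:
    """Call func and return the text of the NameError it raises, if any."""
    try:
        func(*args)
    except NameError as exc:
        return str(exc)
    return None


print(describe(5))              # non-negative: the typo stays latent
print(first_error(describe, -1))  # the NameError for 'mesage'
print(first_error(two_typos))     # mentions only 'undefined_one'
```

Outright syntax errors are a different story: Python reports those when it compiles the file, before anything runs, though still one at a time.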

PHP, for its part, has a bunch of more or less significant error levels, plus exceptions. Parse errors are reported one at a time, warnings are often so bad they should abort your script (but don't by default), notices very often reveal serious logic problems, some errors really aren't bad enough to stop your script but still do, and, as usual with PHP, there are some really weird things down there (why the hell do we need an error level for fatal errors that aren't really fatal? E_RECOVERABLE_ERROR, I'm talking to you).

It seems to me that every single implementation of compiler error reporting I can think of is broken. That's a real shame, given how much good programmers insist on the importance of dealing with errors correctly, and yet they can't get their own tools to do so.

What do you think should be the right way to report compiler errors?

Best Answer

Your question doesn't seem to actually be about how we report compiler errors - rather, it's about the classification of problems and what to do about them.

If we start by assuming, for the moment, that the warning/error dichotomy is correct, let's see how well we can build on top of that. Some ideas:

  1. Different "levels" of warning. A lot of compilers sort of implement this (GCC, for example, has lots of switches for configuring exactly what it will warn about), but it needs work: for example, reporting the severity of each warning, and the ability to set "warnings are errors" only for warnings above a specified severity.

  2. Sane classification of errors and warnings. An error should only be reported if the code doesn't meet the specification, and hence cannot be compiled. Unreachable statements, while probably a coding error, should be a warning, not an error - the code is still "valid", and there are legitimate instances in which one would want to compile with unreachable code (quick modifications for debugging, for instance).
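Both ideas can be sketched together in Python. The `Severity` levels, the `Diagnostic` fields and the `werror_threshold` parameter are all hypothetical; this is not GCC's actual model, just one way the proposal could look. The severities assigned to unreachable code and to a missing return path below are likewise assumptions for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Severity(IntEnum):
    # Hypothetical warning severities; higher means more likely a real bug.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Diagnostic:
    message: str
    is_error: bool                      # True only if the code violates the spec
    severity: Severity = Severity.LOW   # meaningful for warnings only


def format_diagnostic(d: Diagnostic) -> str:
    # Report the severity alongside each warning.
    kind = "error" if d.is_error else f"warning [{d.severity.name}]"
    return f"{kind}: {d.message}"


def build_fails(diags: List[Diagnostic],
                werror_threshold: Optional[Severity] = None) -> bool:
    """Real errors always fail the build; a warning fails it only when a
    warnings-are-errors threshold is set and the warning reaches it."""
    for d in diags:
        if d.is_error:
            return True
        if werror_threshold is not None and d.severity >= werror_threshold:
            return True
    return False


diags = [
    Diagnostic("unreachable statement", is_error=False, severity=Severity.MEDIUM),
    Diagnostic("not all code paths return a value", is_error=False,
               severity=Severity.HIGH),
]

for d in diags:
    print(format_diagnostic(d))
print(build_fails(diags))                     # False: only warnings
print(build_fails(diags, Severity.HIGH))      # True: HIGH warning promoted
print(build_fails(diags[:1], Severity.HIGH))  # False: MEDIUM is below threshold
```

Keeping `is_error` separate from `severity` reflects the split proposed above: whether code meets the specification is a yes/no question, while how worrying a warning is can be graded and tuned per build.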

Now things I disagree with you on:

  1. Making extra effort to report every problem. If there's an error, that breaks the build. The build is broken. The build will not work until that error is fixed. Hence, it's better to report that error immediately, rather than "carrying on" in order to try and identify everything else "wrong" with the code. Especially when a lot of those things are probably caused by the initial error anyway.

  2. Your specific example of a warning-that-should-be-an-error. Yes, it's probably a programmer mistake. No, it shouldn't break the build. If I know the input to the function is such that it will always return a value, I should be able to run the build and do some tests without having to add those extra checks. Yes, it should be a warning. And a damn high-severity one at that. But it shouldn't break the build in and of itself, unless compiling with warnings-are-errors.

Thoughts?