For context, I'm a Clang developer working at Google. At Google, we've rolled
Clang's diagnostics out to (essentially) all of our C++ developers, and we
treat Clang's warnings as errors as well. As both a Clang developer and one of the larger users of Clang's diagnostics, I'll try to shed some light on these flags and how they can be used. Note that everything I'm describing is generically applicable to Clang and not specific to C, C++, or Objective-C.
TL;DR Version: Please use -Wall and -Werror at a minimum on any new code you are developing. We (the compiler developers) add warnings here for good reasons: they find bugs. If you find a warning that catches bugs for you, turn it on as well. Try -Wextra for a bunch of good candidates here. If one of them is too noisy for you to use profitably, file a bug. If you write code that contains an "obvious" bug but the compiler didn't warn about it, file a bug.
Now for the long version. First, some background on warning flag groupings.
There are a lot of "groupings" of warnings in Clang (and to a limited extent in
GCC). Some that are relevant to this discussion:
- On-by-default: These warnings are always on unless you explicitly disable them.
- -Wall: These are warnings in which the developers have high confidence, both in their value and in a low false-positive rate.
- -Wextra: These are warnings that are believed to be valuable and sound (i.e., they aren't buggy), but they may have high false-positive rates or common philosophical objections.
- -Weverything: This is an insane group that literally enables every warning in Clang. Don't use it on your code. It is intended strictly for Clang developers or for exploring what warnings exist.
There are two primary criteria mentioned above which guide where warnings go in
Clang; let's clarify what these really mean. The first is the potential
value of a particular occurrence of the warning. This is the expected
benefit to the user (developer) when the warning fires and correctly
identifies an issue with the code.
The second criterion is the idea of false-positive reports. These are
situations where the warning fires on code, but the potential problem being
cited does not in fact occur due to the context or some other constraint of the
program. The code warned about is actually behaving correctly. These are
especially bad when the warning was never intended to fire on that code
pattern. Instead, it is a deficiency in the warning's implementation that causes
it to fire there.
For Clang warnings, the value is required to be in terms of correctness,
not in terms of style, taste, or coding conventions. This limits the set of
warnings available, precluding oft-requested warnings such as warning
whenever {}s are not used around the body of an if statement. Clang is also
very intolerant of false positives. Unlike most other compilers, it will use
an incredible variety of information sources to prune false positives, including
the exact spelling of the construct, the presence or absence of extra '()', casts,
or even preprocessor macros!
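As a concrete illustration of that sensitivity to spelling: Clang warns (under -Wparentheses, which is in -Wall) when the result of an assignment is used as a condition, and writing an extra pair of parentheses around the assignment is the conventional way to tell the compiler the assignment is intentional, silencing the warning. A minimal sketch (the function name is illustrative):

```cpp
// Sketch: how extra parentheses change what Clang diagnoses.
// With warnings enabled, the first `if` triggers -Wparentheses
// ("using the result of an assignment as a condition"); the second
// does not, because the extra '()' marks the assignment as intentional.
int classify(int x, int y) {
    if (x = y)      // warns: did you mean `x == y`?
        return 1;   // taken whenever y is non-zero
    if ((x = y))    // no warning: parenthesized assignment is intended
        return 2;
    return 0;
}
```

Both forms compile and behave identically; only the diagnostic differs, which is exactly the kind of spelling-based false-positive pruning described above.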
Now let's take some real-world example warnings from Clang, and look at how
they are categorized. First, a default-on warning:
% nl x.cc
1 class C { const int x; };
% clang -fsyntax-only x.cc
x.cc:1:7: warning: class 'C' does not declare any constructor to initialize its non-modifiable members
class C { const int x; };
^
x.cc:1:21: note: const member 'x' will never be initialized
class C { const int x; };
^
1 warning generated.
Here no flag was required to get this warning. The rationale is that this
code is never really correct, giving the warning high value, and the
warning only fires on code that Clang can prove falls into this bucket, giving
it a zero false-positive rate.
% nl x2.cc
1 int f(int x_) {
2 int x = x;
3 return x;
4 }
% clang -fsyntax-only -Wall x2.cc
x2.cc:2:11: warning: variable 'x' is uninitialized when used within its own initialization [-Wuninitialized]
int x = x;
~ ^
1 warning generated.
Clang requires the -Wall flag for this warning. The reason is that there is
a non-trivial amount of code out there which has used (for good or ill) the
code pattern we are warning about to intentionally produce an uninitialized
value. Philosophically, I see no point in this, but many others disagree, and
the reality of this difference in opinion is what drives the warning under the
-Wall flag. It still has very high value and a very low
false-positive rate, but on some codebases it is a non-starter.
% nl x3.cc
1 void g(int x);
2 void f(int arr[], unsigned int size) {
3 for (int i = 0; i < size; ++i)
4 g(arr[i]);
5 }
% clang -fsyntax-only -Wextra x3.cc
x3.cc:3:21: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
for (int i = 0; i < size; ++i)
~ ^ ~~~~
1 warning generated.
This warning requires the -Wextra flag. The reason is that there are very
large codebases where mismatched sign on comparisons is extremely common.
While this warning does find some bugs, the probability of the code being a bug
when the user writes it is fairly low on average. The result is an extremely
high false-positive rate. However, when there is a bug in a program due to
the strange promotion rules, it is often extremely subtle, which gives this
warning relatively high value when it does flag a bug. As a consequence, Clang
provides it and exposes it under a flag.
Typically, warnings don't live long outside of the -Wextra flag. Clang tries
very hard to not implement warnings which do not see regular use and testing.
The additional warnings turned on by -Weverything are usually warnings under
active development or with active bugs. Either they will be fixed and placed
under appropriate flags, or they should be removed.
Now that we have an understanding of how these things work with Clang, let's
try to get back to the original question: what warnings should you turn on for
your development? The answer is, unfortunately, that it depends. Consider the
following questions to help determine what warnings work best for your
situation.
- Do you have control over all of your code, or is some of it external?
- What are your goals? Catching bugs, or writing better code?
- What is your false-positive tolerance? Are you willing to write extra code
to silence warnings on a regular basis?
First and foremost, if you don't control the code, don't try turning extra
warnings on there. Be prepared to turn some off. There is a lot of bad code in
the world, and you may not be able to fix all of it. That is OK. Work to find a
way to focus your efforts on the code you control.
Next, figure out what you want out of your warnings. This is different for
different people. Clang will try to warn without any options on egregious bugs,
or code patterns for which we have long historical precedent indicating the bug
rate is extremely high. By enabling -Wall you're going to get a much more
aggressive set of warnings targeted at catching the most common mistakes that
Clang developers have observed in C++ code. But with both of these the
false-positive rate should remain quite low.
Finally, if you're perfectly willing to silence false positives at every
turn, go for -Wextra. File bugs if you notice warnings which are catching
a lot of real bugs, but which have silly or pointless false positives. We're
constantly working to find ways to bring more and more of the bug-finding logic
present in -Wextra into -Wall where we can avoid the false positives.
Many will find that none of these options is just right for them. At Google,
we've turned some warnings in -Wall off due to a lot of existing code that
violated the warning. We've also turned some warnings on explicitly, even
though they aren't enabled by -Wall, because they have a particularly high
value to us. Your mileage will vary, but will likely vary in similar ways. It
can often be much better to enable a few key warnings rather than all of
-Wextra.
I would encourage everyone to turn on -Wall for any non-legacy code. For
new code, the warnings here are almost always valuable, and they really make the
experience of developing code better. Conversely, I would encourage everyone
not to enable flags beyond -Wextra. If you find a Clang warning that -Wextra
doesn't include but which proves at all valuable to you, simply file a bug
and we can likely put it under -Wextra. Whether you explicitly enable some
subset of the warnings in -Wextra will depend heavily on your code, your
coding style, and whether maintaining that list is easier than fixing
everything uncovered by -Wextra.
Of the OP's list of warnings (which included both -Wall and -Wextra), only
the following warnings are not covered by those two groups (or turned on by
default). The first group emphasizes why over-reliance on explicit warning flags
can be bad: none of these are even implemented in Clang! They're accepted on
the command line only for GCC compatibility.
-Wbad-function-cast
-Wdeclaration-after-statement
-Wmissing-format-attribute
-Wmissing-noreturn
-Wnested-externs
-Wnewline-eof
-Wold-style-definition
-Wredundant-decls
-Wsequence-point
-Wstrict-prototypes
-Wswitch-default
The next bucket of unnecessary warnings in the original list are ones which are
redundant with others in that list:
-Wformat-nonliteral -- Subset of -Wformat=2
-Wshorten-64-to-32 -- Subset of -Wconversion
-Wsign-conversion -- Subset of -Wconversion
There are also a selection of warnings which are more categorically different.
These deal with language dialect variants rather than with buggy or non-buggy
code. With the exception of -Wwrite-strings, these are all warnings for
language extensions provided by Clang. Whether Clang warns about their use
depends on the prevalence of the extension. Clang aims for GCC compatibility,
and so in many cases it eases that with implicit language extensions that are
in wide use. -Wwrite-strings, as commented on in the OP, is a compatibility flag
from GCC that actually changes the program semantics. I deeply regret this
flag, but we have to support it due to the legacy it has now.
-Wfour-char-constants
-Wpointer-arith
-Wwrite-strings
The remaining options, which actually enable potentially interesting
warnings, are these:
-Wcast-align
-Wconversion
-Wfloat-equal
-Wformat=2
-Wimplicit-atomic-properties
-Wmissing-declarations
-Wmissing-prototypes
-Woverlength-strings
-Wshadow
-Wstrict-selector-match
-Wundeclared-selector
-Wunreachable-code
The reason that these aren't in -Wall or -Wextra isn't always clear. Many
of them are actually based on GCC warnings (-Wconversion, -Wshadow, etc.),
and as such Clang tries to mimic GCC's behavior. We're
slowly breaking some of these down into more fine-grained and useful warnings.
Those then have a higher probability of making it into one of the top-level
warning groups. That said, to pick on one warning, -Wconversion is so broad
that it will likely remain its own "top level" category for the foreseeable
future. Some other warnings which GCC has but which have low value and high
false-positive rates may be relegated to a similar no-man's-land.
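To make that breadth concrete, here is a small sketch of the kinds of implicit conversions -Wconversion (and its -Wsign-conversion subset) complains about. Every flagged line is perfectly legal C++, which is exactly why the warning fires so often on existing code; the function names are illustrative only:

```cpp
// Each conversion below is implicit, legal, and potentially lossy --
// the kind of thing -Wconversion flags all over a large codebase.
#include <cstdint>

std::int32_t narrow(std::int64_t big) {
    return big;   // -Wconversion: 64-bit value may be truncated to 32 bits
}

unsigned to_unsigned(int n) {
    return n;     // -Wsign-conversion: negative n wraps to a huge value
}

float to_float(double d) {
    return d;     // -Wconversion: double -> float may lose precision
}
```

None of these is necessarily a bug in context, which is why the warning has a high false-positive rate despite catching real truncation bugs.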
Other reasons why these aren't in one of the larger buckets include simple
bugs, very significant false-positive problems, and in-development warnings.
I'm going to look into filing bugs for the ones I can identify. They should all
eventually migrate into a proper large bucket flag or be removed from Clang.
I hope this clarifies the warning situation with Clang and provides some
insight for those trying to pick a set of warnings for their use, or their
company's use.
One thing the Visitor Pattern does that is often not talked about is letting you choose which side of the Expression Problem you want to tackle.
So, what is the Expression Problem? It refers to the basic problem of extensibility: our programs manipulate data types using operations. As our programs evolve, we need to extend them with new data types and new operations. And particularly, we want to be able to add new operations which work with the existing data types, and we want to add new data types which work with the existing operations. And we want this to be true extension, i.e. we don't want to modify the existing program, we want to respect the existing abstractions, we want our extensions to be separate modules, in separate namespaces, separately compiled, separately deployed, separately type checked. We want them to be type-safe.
The Expression Problem is, how do you actually provide such extensibility in a language?
It turns out that for typical naive implementations of procedural and/or functional programming, it is very easy to add new operations (procedures, functions), but very hard to add new data types, since basically the operations work with the data types using some sort of case discrimination (switch, case, pattern matching) and you need to add new cases to them, i.e. modify existing code:
func print(node):
    case node of:
        AddOperator => print(node.left) + '+' + print(node.right)
        NotOperator => '!' + print(node.expr)

func eval(node):
    case node of:
        AddOperator => eval(node.left) + eval(node.right)
        NotOperator => !eval(node.expr)
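For what it's worth, this functional decomposition can be expressed directly in C++17 with std::variant and std::visit. The following is a sketch whose node names mirror the pseudocode; adding another free function like print is trivial, while adding a new alternative to the variant forces edits to every existing operation:

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <variant>

struct Add;
struct Not;
// int serves as a leaf literal; Add and Not are the operator nodes.
using Node = std::variant<int, Add, Not>;

struct Add { std::shared_ptr<Node> left, right; };
struct Not { std::shared_ptr<Node> expr; };

// One "operation" over all node types; new operations are just new functions,
// but a new node type means editing every such visit lambda.
std::string print(const Node& node) {
    return std::visit([](const auto& n) -> std::string {
        using T = std::decay_t<decltype(n)>;
        if constexpr (std::is_same_v<T, int>)
            return std::to_string(n);
        else if constexpr (std::is_same_v<T, Add>)
            return print(*n.left) + "+" + print(*n.right);
        else
            return "!" + print(*n.expr);
    }, node);
}
```

An eval function would have exactly the same shape, which is the point: operations are open, the set of node types is closed.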
Now, if you want to add a new operation, say, type-checking, that's easy, but if you want to add a new node type, you have to modify all the existing pattern matching expressions in all operations.
And for typical naive OO, you have the exact opposite problem: it is easy to add new data types which work with the existing operations (either by inheriting or overriding them), but it is hard to add new operations, since that basically means modifying existing classes/objects.
class AddOperator(left: Node, right: Node) < Node:
    meth print:
        left.print + '+' + right.print
    meth eval:
        left.eval + right.eval

class NotOperator(expr: Node) < Node:
    meth print:
        '!' + expr.print
    meth eval:
        !expr.eval
Here, adding a new node type is easy, because you either inherit, override or implement all required operations, but adding a new operation is hard, because you need to add it either to all leaf classes or to a base class, thus modifying existing code.
Several languages have constructs for solving the Expression Problem: Haskell has typeclasses, Scala has implicit arguments, Racket has Units, Go has Interfaces, and CLOS and Clojure have Multimethods.
However, in an OO language that doesn't have a way of solving the Expression Problem (such as Java or C#), the Visitor Pattern at least allows you to "pick your poison". What the pattern does is turn your design 90° to the side: the operations become classes (PrintVisitor, EvalVisitor) and, conversely, the types become methods (visitAddOperator, visitNotOperator, or just visit, if your language supports argument-based overloading). This does not solve the Expression Problem (i.e. how to make it easy to add both types and operations), but it does allow you to choose which one to make easy.
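To make the 90° turn concrete in a language without a built-in solution, here is a minimal C++ sketch (class and method names are illustrative): the print operation from the earlier examples becomes a PrintVisitor class, and each node type reduces to an overloaded visit method. A new operation is now just a new Visitor subclass, with no edits to the node classes; a new node type, however, means a new visit overload in every visitor.

```cpp
#include <string>

struct Literal; struct AddOperator; struct NotOperator;

// One visit overload per node type: adding a node type means editing this.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Literal&) = 0;
    virtual void visit(const AddOperator&) = 0;
    virtual void visit(const NotOperator&) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor&) const = 0;
};

struct Literal : Node {
    int value;
    explicit Literal(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct AddOperator : Node {
    const Node &left, &right;
    AddOperator(const Node& l, const Node& r) : left(l), right(r) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct NotOperator : Node {
    const Node& expr;
    explicit NotOperator(const Node& e) : expr(e) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// The operation "print", now a class; an EvalVisitor would follow the same shape.
struct PrintVisitor : Visitor {
    std::string out;
    void visit(const Literal& n) override { out += std::to_string(n.value); }
    void visit(const AddOperator& n) override {
        n.left.accept(*this); out += "+"; n.right.accept(*this);
    }
    void visit(const NotOperator& n) override {
        out += "!"; n.expr.accept(*this);
    }
};
```

The double dispatch (accept calling back into visit) is what routes each node to the right overload without case discrimination in the operation itself.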
So, if your language does support a way to solve the Expression Problem, then you don't need this workaround.
Note, however, this is not the only thing the Visitor Pattern does.
Note: you will note the conspicuous absence of any mention of C++, whatsoever. Unfortunately, I simply don't know enough about it. I suspect that between its overloading and argument-based dispatch, virtual inheritance, free functions, macros, and most importantly compile-time template metaprogramming, the Expression Problem is solved in C++, but I don't know for sure.
The problem is that once someone finds a solution for the Expression Problem, they redefine it to make it even harder to solve, so that new solutions are even more powerful and expressive. For example, the original formulation by the Haskell community did not require modular typechecking, but the Scala community proposed that the Expression Problem should not only include modular extension (separate compilation etc.) of types and operations, but also modular typechecking and type inference of those extensions, which at the moment is something only Scala's implicits can do and Haskell's typeclasses and ML's functors can't.
Best Answer
Yes, Virginia, there is a Santa Claus.
The notion of using programs to modify programs has been around a long time. The original idea came from John von Neumann in the form of stored-program computers. But machine code modifying machine code in arbitrary ways is pretty inconvenient.
People generally want to modify source code. This is mostly realized in the form of program transformation systems (PTS).
PTS generally offer, for at least one programming language, the ability to parse to ASTs, manipulate that AST, and regenerate valid source text. If in fact you dig around, for most mainstream languages somebody has built such a tool (Clang is an example for C++, the Java compiler offers this capability as an API, Microsoft offers Roslyn, Eclipse has its JDT, ...) with a procedural API that is actually pretty useful. For the broader community, almost every language-specific community can point to something like this, implemented with various levels of maturity (usually modest, many "just parsers producing ASTs"). Happy metaprogramming.
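To make the manipulate-and-regenerate half of that pipeline concrete, here is a toy sketch on a miniature expression AST: one rewrite rule (constant folding) followed by regeneration of source text. This illustrates only the shape of the idea; real PTS and compiler APIs differ substantially, and all names here are invented for the example:

```cpp
// Toy AST for expressions of the form (a+b): leaves hold ints,
// interior nodes hold the '+' operator.
#include <memory>
#include <string>

struct Expr {
    char op = 0;                       // 0 means leaf
    int value = 0;                     // used when leaf
    std::unique_ptr<Expr> lhs, rhs;    // used when op == '+'
};

std::unique_ptr<Expr> leaf(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = '+';
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// "Manipulate the AST": fold constant additions bottom-up.
void fold(Expr& e) {
    if (!e.op) return;
    fold(*e.lhs);
    fold(*e.rhs);
    if (!e.lhs->op && !e.rhs->op) {
        e.value = e.lhs->value + e.rhs->value;
        e.op = 0;
        e.lhs.reset();
        e.rhs.reset();
    }
}

// "Regenerate valid source text" from the (possibly rewritten) AST.
std::string emit(const Expr& e) {
    if (!e.op) return std::to_string(e.value);
    return "(" + emit(*e.lhs) + "+" + emit(*e.rhs) + ")";
}
```

A real PTS adds the missing front half (parsing arbitrary source into this tree) plus, in the better systems, the pattern language and static analyses discussed below.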
[There's a reflection-oriented community that tries to do metaprogramming from inside the programming language, but it only achieves "runtime" behavior modification, and only to the extent that the language compilers made some information available by reflection. With the exception of LISP, there are always details about the program that are not available by reflection ("Luke, you need the source"), and these always limit what reflection can do.]
The more interesting PTS do this for arbitrary languages (you give the tool a language description as a configuration parameter, including at a minimum the BNF). Such PTS also allow you to do "source to source" transformation, e.g., specify patterns directly using the surface syntax of the targeted language; using such patterns, you can match code fragments of interest, and/or find and replace code fragments. This is far more convenient than the programming API, because you don't have to know every microscopic detail about the ASTs to do most of your work. Think of this as meta-metaprogramming :-}
A downside: unless the PTS offers various kinds of useful static analyses (symbol tables, control and data flow analyses), it is hard to write really interesting transformations this way, because you need to check types and verify information flows for most practical tasks. Unfortunately, this capability is in fact rare in the general PTS. (It is always unavailable with the ever-proposed "If I just had a parser... " See my bio for a longer discussion of "Life After Parsing").
There's a theorem that says if you can do string rewriting [thus tree rewriting] you can do arbitrary transformation; and thus a number of PTS lean on this to claim you can metaprogram anything with just the tree rewrites they offer. While the theorem is satisfying in the sense you are now sure you can do anything, it is unsatisfying in the same way that a Turing Machine's ability to do anything doesn't make programming a Turing Machine the method of choice. (The same holds true for systems with just procedural APIs, if they will let you make arbitrary changes to the AST [and in fact I think this is not true of Clang]).
What you want is the best of both worlds: a system that offers you the generality of the language-parameterized type of PTS (even handling multiple languages), with the additional static analyses, and the ability to mix source-to-source transformations with procedural APIs. I only know of two that come close, Rascal and DMS, discussed below.
Unless you want to write the language descriptions and static analyzers yourself (for C++ this is a tremendous amount of work, which is why Clang was constructed both as a compiler and as a general procedural metaprogramming foundation), you will want a PTS with mature language descriptions already available. Otherwise you will spend all your time configuring the PTS, and none doing the work you actually wanted to do. [If you pick a random, non-mainstream language, this step is very hard to avoid].
Rascal tries to do this by co-opting "OPP" (Other People's Parsers), but that doesn't help with the static analysis part. I think they have Java pretty well in hand, but I'm very sure they don't do C or C++. But it's an academic research tool; it's hard to blame them.
I emphasize, our [commercial] DMS tool does have Java, C, C++ full front ends available. For C++, it covers almost everything in C++14 for GCC and even Microsoft's variations (and we are polishing now), macro expansion and conditional management, and method-level control and data flow analysis. And yes, you can specify grammar changes in a practical way; we built a custom VectorC++ system for a client that radically extended C++ to use what amount to F90/APL data-parallel array operations. DMS has been used to carry out other massive metaprogramming tasks on large C++ systems (e.g., application architectural reshaping). (I am the architect behind DMS).
Happy meta-metaprogramming.