C++ – Doesn’t “always initialize variables” lead to important bugs being hidden?

The C++ Core Guidelines have the rule ES.20: Always initialize an object.

Avoid used-before-set errors and their associated undefined behavior. Avoid problems with comprehension of complex initialization. Simplify refactoring.

But this rule doesn't help to find bugs, it only hides them.
Let's suppose that a program has an execution path where it uses an uninitialized variable. That is a bug. Undefined behavior aside, it also means that something went wrong, and the program probably doesn't meet its product requirements. Once it is deployed to production, it can cause financial loss, or even worse.

How do we screen for bugs? We write tests. But tests don't cover 100% of execution paths, and tests never cover 100% of program inputs. Moreover, even if a test covers a faulty execution path, it can still pass. It's undefined behavior after all; an uninitialized variable can happen to hold a somewhat valid value.

But in addition to our tests, we have compilers that can write something like 0xCDCDCDCD to uninitialized variables. This slightly improves the detection rate of the tests.
Even better – there are tools like MemorySanitizer, which can catch uses of uninitialized memory at run time.

And finally there are static analyzers, which can look at the program and report that there is a read-before-set on a particular execution path.

So we have many powerful tools, but if we initialize the variable – sanitizers find nothing.

int bytes_read = 0;
my_read(buffer, &bytes_read); // err_t my_read(buffer_t, int*);
// bytes_read is not changed on read error.
// It's a bug of "my_read", but detection is suppressed by initialization.
buffer.shrink(bytes_read); // Uninitialized bytes_read could be detected here.

// Another bug: use empty buffer after read error.
use(buffer);
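
For reference, the scenario above can be made compilable as a minimal sketch. The definitions of `err_t`, `buffer_t` and `my_read` are hypothetical stand-ins (the original only names them), and the `simulate_failure` parameter is added purely so the failure path can be triggered on demand.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the types named in the snippet above.
enum class err_t { ok, read_failed };

struct buffer_t {
    std::vector<char> data;
    explicit buffer_t(std::size_t n) : data(n) {}
    void shrink(int n) { data.resize(static_cast<std::size_t>(n)); }
    std::size_t size() const { return data.size(); }
};

// Hypothetical my_read: on failure it returns an error code and leaves
// *bytes_read untouched, which is the behavior described in the snippet.
err_t my_read(buffer_t& buffer, int* bytes_read, bool simulate_failure) {
    if (simulate_failure) {
        return err_t::read_failed;
    }
    *bytes_read = static_cast<int>(buffer.size());
    return err_t::ok;
}

// Mirrors the snippet: the error code is ignored, so a failed read
// quietly produces an empty buffer instead of a detectable fault.
std::size_t read_ignoring_errors(bool simulate_failure) {
    buffer_t buffer(16);
    int bytes_read = 0;
    (void)my_read(buffer, &bytes_read, simulate_failure);  // result deliberately dropped
    buffer.shrink(bytes_read);
    return buffer.size();
}
```

Calling `read_ignoring_errors(true)` returns 0 without any diagnostic, which is the silent failure this question is about.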

There is another rule: if program execution encounters a bug, the program should die as soon as possible. There is no need to keep it alive; just crash, write a crash dump, and give it to the engineers for investigation.
Initializing variables unnecessarily does the opposite: the program is kept alive when it would otherwise already have hit a segmentation fault.

Best Answer

Your reasoning goes wrong on several counts:

  1. Segmentation faults are far from certain to occur. Using an uninitialized variable results in undefined behaviour. A segmentation fault is one way that such behaviour can manifest itself, but the program appearing to run normally is just as possible.
  2. Compilers do not, in general, fill uninitialized memory with a defined pattern (like 0xCD). This is something that some debugging environments do to assist you in finding places where uninitialized variables get used. If you run such a program outside that environment, the variable will contain arbitrary garbage. A counter like bytes_read could just as well hold the value 10 as the value 0xcdcdcdcd.
  3. Even if you are running in a debugger that sets uninitialized memory to a fixed pattern, it only does so at startup. This means that the mechanism only works reliably for static (and possibly heap-allocated) variables. For automatic variables, which get allocated on the stack or live only in a register, the chances are high that the variable is stored in a location that was used before, so the tell-tale memory pattern has already been overwritten.
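
The third point can be illustrated without invoking undefined behaviour by simulating a stack. This is only a model of the idea: `FakeStack`, `kFillPattern`, `first_function` and `second_function` are hypothetical names, and no real debugger or runtime is claimed to work this way internally.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Tell-tale value written into all simulated stack slots once, "at startup".
constexpr std::uint32_t kFillPattern = 0xCDCDCDCD;

// A toy stack: slots are handed out and released, but never re-filled.
struct FakeStack {
    std::array<std::uint32_t, 8> slots{};
    std::size_t top = 0;

    FakeStack() { slots.fill(kFillPattern); }

    // Hand out the next slot WITHOUT touching its contents, like a real push.
    std::uint32_t& push() { return slots[top++]; }
    void pop() { --top; }
};

// Uses one slot for a "local variable" and leaves its value behind on return.
void first_function(FakeStack& stack) {
    std::uint32_t& local = stack.push();
    local = 10;
    stack.pop();
}

// Gets the same slot and reports what an "uninitialized" local would contain.
std::uint32_t second_function(FakeStack& stack) {
    std::uint32_t& local = stack.push();
    const std::uint32_t observed = local;
    stack.pop();
    return observed;
}
```

On a fresh `FakeStack`, `second_function` observes the fill pattern. After `first_function` has run on the same stack, it observes the stale 10 instead, so the pattern no longer flags the missing initialization.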

The idea behind the guidance to always initialize variables is to enable these two situations:

  1. The variable contains a useful value right from the very beginning of its existence. If you combine that with the guidance to declare a variable only at the point where you need it, you keep future maintenance programmers from falling into the trap of using a variable between its declaration and its first assignment, where the variable would exist but be uninitialized.

  2. The variable contains a defined value that you can test for later, to tell if a function like my_read has updated the value. Without initialization, you can't tell if bytes_read actually has a valid value, because you can't know what value it started with.
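
The second situation could look roughly like the sketch below, which assumes C++17 for `std::optional`. The sentinel `kNotSet`, the wrapper `checked_read`, and the simplified `my_read` signature (with a `simulate_failure` switch and a fixed byte count of 42) are all hypothetical and exist only to make the example self-contained.

```cpp
#include <optional>

enum class err_t { ok, read_failed };  // hypothetical error type

// Hypothetical my_read: leaves *bytes_read untouched when the read fails.
err_t my_read(int* bytes_read, bool simulate_failure) {
    if (simulate_failure) {
        return err_t::read_failed;
    }
    *bytes_read = 42;  // pretend 42 bytes were read
    return err_t::ok;
}

// Sentinel that a successful read can never produce.
constexpr int kNotSet = -1;

// Returns the byte count, or std::nullopt if my_read reported an error
// or did not update the value.
std::optional<int> checked_read(bool simulate_failure) {
    int bytes_read = kNotSet;  // defined starting value we can test for
    const err_t err = my_read(&bytes_read, simulate_failure);
    if (err != err_t::ok || bytes_read == kNotSet) {
        return std::nullopt;
    }
    return bytes_read;
}
```

Because `bytes_read` starts from a known value, the check after the call is well defined. With an uninitialized variable the same comparison would itself read an indeterminate value.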