Asserts are useful for telling you about the internal state of the program: for example, that your data structures hold a valid state, e.g., that a `Time` data structure will never hold the value 25:61:61. The conditions checked by asserts are:
- Preconditions, which ensure that the caller keeps its contract,
- Postconditions, which ensure that the callee keeps its contract, and
- Invariants, which ensure that the data structure always holds some property after the function returns. An invariant is a condition that is both a precondition and a postcondition.
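To make this concrete, here's a minimal sketch (a hypothetical `Time` type, not from the original answer) showing an invariant checked as both precondition and postcondition:

```cpp
#include <cassert>

// Hypothetical Time type: the invariant (a valid wall-clock value) is
// checked on entry and after every mutation, so 25:61:61 can never be
// observed by callers.
struct Time {
    int hour, minute, second;

    bool valid() const {                      // the invariant itself
        return hour >= 0 && hour < 24 &&
               minute >= 0 && minute < 60 &&
               second >= 0 && second < 60;
    }

    void add_seconds(int s) {
        assert(s >= 0 && valid());            // precondition + invariant on entry
        int total = ((hour * 60 + minute) * 60 + second + s) % 86400;
        hour = total / 3600;
        minute = total / 60 % 60;
        second = total % 60;
        assert(valid());                      // postcondition: invariant restored
    }
};
```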
Unit tests are useful for telling you about the external behavior of the module. Your `Stack` may have a consistent state after the `push()` method is called, but if the size of the stack doesn't increase by three after it is called three times, then that is an error. (Consider the trivial case where an incorrect `push()` implementation only checks the asserts and exits.)
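A sketch of what such a behavioral unit test might look like, using a hypothetical toy `Stack`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stack standing in for the Stack discussed above (hypothetical type).
class Stack {
    std::vector<int> data_;
public:
    void push(int v) { data_.push_back(v); }
    std::size_t size() const { return data_.size(); }
    int top() const { return data_.back(); }
};

// A unit test checks external behavior: three pushes must grow the size
// by three, and the last value pushed must be on top. This would catch a
// broken push() that merely satisfies its asserts and returns.
void test_push_grows_stack() {
    Stack s;
    s.push(1); s.push(2); s.push(3);
    assert(s.size() == 3);
    assert(s.top() == 3);
}
```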
Strictly speaking, the major difference between asserts and unit tests is that unit tests come with test data (values that drive the program), while asserts do not. That is, you can execute your unit tests automatically, while you cannot say the same for assertions. For the sake of this discussion, I've assumed that you are talking about executing the program in the context of higher-order functional tests (which exercise the whole program rather than driving individual modules the way unit tests do). If you are not talking about automated functional tests as the means to "see real inputs", then clearly the value lies in automation, and the unit tests win. If you are talking about this in the context of (automated) functional tests, then see below.
There can be some overlap in what is being tested. For example, a `Stack`'s postcondition may actually assert that the stack size increases by one. But there are limits to what can be performed in that assert: should it also check that the top element is what was just added?
For both, the goal is to increase quality. For unit testing, the goal is to find bugs. For assertions, the goal is to make debugging easier by observing invalid program states as soon as they occur.
Note that neither technique verifies correctness. In fact, if you conduct unit testing with the goal of verifying that the program is correct, you will likely come up with uninteresting tests that you know will pass. It's a psychological effect: you'll do whatever it takes to meet your goal. If your goal is to find bugs, your activities will reflect that.
Both are important, and have their own purposes.
[As a final note about assertions: to get the most value, you need to use them at all critical points in your program, not just in a few key functions. Otherwise, the original source of a problem may be masked and hard to detect without hours of debugging.]
> C++ seems to prefer using exceptions more often.
I would suggest it's actually less than Objective-C in some respects, because the C++ standard library generally does not throw on programmer errors like out-of-bounds access of a random-access sequence in its most common design form (i.e., in `operator[]`) or trying to dereference an invalid iterator. The language doesn't throw on accessing an array out of bounds, or dereferencing a null pointer, or anything of this sort.
Taking programmer mistakes largely out of the exception-handling equation removes a very large category of errors that other languages often respond to by throwing. C++ tends to `assert` (which is compiled only into debug builds, not release/production builds) or just glitch out (often crashing) in such cases, probably in part because the language doesn't want to impose the cost of the runtime checks required to detect such programmer mistakes unless the programmer specifically wants to pay for them by writing code that performs such checks.
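A minimal sketch of that stance, with a hypothetical bounds-checked accessor whose check disappears when `NDEBUG` is defined:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accessor illustrating the typical C++ stance: document the
// precondition with an assert that vanishes in release builds (when NDEBUG
// is defined), instead of paying for a thrown exception on every call.
int at_unchecked(const int* data, std::size_t size, std::size_t i) {
    assert(i < size && "index out of bounds");  // debug-only check
    return data[i];                             // no runtime check in release
}
```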
Sutter even encourages avoiding exceptions in such cases in C++ Coding Standards:
> The primary disadvantage of using an exception to report a programming error is that you don't really want stack unwinding to occur when you want the debugger to launch on the exact line where the violation was detected, with the line's state intact. In sum: there are errors that you know might happen (see Items 69 to 75). For everything else that shouldn't happen, and it's the programmer's fault if it does, there is `assert`.
That rule isn't necessarily set in stone. In some more mission-critical cases, it might be preferable to use, say, wrappers and a coding standard which uniformly logs where programmer mistakes occur and throws in the presence of mistakes like trying to dereference something invalid or access it out of bounds, because it might be too costly to fail to recover in those cases if the software has a chance. But overall, the more common use of the language tends to favor not throwing in the face of programmer mistakes.
External Exceptions
Where I see exceptions encouraged most often in C++ (e.g., by the standards committee) is for "external exceptions", as in an unexpected result from some external source outside the program. An example is failing to allocate memory. Another is failing to open a critical file required for the software to run. Another is failing to connect to a required server. Another is a user jamming an abort button to cancel an operation whose common-case execution path expects to succeed absent this external interruption. All of these things are outside the control of the immediate software and the programmers who wrote it. They're unexpected results from external sources that prevent the operation (which should really be thought of as an indivisible transaction, in my book) from being able to succeed.
Transactions
I often encourage looking at a `try` block as a "transaction", because transactions should succeed as a whole or fail as a whole. If we're trying to do something and it fails halfway through, then any side effects/mutations made to the program state generally need to be rolled back to put the system back into a valid state, as though the transaction had never executed at all, just as an RDBMS which fails to process a query halfway through should not compromise the integrity of the database. If you mutate program state directly in such a transaction, then you must "unmutate" it on encountering an error (and here scope guards can be useful, along with RAII).
The much simpler alternative is not to mutate the original program state: mutate a copy of it and then, if the operation succeeds, swap the copy with the original (ensuring the swap cannot throw). If it fails, discard the copy. This applies even if you don't use exceptions for error handling in general. A "transactional" mindset is key to proper recovery if program-state mutations have occurred prior to encountering an error. It either succeeds as a whole or fails as a whole; it does not halfway succeed in making its mutations.
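A minimal copy-and-swap sketch of that idea (`append_all` is a hypothetical operation, not from the answer):

```cpp
#include <stdexcept>
#include <vector>

// Transactional update via copy-and-swap: mutate a copy, then commit with
// a non-throwing swap. If anything throws before the swap, the original
// state is untouched, so the operation succeeds or fails as a whole.
void append_all(std::vector<int>& target, const std::vector<int>& extra) {
    std::vector<int> copy = target;            // work on a copy
    for (int v : extra) {
        if (v < 0)
            throw std::invalid_argument("negative value"); // whole transaction fails
        copy.push_back(v);                     // may also throw std::bad_alloc
    }
    target.swap(copy);                         // commit: swap never throws
}
```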
This is bizarrely one of the least frequently discussed topics when I see programmers asking about how to properly do error or exception handling, yet it is the most difficult of them all to get right in any software that wants to directly mutate program state in many of its operations. Purity and immutability can help here to achieve exception-safety just as much as they help with thread-safety, as a mutation/external side effect which does not occur need not be rolled back.
Performance
Another guiding factor in whether or not to use exceptions is performance, and I don't mean in some obsessive, penny-pinching, counter-productive way. A lot of C++ compilers implement what's called "Zero-Cost Exception Handling".
It offers zero runtime overhead for an error-free execution, which surpasses even that of C return-value error handling. As a trade-off, the propagation of an exception has a large overhead.
According to what I've read about it, it makes your common-case execution paths require no overhead (not even the overhead that normally accompanies C-style error-code handling and propagation), in exchange for heavily skewing the costs toward the exceptional paths (which means throwing is now more expensive than ever).
"Expensive" is a bit hard to quantify but, for starters, you probably don't want to be throwing a million times in some tight loop. This kind of design assumes that exceptions aren't occurring left and right all the time.
Non-Errors
And that performance point brings me to non-errors, a category that is surprisingly fuzzy if we look across other languages. But I would say, given the zero-cost EH design mentioned above, that you almost certainly do not want to `throw` in response to a key not being found in a set. Not only is that arguably a non-error (the person searching for the key might have built the set and expects to search for keys that don't always exist), but it would also be enormously expensive in that context.
For example, a set intersection function might want to loop through two sets and search for keys they have in common. If failing to find a key threw, you'd be looping through and might encounter exceptions in half or more of the iterations:
```cpp
Set<int> set_intersection(const Set<int>& a, const Set<int>& b)
{
    Set<int> intersection;
    for (int key: a)
    {
        try
        {
            b.find(key);                // throws KeyNotFoundException on a miss
            intersection.insert(key);   // only reached if the key was found
        }
        catch (const KeyNotFoundException&)
        {
            // Do nothing.
        }
    }
    return intersection;
}
```
That example above is absolutely ridiculous and exaggerated, but I have seen, in production code, people coming from other languages using exceptions in C++ somewhat like this, and I think it's a reasonably practical statement that this is not an appropriate use of exceptions in C++ whatsoever. Another hint above is that the `catch` block has absolutely nothing to do and is written only to forcibly ignore such exceptions; that's usually a hint (though not a guarantee) that exceptions are not being used very appropriately in C++.
For those types of cases, some type of return value indicating failure (anything from returning `false` to an invalid iterator or `nullptr`, or whatever makes sense in the context) is usually far more appropriate, and also often more practical and productive, since a non-error case usually doesn't call for a stack-unwinding process to reach the analogical `catch` site.
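For contrast, here's roughly how the same intersection looks with a return-value style lookup, using `std::set::find`, which reports a miss via `end()` rather than by throwing:

```cpp
#include <set>

// The intersection rewritten with a return-value style lookup: a missing
// key is a normal result reported through the iterator, so the common
// miss case involves no stack unwinding at all.
std::set<int> set_intersection(const std::set<int>& a, const std::set<int>& b)
{
    std::set<int> intersection;
    for (int key : a)
        if (b.find(key) != b.end())   // a miss is a non-error, not an exception
            intersection.insert(key);
    return intersection;
}
```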
Questions
> I'd have to go with internal error flags if I choose to avoid exceptions. Will it be too much bother to handle, or will it perhaps work even better than exceptions? A comparison of both cases would be the best answer.
Avoiding exceptions outright in C++ seems extremely counter-productive to me, unless you're working on some embedded system or a particular type of project which forbids their use (in which case you'd also have to go out of your way to avoid all library and language functionality that would otherwise throw, like strictly using `nothrow` forms of `new`).
If you absolutely have to avoid exceptions for whatever reason (e.g., working across the C API boundaries of a module whose C API you export), many might disagree with me, but I'd actually suggest using a global error handler/status like OpenGL's `glGetError()`. You can make it use thread-local storage to have a unique error status per thread.
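A minimal sketch of such a per-thread status (the `lib_*` names are hypothetical, not a real API):

```cpp
// Sketch of a glGetError()-style, per-thread error status.
enum LibError { LIB_OK = 0, LIB_DIVIDE_BY_ZERO };

// thread_local gives each thread its own independent status.
thread_local LibError g_last_error = LIB_OK;

// Each call records failure in the thread-local status instead of
// returning an error code the caller might drop on the floor.
int lib_divide(int a, int b) {
    if (b == 0) {
        g_last_error = LIB_DIVIDE_BY_ZERO;
        return 0;
    }
    return a / b;
}

// Callers can poll later, even several calls after the failure occurred.
LibError lib_get_error() {
    LibError e = g_last_error;
    g_last_error = LIB_OK;   // reading clears the status, as glGetError does
    return e;
}
```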
My rationale is that, unfortunately, I'm not used to seeing teams in production environments thoroughly check for all possible errors when error codes are returned. Some C APIs can encounter an error on just about every single call, so thorough checking would require something like:
```cpp
if ((err = ApiCall(...)) != success)
{
    // Handle error
}
```
... with almost every single line of code invoking the API requiring such checks. Yet I've not had the fortune of working with teams that thorough. They often ignore such errors half, sometimes even most, of the time. That's the biggest appeal of exceptions to me: if we wrap this API and make it uniformly throw on encountering an error, the exception cannot possibly be ignored, and in my view and experience, that is where the superiority of exceptions lies.
But if exceptions cannot be used, then the global, per-thread error status at least has the advantage (a huge one compared to returning error codes, to me) that a sloppy codebase might catch a former error a bit later than when it occurred, instead of outright missing it and leaving us completely oblivious to what happened. The error might have occurred a few lines before, or in a previous function call, but provided the software hasn't crashed yet, we can start working our way backwards and figure out where and why it occurred.
> It seems to me that since pointers are rare, I'd have to go with internal error flags if I choose to avoid exceptions.
I wouldn't necessarily say pointers are rare. There are even methods in C++11 and onwards to get at the underlying data pointers of containers, and a new `nullptr` keyword. It's generally considered unwise to use raw pointers to own/manage memory if you can use something like `unique_ptr` instead, given how critical it is to be RAII-conforming in the presence of exceptions. But raw pointers that don't own/manage memory aren't necessarily considered so bad (even by people like Sutter and Stroustrup), and they're sometimes very practical as a way to point to things (along with indices that point to things).
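A small sketch of that division of labor, with `unique_ptr` owning and a raw pointer merely observing:

```cpp
#include <memory>

struct Node { int value = 0; };

// Ownership lives in unique_ptr (RAII, exception-safe cleanup), while a
// non-owning raw pointer merely points at the resource and never frees it.
int read_through_observer(int v) {
    std::unique_ptr<Node> owner = std::make_unique<Node>(); // owns the Node
    owner->value = v;
    Node* observer = owner.get();   // non-owning view of the same Node
    return observer->value;
}   // unique_ptr frees the Node here, even if something above threw
```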
They're arguably no less safe than the standard container iterators (at least in release builds, absent checked iterators), which will not detect if you try to dereference them after they're invalidated. C++ is still unashamedly a bit of a dangerous language, I'd say, unless your specific use of it wants to wrap everything and hide even non-owning raw pointers away. With exceptions it is almost critical that resources conform to RAII (which generally comes at no runtime cost), but beyond that the language isn't necessarily trying to be the safest to use; it favors avoiding costs a developer doesn't explicitly want in exchange for something else. The recommended use isn't trying to protect you from things like dangling pointers and invalidated iterators, so to speak (otherwise we'd be encouraged to use `shared_ptr` all over the place, which Stroustrup vehemently opposes). It's trying to protect you from failing to properly free/release/destroy/unlock/clean up a resource when something throws.
Best Answer
I've seen hundreds of bugs that would have been solved faster if someone had written more asserts, and not a single one that would have been solved quicker by writing fewer.
Readability could be a problem, perhaps - although it's been my experience that people who write good asserts also write readable code. And it never bothers me to see the beginning of a function start with a block of asserts to verify that the arguments aren't garbage - just put a blank line after it.
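For illustration, a function opening with such a block of asserts (the `copy_name` helper is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The pattern described above: open the function with a block of asserts
// verifying the arguments aren't garbage, then a blank line, then the logic.
void copy_name(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr);
    assert(src != nullptr);
    assert(dst_size > std::strlen(src));

    std::strcpy(dst, src);
}
```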
Also in my experience, maintainability is always improved by asserts, just as it is by unit tests. Asserts provide a sanity check that code is being used the way it was intended to be used.