Why should 'boneheaded' exceptions not be caught, especially in server code?

anti-patterns, antibugging, design-patterns, exceptions, web services

I am confused because in quite a few places I've already read that the so-called 'boneheaded' exceptions (ones that result from bugs in code) are not supposed to be caught. Instead, they must be allowed to crash the application:

At least two of the three above people are established authorities.

I am surprised. Especially for some (important!) use cases, like server-side code, I simply can't see why catching such an exception is suboptimal and why the application must be allowed to crash.

As far as I'm aware, the typical solution in such a case is to catch the exception, return HTTP 500 to the client, have an automatic system that sends an emergency e-mail to the development team so that they can fix the problem ASAP – but do not crash the application (one request must fail, there's nothing we can do here, but why take the whole service down and make everyone else unable to use our website? Downtime is costly!). Am I incorrect?

Why am I asking – I'm perpetually trying to finish a hobby project, which is a browser-based game in .NET Core. As far as I'm aware, in many cases the framework does out of the box precisely what Eric Lippert and Stephen Cleary are recommending against! – that is, if handling a request throws, the framework automatically catches the exception and prevents the server from crashing. In a few places, however, the framework does not do this. In such places, I am wrapping my own code with try {...} catch {...} to catch all possible 'boneheaded' exceptions.

One such place, AFAIK, is background tasks. For example, I am now implementing a background ban-clearing service that is supposed to clear all expired temporary bans every few minutes. Here, I'm even using a few layers of catch-all try blocks:

try // prevent server from crashing if boneheaded exception occurs here
{
    var expiredBans = GetExpiredBans();
    foreach(var ban in expiredBans)
    {
        try // If removing one ban fails, eg because of a boneheaded problem, 
        {   // still try to remove other bans
            RemoveBan(ban);
        }
        catch
        {

        }
    }
}
catch
{

}

(Yes, my catch blocks are empty right now – I am aware that ignoring these exceptions is unacceptable, adding some logging is perpetually on my TODO list)

Having read the articles I linked to above, I can no longer continue doing this without some serious doubt… Am I not shooting myself in the foot? Why / Why not?

Should boneheaded exceptions never be caught, and if so, why?

Best Answer

Silent But Deadly

When writing enterprise software, you will eventually learn an essential truth: the worst bug in the world is not one that causes your program to crash. The worst bug in the world is one which causes your program to silently produce a wrong answer that goes unnoticed but eventually produces a massive negative effect (with severe financial implications for your employer). Thus, error messages and crashes are A Good Thing™, because they indicate that your program detected a problem.

Amazing Grace

Now, this seems to conflict with another enterprise virtue, which is "degrade gracefully". Blowing up and not returning any response at all hardly looks like "graceful degradation". And this is why many folks will try very hard to return some response, if they can. Indeed, this is why many frameworks, like Spring, will catch all top-level exceptions and wrap them with a 500 response, as you describe. In general, I think this is OK. After all, most exceptions don't really require a restart of the entire app server if you can just kill/restart a server thread. A sane framework will be careful not to catch Java Errors, like OutOfMemoryError, for obvious reasons.
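A minimal sketch of that top-level pattern, written in C# to match the question rather than Spring. The names `TopLevelHandler`, `HandleRequest`, and the `handler`/`log` delegates are hypothetical and only illustrate the idea; they are not any framework's API, and real frameworks do this inside their own request pipeline.

```csharp
using System;

public static class TopLevelHandler
{
    // Hypothetical helper: runs one request handler and maps an unexpected
    // exception to a 500 status code instead of letting it escape.
    public static int HandleRequest(Func<int> handler, Action<Exception> log)
    {
        try
        {
            return handler(); // the handler returns an HTTP status code
        }
        // Deliberately let fatal conditions propagate: after running out of
        // memory the process may not be in a state where continuing is safe.
        catch (Exception ex) when (ex is not OutOfMemoryException)
        {
            log(ex);    // never swallow silently: record the failure
            return 500; // only this one request fails
        }
    }
}
```

The exception filter (`when`) plays the role of the "don't catch Errors" rule: ordinary bugs become a logged 500, while the fatal case is left alone.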

But there is one more point to consider: once you get beyond a single server, you will likely have a load balancer in front of your service. And when the LB times out or gets a closed connection, it will generally return a 5xx (typically 502 or 504) to its client. Thus, the LB will often transform your "server crash" into a client 5xx automatically! Best of both worlds.

Worst Case

In your scenario, what is the worst that can happen if you don't catch the exceptions? Your answer: "Well, my game server dies, and nobody can play!!!" But that's not the worst case. The worst case is, everyone is playing your game, but griefers are ruining it. Players file a bug report and tell you that bans aren't working, but you look at the logs and everything looks fine. Or, legitimate players are getting banned by griefers, and instead of being able to rejoin in a timely manner, the bans are lasting indefinitely, because your server happily ignores failures. The worst thing isn't your game crashing. It's your player trust crashing. Good luck trying to reset that.
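One way to keep the "one bad ban should not block the others" behavior without hiding failures is to log each failure and then surface them together. The sketch below is only an illustration under assumptions: `BanCleaner`, `ClearExpiredBans`, and the `removeBan`/`log` delegates are hypothetical stand-ins for the question's `GetExpiredBans` and `RemoveBan`, whose real signatures are not shown.

```csharp
using System;
using System.Collections.Generic;

public static class BanCleaner
{
    // Tries to remove every expired ban. Each failure is logged and
    // collected; if any occurred, they are rethrown together so the
    // caller (or a monitor) cannot miss them.
    public static int ClearExpiredBans<TBan>(
        IEnumerable<TBan> expiredBans,
        Action<TBan> removeBan,   // hypothetical stand-in for RemoveBan
        Action<string> log)       // hypothetical logging callback
    {
        var failures = new List<Exception>();
        int removed = 0;

        foreach (var ban in expiredBans)
        {
            try
            {
                removeBan(ban);
                removed++;
            }
            catch (Exception ex)
            {
                // Keep processing the remaining bans, but leave evidence.
                log($"Failed to remove ban {ban}: {ex.Message}");
                failures.Add(ex);
            }
        }

        if (failures.Count > 0)
        {
            throw new AggregateException(
                $"{failures.Count} ban(s) could not be removed", failures);
        }
        return removed;
    }
}
```

The background loop that calls this can then decide whether to alert, retry on the next tick, or let the process stop. Either way, the logs no longer "look fine" while bans are failing.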
