How should I handle logger failures?

Tags: design-patterns, error handling, exception handling, exceptions, logging

In several of our company's applications, we use a custom logger. It's fairly robust, though we may replace it with something like NLog in the future. One of the logger's tasks is to log any exceptions encountered in the application.

One concern I've always had is that the exception handling within the logger allows for a silent failure. That is, if the log isn't written for a given exception (due to an error in the logger), how should I handle it and (somehow) log the exception in the logger itself?

Let's say the WriteLog function throws an exception. Should I try to call the function some number of times or until the exception isn't thrown? Should I try to write the thrown exception with the logger (which would likely just result in exceptions all the way down...)? I have been lucky enough not to encounter this situation except when we were first implementing the custom logger. On the other hand, I currently have no way of knowing whether the logger has failed to log application exceptions (due to its own exceptions).

I have tried searching online and on some SE sites, but it's been fruitless so far since all the posts deal with errors in a logger (but not potential exceptions and how to log them) or with exceptions outside the logger.

Best Answer

When you encounter exceptions within the logger itself, you shouldn't use the logger to log its own exceptions, for two reasons:

  • You may be stuck in an infinite loop. Imagine that within your logger, you have a conditional branch which wasn't tested (and generates an exception). Imagine that once the condition is met, any further reported exception is handled by the same branch. This means that from the moment the branch is executed, you're in an infinite loop.

  • You may be stuck in a temporary loop, generating thousands of exceptions per second. Imagine you're reporting exceptions to a remote server. An issue with the server causes another exception, which causes another one, and so on, until the connection is back.
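The two hazards above can be made concrete with a small sketch. Everything here is hypothetical: `NaiveLogger`, its `write_log` method and the `max_depth` cap are illustrative names, not part of any real logging library, and the cap exists only so that the demonstration terminates.

```python
class NaiveLogger:
    """Hypothetical logger that reports its own failures through itself."""

    def __init__(self, max_depth=50):
        self.max_depth = max_depth  # safety cap so this demo terminates
        self.attempts = 0           # how many times write_log was entered

    def _send(self, message):
        # Stand-in for a broken sink, e.g. an unreachable remote server.
        raise ConnectionError("remote log server unreachable")

    def write_log(self, message, depth=0):
        self.attempts += 1
        try:
            self._send(message)
        except Exception as exc:
            if depth >= self.max_depth:
                return  # without this cap the recursion would never stop
            # The mistake: logging the logger's failure with the logger itself.
            self.write_log(f"logger failure: {exc!r}", depth + 1)


logger = NaiveLogger(max_depth=50)
logger.write_log("application exception")
print(logger.attempts)
```

A single application exception triggers one write attempt per level of nesting until the artificial cap is hit. In a real logger there is no such cap, so the attempts continue until the sink recovers or the process runs out of stack.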

What you should do instead is fall back to a safer way of logging those exceptions. For example, if your logger sends exceptions to a remote server, send the exceptions raised within the logger to syslog instead. If your logger records exceptions in Windows Events and this action fails, store the failure exception in a simple text file.
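As a minimal sketch of that fallback idea, assume the primary sink is any callable that may raise and the safer channel is a local text file. The names `FallbackLogger`, `primary_sink`, `fallback_path` and the file name `logger-failures.log` are assumptions made for illustration; a real logger would plug in its own sink.

```python
import os
import tempfile
from datetime import datetime, timezone


class FallbackLogger:
    """Hypothetical wrapper: try the primary sink, fall back to a text file."""

    def __init__(self, primary_sink, fallback_path):
        self.primary_sink = primary_sink    # callable taking a message string
        self.fallback_path = fallback_path  # local file used when the sink fails

    def write_log(self, message):
        """Return True if the primary sink accepted the message."""
        try:
            self.primary_sink(message)
            return True
        except Exception as exc:
            # Never call write_log here: that is what creates the loop.
            self._write_fallback(message, exc)
            return False

    def _write_fallback(self, message, exc):
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            with open(self.fallback_path, "a", encoding="utf-8") as fh:
                fh.write(f"{stamp}\tlogger failure: {exc!r}\toriginal: {message!r}\n")
        except OSError:
            # Last resort: give up quietly rather than crash the application.
            pass


def broken_sink(message):
    # Stand-in for a remote server that is currently unreachable.
    raise ConnectionError("remote log server unreachable")


fallback_path = os.path.join(tempfile.mkdtemp(), "logger-failures.log")
logger = FallbackLogger(broken_sink, fallback_path)
delivered = logger.write_log("application exception: division by zero")
```

The fallback path deliberately does as little as possible: one append to a local file, with no retries and no call back into the logger. Both the logger's own exception and the original message are recorded, so neither is lost silently.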

Once you have that, the next question is how you find out that those exceptions occurred: if you have dozens of applications running on thousands of servers, you can't possibly SSH into each of them on a regular basis to check whether they logged something locally.

One way is to have a cron job which checks for those “exceptional logs” and pushes them to the location where other exceptions are stored (possibly using your logger, but beware of infinite or temporary loops!).
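A script run by such a cron job might look like the sketch below. It assumes the local failures are stored one per line in a text file, and that `forward` is a callable that delivers one line to the central store. The function name `sweep_fallback_log`, the `.sweeping` suffix and the file name are all hypothetical.

```python
import os
import tempfile


def sweep_fallback_log(fallback_path, forward):
    """Forward locally stored logger failures; return how many were sent."""
    if not os.path.exists(fallback_path):
        return 0

    # Claim the file so that failures logged during the sweep go to a fresh one.
    claimed = fallback_path + ".sweeping"
    os.replace(fallback_path, claimed)

    with open(claimed, encoding="utf-8") as fh:
        lines = [line.rstrip("\n") for line in fh if line.strip()]

    sent = 0
    try:
        for line in lines:
            forward(line)
            sent += 1
    except Exception:
        # The central store is still unavailable: stop here and leave the
        # rest for the next run instead of retrying in a tight loop.
        pass

    unsent = lines[sent:]
    if unsent:
        # Put back whatever could not be delivered.
        with open(fallback_path, "a", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in unsent)
    os.remove(claimed)
    return sent


# Demonstration with a temporary file and an in-memory "central store".
path = os.path.join(tempfile.mkdtemp(), "logger-failures.log")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("failure one\nfailure two\n")

received = []
forwarded = sweep_fallback_log(path, received.append)
```

The guard against loops is that a failed delivery is never logged or retried within the same run. Undelivered lines simply stay in the local file until the next scheduled run, so the retry rate is bounded by the cron interval. This sketch does not handle a leftover `.sweeping` file from a crashed run; a production version would need to.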