Java – Is it Ever Okay to Catch StackOverflowError?

java, jvm, stackoverflow

I used to think that it's not, but yesterday I had to do it. It's an application that uses Akka (an actor system implementation for the JVM) to process asynchronous jobs. One of the actors performs some PDF manipulation, and because the library is buggy, it dies with a StackOverflowError every now and then.

The second aspect is that Akka is configured to shut down its whole actor system if any fatal JVM error (e.g. StackOverflowError) is caught.

The third aspect is that this actor system is embedded inside a web app (for WTF-ish legacy reasons), so when the actor system is shut down, the web app is not. The net effect is that on a StackOverflowError our job processing application becomes just an empty web app.

As a quick fix I had to catch the StackOverflowError being thrown, so that the thread pool of the actor system isn't torn down. This led me to think that maybe it's sometimes okay to catch such errors, especially in contexts like this, where a thread pool is processing arbitrary tasks? Unlike an OutOfMemoryError, I can't imagine how a StackOverflowError can leave an application in an inconsistent state. The stack is unwound after such an error, so computation can go on normally. But maybe I'm missing something important.
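For illustration, here is a minimal sketch of that kind of guard. It uses a plain ExecutorService rather than Akka, and the names SoeGuard, runGuarded, runaway and Status are hypothetical, not taken from the actual application; runaway merely stands in for the buggy PDF library call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SoeGuard {
    enum Status { DONE, FAILED }

    // Hypothetical stand-in for the buggy library call:
    // recurses until the thread's stack overflows.
    static int runaway(int depth) {
        return runaway(depth + 1) + 1;
    }

    // Runs a job and turns a StackOverflowError into a FAILED status,
    // so the error never reaches the worker thread's top-level handler.
    static Status runGuarded(Runnable job) {
        try {
            job.run();
            return Status.DONE;
        } catch (StackOverflowError e) {
            // By the time control arrives here the stack has been unwound
            // back to this frame, so there is room to keep working.
            return Status.FAILED;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Status> badJob = () -> runGuarded(() -> runaway(0));
            Callable<Status> goodJob = () -> runGuarded(() -> { });

            // Both jobs run on the same single worker thread, one after the other.
            Future<Status> bad = pool.submit(badJob);
            Future<Status> good = pool.submit(goodJob);

            System.out.println("bad job:  " + bad.get());
            System.out.println("good job: " + good.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the overflowing job is reported as FAILED and the same worker thread then completes the next job, which is the behaviour I'm after. It says nothing about whether the objects the failed job touched are still in a sane state.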

Also, let it be noted that I'm all for fixing the error in the first place (as a matter of fact I have already fixed a StackOverflowError (SOE) in this same app a few days ago), but I really don't know when this kind of situation might arise again.

Why would it be better to restart the JVM process instead of catching the StackOverflowError, marking that job as failed, and continuing with my business?

Is there any compelling reason to never catch SOEs? Except "best practices", which is a vague term that tells me nothing.

Best Answer

As a general rule, if it were absolutely never, ever acceptable to do something, and there were agreement about that, the language implementers would not have allowed it. There are almost no such unanimously clear-cut maxims. (Luckily, because that's what keeps us human programmers in jobs!)

It looks very much as if you've found a situation where catching this error is the best option for you: it lets your application work, while all other alternatives don't, and that's what counts in the end. All "best practices" are simply summations of long experiences with many cases that can usually be used in place of a detailed analysis of a specific case to save time; in your case, you've already done the specific analysis and got a different result. Congratulations, you're capable of independent thought!

(That said, surely there are situations where a stack overflow might leave an application inconsistent, just as memory exhaustion can. Just imagine that some object is constructed and then initialized with the help of nested internal method calls: if one of them throws, the object may very well be in a state that is not supposed to be possible, just as if an allocation had failed. But that doesn't mean that your solution couldn't still be the best one.)
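To make that caveat concrete, here is a small sketch of how a swallowed StackOverflowError can leave an object half-updated. The Registry class, its intended invariant (count equals items.size()) and the method names are all invented for illustration; they are not from your application or from any library.

```java
import java.util.ArrayList;
import java.util.List;

public class InconsistentAfterSoe {
    // Hypothetical class whose intended invariant is: count == items.size().
    static class Registry {
        final List<String> items = new ArrayList<>();
        int count = 0;

        void add(String item, Runnable nestedWork) {
            items.add(item);   // step 1 completes
            nestedWork.run();  // step 2 may overflow the stack...
            count++;           // ...in which case step 3 never runs
        }
    }

    // Stand-in for deeply nested internal calls that exhaust the stack.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    // Performs one add whose nested work overflows, swallowing the error.
    static Registry addAndSwallow() {
        Registry registry = new Registry();
        try {
            registry.add("job-1", () -> recurse(0));
        } catch (StackOverflowError e) {
            // Swallowed: the thread survives, but add() was cut off midway.
        }
        return registry;
    }

    public static void main(String[] args) {
        Registry registry = addAndSwallow();
        // prints "items=1 count=0": the invariant is broken
        System.out.println("items=" + registry.items.size()
                + " count=" + registry.count);
    }
}
```

If the half-updated object is discarded together with the failed job, as it would be when each job builds its own state, this does no harm. It becomes a problem only when such an object is shared with later jobs.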