When there’s no TCO, when to worry about blowing the stack

recursion, stack, stackoverflow, tail-call

Every single time there's a discussion about a new programming language targeting the JVM, there are inevitably people saying things like:

"The JVM doesn't support tail-call optimization, so I predict lots of exploding stacks"

There are thousands of variations on that theme.

Now I know that some languages, like Clojure for example, have a special recur construct that you can use.

What I don't understand is: how serious is the lack of tail-call optimization? When should I worry about it?

My main source of confusion probably comes from the fact that Java is one of the most successful languages ever, and quite a few of the JVM languages seem to be doing fairly well. How is that possible if the lack of TCO is really of any concern?

Best Answer

Consider this: let's say we got rid of all loops in Java (the compiler writers are on strike or something). Now we want to write factorial, so we might write something like this

int factorial(int i) { return factorial(i, 1); }

// Tail-recursive helper: the recursive call is the last thing the method does,
// but without TCO the JVM still pushes a new stack frame for every call.
int factorial(int i, int accum) {
  if (i == 0) return accum;
  return factorial(i - 1, accum * i);
}

Now we're feeling pretty clever: we've managed to write our factorial even without loops! But when we test it, we notice that with any reasonably large argument we get a StackOverflowError, since there's no TCO.

In real Java this isn't a problem. If we ever have a tail-recursive algorithm, we can transform it into a loop and be just fine, as in the sketch below. However, what about languages with no loops? Then you're just hosed. That's why Clojure has its recur form; without it, the language wouldn't even be Turing complete (no way to write an infinite loop).
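For instance, the manual transformation might look like this. This is my own loop version of the factorial above, not anything Java does for you:

// The same accumulator-passing factorial, hand-converted into a loop.
// The local variable plays the role the accumulator parameter played in the
// recursive version, so the stack stays one frame deep no matter how big i is.
int factorial(int i) {
  int accum = 1;
  while (i > 0) {
    accum *= i;
    i--;
  }
  return accum;
}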

The functional languages that target the JVM (Frege, Kawa (Scheme), Clojure) all have to deal with the lack of tail calls, because in these languages tail calls are the idiomatic way of writing loops! Translated to Scheme, the factorial above would be a perfectly good factorial. It would be awfully inconvenient if looping 5000 times made your program crash. This can be worked around, though, with recur special forms, annotations hinting at optimizing self calls, trampolining (sketched below), whatever. But these workarounds all impose either performance hits or unnecessary work on the programmer.
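To make the trampolining idea concrete, here is a minimal sketch in Java. The Bounce/run names are mine, not any library's API; the point is just that each step returns a description of the next tail call, and a plain loop drives the steps so the stack never grows.

import java.math.BigInteger;
import java.util.function.Supplier;

// Minimal trampoline: a step is either a finished result or a thunk for the next step.
final class Bounce<T> {
  final T result;                 // set when the computation is done
  final Supplier<Bounce<T>> next; // set when there is another step to run

  private Bounce(T result, Supplier<Bounce<T>> next) { this.result = result; this.next = next; }
  static <T> Bounce<T> done(T result) { return new Bounce<>(result, null); }
  static <T> Bounce<T> more(Supplier<Bounce<T>> next) { return new Bounce<>(null, next); }

  // The loop that stands in for the missing TCO: runs steps until one is done.
  static <T> T run(Bounce<T> b) {
    while (b.next != null) b = b.next.get();
    return b.result;
  }
}

class TrampolinedFactorial {
  // Instead of making the tail call, return a thunk describing it.
  static Bounce<BigInteger> fact(int i, BigInteger accum) {
    if (i == 0) return Bounce.done(accum);
    return Bounce.more(() -> fact(i - 1, accum.multiply(BigInteger.valueOf(i))));
  }

  public static void main(String[] args) {
    // A depth that would blow the stack with direct recursion runs fine here.
    System.out.println(Bounce.run(fact(100_000, BigInteger.ONE)).bitLength() + " bits");
  }
}

The cost is visible, too: every step allocates a thunk, which is exactly the kind of performance hit mentioned above.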

Now Java doesn't get off free either, since there's more to TCO than just self-recursion: what about mutually recursive functions? They can't be straightforwardly translated to loops, but they are still unoptimized by the JVM. This makes it spectacularly unpleasant to write algorithms using mutual recursion in Java, since if you want decent performance/range you have to do dark magic to fit the algorithm into loops.
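The textbook illustration (my example, not from the question) is the even/odd pair: each method's tail call is to the other, so neither can be turned into a loop on its own, and every call still costs a stack frame.

class MutualRecursion {
  // Each method tail-calls the other; the JVM keeps a frame for every call.
  static boolean isEven(long n) { return n == 0 || isOdd(n - 1); }
  static boolean isOdd(long n)  { return n != 0 && isEven(n - 1); }

  public static void main(String[] args) {
    System.out.println(isEven(10));        // fine
    System.out.println(isEven(1_000_000)); // StackOverflowError: roughly a million frames
  }
}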

So, in summary, this isn't a huge deal in many cases. Most tail calls either only go one stack frame deep, with things like

return foo(bar, baz); // foo is just a simple method

or are simple self-recursion, which can be rewritten as a loop. However, for the class of tail calls that doesn't fit either pattern, every JVM language feels the pain.

However, there is a decent reason why we don't yet have TCO. The JVM gives us stack traces. With TCO we systematically eliminate stack frames that we know are "doomed", but the JVM might actually want those frames later for a stack trace! Say we implement an FSM like the one below, where each state tail-calls the next. Eliminating the frames would erase all record of previous states, so a stack trace would show us the current state, but nothing about how we got there.
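Here is a toy illustration of that kind of state machine (my own sketch): each state method hands off to the next state as its last action.

class TrafficLight {
  // Each state finishes by tail-calling the next state. With TCO those
  // intermediate frames would be gone, so a stack trace taken inside red()
  // would no longer show that we came through green() and yellow().
  static void green(int steps)  { if (steps > 0) yellow(steps - 1); }
  static void yellow(int steps) { if (steps > 0) red(steps - 1); }
  static void red(int steps)    { if (steps > 0) green(steps - 1); }

  public static void main(String[] args) {
    green(30); // without TCO, every transition keeps its frame on the stack
  }
}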

Additionally, and more pressingly, much of bytecode verification is stack based, and eliminating the thing that lets us verify bytecode is not a pleasant prospect. Between this and the fact that Java has loops, TCO looks like a bit more trouble than it's worth to the JVM engineers.