JVM Tail-Call Optimization – Limitations in Scala and Clojure

Tags: clojure, jvm, scala, tail-call

Clojure does not perform tail-call optimization on its own: if you have a tail-recursive function and want it optimized, you have to use the special form recur. Similarly, if you have two mutually recursive functions, you can optimize them only by using trampoline.

The Scala compiler is able to perform TCO for a recursive function, but not for two mutually recursive functions.
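The self-vs-mutual distinction is easy to see with Scala's @tailrec annotation, which turns a missed optimization into a compile error (a sketch; the object and method names here are mine):

```scala
import scala.annotation.tailrec

object TailDemo {
  // Self-recursion: scalac rewrites this into a loop, and @tailrec
  // guarantees it (compilation fails if the rewrite is impossible).
  @tailrec
  def countdown(n: Int): Int =
    if (n <= 0) 0 else countdown(n - 1)

  // Mutual recursion: adding @tailrec to either method is rejected
  // ("could not optimize @tailrec annotated method"), because each tail
  // call targets a *different* method, so no single-method loop exists.
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean = if (n == 0) false else isEven(n - 1)
}
```

Here countdown(1000000) returns without growing the stack, while isEven at comparable depths throws a StackOverflowError, because each mutual call really does push a new JVM frame.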

Whenever I have read about these limitations, they were always ascribed to some limitation intrinsic to the JVM model. I know pretty much nothing about compilers, but this puzzles me a bit. Let me take the example from Programming Scala. Here the function

def approximate(guess: Double): Double =
  if (isGoodEnough(guess)) guess
  else approximate(improve(guess))

is translated into

0: aload_0
1: astore_3
2: aload_0
3: dload_1
4: invokevirtual #24; //Method isGoodEnough:(D)Z
7: ifeq 12
10: dload_1
11: dreturn
12: aload_0
13: dload_1
14: invokevirtual #27; //Method improve:(D)D
17: dstore_1
18: goto 2

So, at the bytecode level, one just needs goto. In this case, in fact, the hard work is done by the compiler.

What facility of the underlying virtual machine would allow the compiler to handle TCO more easily?

As a side note, I would not expect actual machines to be much smarter than the JVM. Still, many languages that compile to native code, such as Haskell, do not seem to have issues with optimizing tail calls (well, Haskell can have sometimes due to laziness, but that is another issue).

Best Answer

Now, I don't know much about Clojure and only a little about Scala, but I'll give it a shot.

First off, we need to differentiate between tail-CALLs and tail-RECURSION. Tail recursion is indeed rather easy to transform into a loop. A general tail call is much harder, and in some cases impossible: you need to know what is being called, but with polymorphism and/or first-class functions you rarely know that at compile time, so the compiler cannot know how to replace the call. Only at run time do you know the target code, and only then could you jump there without allocating another stack frame. For instance, the following fragment has a tail call and needs no stack space when properly optimized (including TCO), yet the call cannot be eliminated when compiling for the JVM:

def forward(obj: Int => Int, arg: Int): Int = {
  val arg1 = arg + 1
  obj(arg1) // tail call to an unknown target: the JVM must push a new frame
}

While it's just a tad inefficient here, there are whole programming styles (such as Continuation Passing Style or CPS) which have tons of tail calls and rarely ever return. Doing that without full TCO means you can only run tiny bits of code before running out of stack space.
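As a concrete sketch (the names are mine): direct CPS in Scala makes every step a tail call through a function value, which the JVM cannot eliminate; the usual workaround is a trampoline, such as the one the Scala standard library ships in scala.util.control.TailCalls.

```scala
import scala.util.control.TailCalls._

object CpsDemo {
  // Direct CPS: the final k(0) unwinds a chain of n nested closure
  // invocations, each a tail call the JVM cannot eliminate, so large n
  // overflows the stack.
  def sumCps(n: Int, k: Long => Long): Long =
    if (n == 0) k(0) else sumCps(n - 1, acc => k(acc + n))

  // Trampolined mutual recursion: each step *returns* a TailRec value
  // describing the next call instead of making it; .result then runs
  // the whole chain in a loop, so the stack stays flat.
  def even(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(odd(n - 1))
  def odd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(even(n - 1))
}
```

CpsDemo.sumCps(10, identity) yields 55, but only because the chain is short; CpsDemo.even(1000000).result returns true without a StackOverflowError, at the cost of allocating one thunk per step.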

What facility of the underlying virtual machine would allow the compiler to handle TCO more easily?

A tail call instruction, such as in the Lua 5.1 VM. Your example does not get much simpler. Mine becomes something like this:

push arg
push 1
add
load obj
tailcall Callable.call
// implicit return; stack frame was recycled

As a side note, I would not expect actual machines to be much smarter than the JVM.

You're right, they aren't. In fact, they are less smart and thus don't even know (much) about things like stack frames. That's precisely why one can pull tricks like re-using stack space and jumping to code without pushing a return address.
