Now, I don't know much about Clojure and little about Scala, but I'll give it a shot.
First off, we need to differentiate between tail-CALLs and tail-RECURSION. Tail recursion is indeed rather easy to transform into a loop. With tail calls, it's much harder to impossible in the general case. You need to know what is being called, but with polymorphism and/or first-class functions, you rarely know that, so the compiler cannot know how to replace the call. Only at runtime you know the target code and can jump there without allocating another stack frame. For instance, the following fragment has a tail call and does not need any stack space when properly optimized (including TCO), yet it cannot be eliminated when compiling for the JVM:
function forward(obj: Callable<int, int>, arg: int) =
let arg1 <- arg + 1 in obj.call(arg1)
While it's just a tad inefficient here, there are whole programming styles (such as Continuation Passing Style or CPS) which have tons of tail calls and rarely ever return. Doing that without full TCO means you can only run tiny bits of code before running out of stack space.
What facility of the underlying virtual machine would allow the compiler to handle TCO more easily?
A tail call instruction, such as in the Lua 5.1 VM. Your example does not get much simpler. Mine becomes something like this:
push arg
push 1
add
load obj
tailcall Callable.call
// implicit return; stack frame was recycled
As a sidenote, I would not expect actual machines to be much smarter than the JVM.
You're right, they aren't. In fact, they are less smart and thus don't even know (much) about things like stack frames. That's precisely why one can pull tricks like re-using stack space and jumping to code without pushing a return address.
You don't have one tree with multiple roots, you have a grab bag of separate trees, each with a unique root. In graph theory, a bunch of disconnected trees are called forest. But if the trees don't really belong together, it might be more useful to think of it just as a collection (list, map, etc.) of trees.
Best Answer
Turns out Martin answers the exact question 15 minutes later on in the same video! Yes, short-circuit evaluation is implemented using call by-name args. Martin discusses just this and defines the
and
function as follows:Note that if the second arg y was not call-by-name, then
and(false,<infinite loop expr>)
would not terminate, whereas we would expect it to return false due to short-circuit eval.