How to check if gcc is performing tail-recursion optimization

g++gcctail-recursion

How do I tell if gcc (more specifically, g++) is optimizing tail recursion in a particular function? (Because it's come up a few times: I don't want to test if gcc can optimize tail recursion in general. I want to know if it optimizes my tail recursive function.)

If your answer is "look at the generated assembler", I'd like to know exactly what I'm looking for, and whether or not I could write a simple program that examines the assembler to see if there's optimization.

PS. I know this appears as part of the question Which, if any, C++ compilers do tail-recursion optimization? from 5 months ago. However, I don't think this part of that question was answered satisfactorily. (The answer there was "The easiest way to check if the compiler did the optimization (that I know of) is perform a call that would otherwise result in a stack overflow – or looking at the assembly output.")

Best Answer

Let's use the example code from the other question. Compile it, but tell gcc not to assemble:

gcc -std=c99 -S -O2 test.c

Now let's look at the _atoi function in the resultant test.s file (gcc 4.0.1 on Mac OS 10.5):

        .text
        .align 4,0x90
_atoi:
        pushl   %ebp
        testl   %eax, %eax
        movl    %esp, %ebp
        movl    %eax, %ecx
        je      L3
        .align 4,0x90
L5:
        movzbl  (%ecx), %eax
        testb   %al, %al
        je      L3
        leal    (%edx,%edx,4), %edx
        movsbl  %al,%eax
        incl    %ecx
        leal    -48(%eax,%edx,2), %edx
        jne     L5
        .align 4,0x90
L3:
        leave
        movl    %edx, %eax
        ret

The compiler has performed tail-call optimization on this function. We can tell because there is no call instruction in that code whereas the original C code clearly had a function call. Furthermore, we can see the jne L5 instruction, which jumps backward in the function, indicating a loop when there was clearly no loop in the C code. If you recompile with optimization turned off, you'll see a line that says call _atoi, and you also won't see any backward jumps.

Whether you can automate this is another matter. The specifics of the assembler code will depend on the code you're compiling.

You could discover it programmatically, I think. Make the function print out the current value of the stack pointer (register ESP on x86). If the function prints the same value for the first call as it does for the recursive call, then the compiler has performed the tail-call optimization. This idea requires modifying the function you hope to observe, though, and that might affect how the compiler chooses to optimize the function. If the test succeeds (prints the same ESP value both times), then I think it's reasonable to assume that the optimization would also be performed without your instrumentation, but if the test fails, we won't know whether the failure was due to the addition of the instrumentation code.

Related Solutions

Tail recursion

Consider a simple function that adds the first N natural numbers. (e.g. sum(5) = 0 + 1 + 2 + 3 + 4 + 5 = 15).

Here is a simple JavaScript implementation that uses recursion:

function recsum(x) {
    if (x === 0) {
        return 0;
    } else {
        return x + recsum(x - 1);
    }
}

If you called recsum(5), this is what the JavaScript interpreter would evaluate:

recsum(5)
5 + recsum(4)
5 + (4 + recsum(3))
5 + (4 + (3 + recsum(2)))
5 + (4 + (3 + (2 + recsum(1))))
5 + (4 + (3 + (2 + (1 + recsum(0)))))
5 + (4 + (3 + (2 + (1 + 0))))
5 + (4 + (3 + (2 + 1)))
5 + (4 + (3 + 3))
5 + (4 + 6)
5 + 10
15

Note how every recursive call has to complete before the JavaScript interpreter begins to actually do the work of calculating the sum.

Here's a tail-recursive version of the same function:

function tailrecsum(x, running_total = 0) {
    if (x === 0) {
        return running_total;
    } else {
        return tailrecsum(x - 1, running_total + x);
    }
}

Here's the sequence of events that would occur if you called tailrecsum(5), (which would effectively be tailrecsum(5, 0), because of the default second argument).

tailrecsum(5, 0)
tailrecsum(4, 5)
tailrecsum(3, 9)
tailrecsum(2, 12)
tailrecsum(1, 14)
tailrecsum(0, 15)
15

In the tail-recursive case, with each evaluation of the recursive call, the running_total is updated.

Note: The original answer used examples from Python. These have been changed to JavaScript, since Python interpreters don't support tail call optimization. However, while tail call optimization is part of the ECMAScript 2015 spec, most JavaScript interpreters don't support it.

C++ – Which, if any, C++ compilers do tail-recursion optimization

All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:

int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}

Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:

For MSVC, use /O2 or /Ox.
For GCC, Clang and ICC, use -O3

An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.

As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.

Best Answer

Related Solutions

Tail recursion

C++ – Which, if any, C++ compilers do tail-recursion optimization

Related Topic