C++ – When is loop unwinding effective?


Loop unwinding (also known as loop unrolling) is a common way to help the compiler optimize performance. I was wondering whether, and to what extent, the performance gain depends on what is in the body of the loop:

  1. number of statements
  2. number of function calls
  3. use of complex data types, virtual methods, etc.
  4. dynamic (de)allocation of memory
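For readers unfamiliar with the term, here is a minimal sketch of what unwinding a simple loop by hand looks like. The function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are illustrative choices, not anything prescribed by the question; the point is only that the loop body is replicated so the loop condition and increment run once per four elements, with a remainder loop for sizes that are not a multiple of 4.

```cpp
#include <cstddef>
#include <vector>

// Straightforward loop: one element and one condition check per iteration.
long long sum_rolled(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// The same loop unwound by hand by a factor of 4 (hypothetical example).
// The condition check and increment now run once per four elements.
long long sum_unrolled4(const std::vector<int>& v) {
    long long total = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    // Remainder loop: handles the last 0-3 elements.
    for (; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

Both functions compute the same sum; the unrolled one simply trades a larger body for fewer loop-control instructions, which is the trade-off the question is asking about.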

What rules (of thumb?) do you use to decide whether or not to unwind a performance-critical loop? What other optimisations do you consider in these cases?

Best Answer

In general, unrolling loops by hand is not worth the effort. The compiler knows better how the target architecture works and will unroll the loop if that is beneficial.
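A minimal sketch of the approach recommended here: write the plain loop and let the optimiser decide (for example with `-O2`/`-O3` on GCC or Clang). The names `scale` and `scale_hinted` are made up for illustration. If you want to nudge the compiler without hard-coding an unrolled body, GCC 8 and later document `#pragma GCC unroll n`, placed immediately before the loop; compilers that don't recognise it generally ignore it with a warning. Note that the pragma still fixes the factor for every target, so the caveat about different CPUs applies to it as well.

```cpp
#include <cstddef>

// Plain loop: the compiler decides whether unrolling (and vectorising)
// pays off for the CPU it is targeting.
void scale(float* out, const float* in, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}

// Same loop with a GCC unrolling request (GCC 8+). The source stays a
// simple loop; the compiler performs the transformation.
void scale_hinted(float* out, const float* in, std::size_t n, float factor) {
#pragma GCC unroll 4
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```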

There are code paths that benefit from unrolling on Pentium-M-type CPUs but not on Core2, for example. If I unroll by hand, the compiler can no longer make that decision, and I may end up with less than optimal code, i.e. exactly the opposite of what I tried to achieve.
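Since whether unrolling pays off depends on the CPU, the dependable check is to time both versions on each target you care about. Below is a small, hypothetical timing helper (`time_variant` and `Timing` are illustrative names, not from any library) built on `std::chrono::steady_clock`. It runs a loop variant repeatedly and returns the elapsed time together with the accumulated result, so two variants can be cross-checked before their timings are compared.

```cpp
#include <chrono>
#include <vector>

// Result of timing one loop variant.
struct Timing {
    long long result;     // sum of the values returned over all repetitions
    double microseconds;  // total wall-clock time for all repetitions
};

// Runs variant(data) `reps` times and reports the elapsed time.
// Accumulating the return values makes it less likely that the calls
// are optimised away, but it does not guarantee it.
template <typename Variant>
Timing time_variant(Variant variant, const std::vector<int>& data, int reps) {
    long long acc = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        acc += variant(data);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return Timing{acc, elapsed.count()};
}
```

Call it once with the plain loop and once with the hand-unrolled one on the same data, build with the compiler flags you actually ship with, and compare the `result` fields before trusting the timings. For serious measurements a dedicated benchmarking library is a better choice, since the optimiser may still hoist or elide parts of the work.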

There are several cases where I do unroll performance-critical loops by hand, but only if I know that, after the manual unrolling, the compiler will be able to use architecture-specific features such as SSE or MMX instructions. Then, and only then, do I do it.
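As an illustration of the kind of case described here, the sketch below unrolls an element-wise float addition by 4 so that each iteration maps onto one 128-bit SSE addition, using the intrinsics `_mm_loadu_ps`, `_mm_add_ps` and `_mm_storeu_ps` from `<xmmintrin.h>`. The function name `add_arrays` is hypothetical, and the SSE path is compiled only when the compiler defines `__SSE__` (as GCC and Clang do for x86 targets with SSE enabled); elsewhere only the scalar loop is used. Modern compilers often auto-vectorise a loop this simple on their own, so treat this as a sketch of the technique rather than something that is always needed.

```cpp
#include <cstddef>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

// Adds b[i] to a[i] for every i in [0, n).
void add_arrays(float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
#ifdef __SSE__
    // Unrolled by 4: one 128-bit SSE register holds four floats.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           // unaligned load of a[i..i+3]
        __m128 vb = _mm_loadu_ps(b + i);           // unaligned load of b[i..i+3]
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));  // a[i..i+3] += b[i..i+3]
    }
#endif
    // Scalar remainder (or the whole array when SSE is not available).
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```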

Btw, modern CPUs are very efficient at executing well-predicted branches, and the backward branch of a loop is exactly that. The loop overhead is so small these days that it rarely makes a difference. The memory latency effects that can result from the increase in code size (e.g. more instruction-cache misses), however, will make a difference.