Whole-Program Optimization Techniques in Compilers

compiler-optimization

I just asked about the Polyhedral Model, and have looked into other compiler optimizations (Loop unrolling, Constant folding and propagation, Dead code elimination etc.).

But I haven't seen anything on entire program transformation/optimization. Wondering if there is anything on that topic, any materials or techniques. An example would be taking your whole program and reordering the function calls and variable uses at the scale of the whole program to make it more optimized (as opposed to at the scale of individual loops or basic blocks).

I'm specifically wondering how this optimizer would know that a high-level function (one referencing multiple other functions, possibly themselves high-level) could be replaced by another function. Same with a set of these functions. It seems the optimizer would have to somehow know the intent of the programmer, but who knows, maybe they've figured out a way.

Best Answer

Whole-program optimization (practically, at least for C or C++ and similar languages) includes inlining across translation units, and so is sometimes (somewhat improperly) called link-time optimization (LTO); it is still done by the compiler, which also runs during the linking step. BTW, LTO has existed at least since the 1990s (and probably even in the mainframe era, the 1970s).

In practice, recent compilers such as GCC or Clang are able to do that, e.g. with the -flto optimization flag (to be passed, along with some other flag like -O2, both at compile time and at link time to gcc, clang, g++, clang++, etc.). It works essentially by almost doubling the optimization effort: for every translation unit, the internal representation of the source code (e.g. some form of GIMPLE for GCC) is also emitted into the object files. At "link" time, all these representations are optimized together again. So the overall compilation time (including the "link-time optimization", which actually happens inside the compiler) is nearly doubled in practice.

However, link-time optimization is usually not worth the effort, since in practice you double the build time to gain only a few percent of runtime performance in the compiled program (of course, there are exceptions to this rule-of-thumb observation).

Compilers for languages having an explicit notion of modules (reifying their representation as some kind of data), or for homoiconic languages, may also do whole-program optimization in a different way (by keeping a representation of the code of the entire program). Look into Stalin or Poly/ML (or even OCaml or Go or SBCL, sometimes), or CAIA or SELF, for examples. Even the C++20 standard has modules.

(I don't understand why researchers named their prototype software "Stalin". That name is so disgusting to me that I am psychologically unable to try that compiler. For future academics: please name your software carefully!)