Do compilers optimise by introducing concurrency?

compiler, concurrency, optimization, parallelism

Suppose I have written some sequential code that can be broken down into multiple isolated tasks. In that case, it seems it might be efficient to introduce concurrency.

For example:

print(expensive_calc_one() + expensive_calc_two())

Assuming expensive_calc_one and expensive_calc_two are pure functions, but also computationally quite expensive, it would seem sensible for a compiler to optimise the code by introducing concurrency and allowing the two functions to run in parallel.

I know that this also has its downsides (context switching adds overhead, and some computers still only have one logical core).

Are there any compilers which would introduce concurrency into previously non-concurrent code, and are there any general patterns for it (or reasons not to do it)?

Best Answer

Assuming expensive_calc_one and expensive_calc_two are pure functions

Unfortunately, determining whether a function is pure is equivalent to solving the Halting Problem in the general case. So, you cannot have an Ahead-of-Time compiler which can decide, in general, whether a function is pure or not.

You have to help the compiler by designing the language in such a way that it has a chance to decide purity: force the programmer to explicitly annotate such functions, for example, or do what Haskell and Clean do and isolate side effects using the type system.
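To illustrate the Haskell approach (a minimal sketch; the function names just mirror the question's example): a function whose type does not mention IO cannot perform side effects, so its purity is visible to the compiler without any whole-program analysis.

-- Purity is visible in the type: Integer -> Integer cannot do I/O or mutate
-- anything, so the compiler may freely reorder or duplicate calls to it.
expensiveCalcOne :: Integer -> Integer
expensiveCalcOne n = sum [1 .. n]

-- Side effects force the IO type; the effect is part of the signature.
loggedCalcTwo :: Integer -> IO Integer
loggedCalcTwo n = do
  putStrLn "starting calculation"   -- a side effect, and the type says so
  return (product [1 .. n])

main :: IO ()
main = do
  two <- loggedCalcTwo 20
  print (expensiveCalcOne 1000000 + two)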

but also computationally quite expensive

Unfortunately, determining in an Ahead-of-Time compiler whether a function is "computationally quite expensive" is also equivalent to solving the Halting Problem. So, you would need to force the programmer to explicitly annotate computationally expensive functions for the compiler to parallelize.

Now, if you have to force the programmer to explicitly annotate pure and computationally expensive functions as candidates for parallelization, then is it really automatic parallelization? How is that different from simply annotating functions for parallelization?

Note that some of those problems could be addressed by performing the automatic parallelization at runtime. At runtime, you can simply benchmark a function and see how long it runs, for example. Then, the next time it is called, you evaluate it in parallel. (Of course, if the function performs memoization, then your guess will be wrong.)
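As a very rough sketch of that idea, expressed at the library level rather than inside a compiler (the function, the inputs, and the 1 ms threshold are all made up for illustration; par and pseq come from the parallel package):

import Control.Exception (evaluate)
import Control.Parallel (par, pseq)
import System.CPUTime (getCPUTime)

slowSum :: Integer -> Integer
slowSum n = sum [1 .. n]

main :: IO ()
main = do
  -- Benchmark the first call.
  t0 <- getCPUTime
  _  <- evaluate (slowSum 2000000)
  t1 <- getCPUTime
  -- getCPUTime is in picoseconds; treat anything over 1 ms as "expensive".
  let worthIt = t1 - t0 > 1000000000
      a = slowSum 30000000
      b = slowSum 40000000
      -- On later calls, only spark a parallel evaluation if it looked worth it.
      r | worthIt   = a `par` (b `pseq` a + b)
        | otherwise = a + b
  print r

(To actually use more than one core, this and the later examples need to be compiled with GHC's -threaded flag and run with +RTS -N.)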

Are there any compilers which would introduce concurrency into previously non-concurrent code

Not really. Auto-parallelization has been (one of) the holy grail(s) of compiler research for over half a century, and is still as far away today as it was 50–70 years ago. Some compilers perform parallelization at a very small scale, by auto-vectorization, e.g. performing multiple arithmetic operations in parallel by compiling them to vector instructions (MMX/SSE on AMD64, for example). However, this is generally done on a scale of only a handful of instructions, not entire functions.

There are, however, languages where the language constructs themselves have been designed for parallelism. For example, in Fortress, a for loop executes all its iterations in parallel. That means, of course, that you are not allowed to write for loops where different iterations depend on each other. Another example is Go, which has the go keyword for spawning a goroutine.
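In Haskell, the closest analogue of spawning a goroutine is forkIO, which creates a lightweight thread. A minimal sketch, with the calculation names standing in for the question's example:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

expensiveCalcOne, expensiveCalcTwo :: Integer
expensiveCalcOne = sum [1 .. 50000000]
expensiveCalcTwo = sum [1 .. 60000000]

main :: IO ()
main = do
  box <- newEmptyMVar
  -- Roughly "go expensiveCalcOne()": run it on a lightweight thread.
  _ <- forkIO (evaluate expensiveCalcOne >>= putMVar box)
  -- Meanwhile, compute the second result on the current thread.
  two <- evaluate expensiveCalcTwo
  one <- takeMVar box            -- wait for the forked result
  print (one + two)

Like goroutines, these are lightweight threads managed by the runtime rather than operating-system threads.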

However, in this case, you either have the programmer explicitly telling the compiler "execute this in parallel", or you have the language explicitly telling the programmer "this language construct will be executed in parallel". So, it's really the same as, say, explicitly spawning threads in Java, except it is much better integrated into the language.

But doing it fully automatically is near impossible, unless the language has been specifically designed with it in mind.

And even if the language is designed for it, you often have the opposite problem now: you have so much parallelism that the scheduling overhead completely dominates the execution time.

As an example: in Excel, (conceptually) all cells are evaluated in parallel. Or more precisely, they are evaluated based on their data dependencies. However, if you were to actually evaluate all formulae in parallel, you would have a massive number of extremely simple parallel "codelets".

There was, apparently, an experiment in having a Haskell implementation evaluate expressions in parallel. Even though the concurrent abstraction in Haskell (a "spark") is quite lightweight (it is just a pointer to a "thunk", which in turn is just a piece of un-evaluated code), this generated so many sparks that the overhead of managing the sparks overwhelmed the runtime.
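A small sketch of how that goes wrong (again using par from the parallel package; the list and the work per element are arbitrary): sparking every element of a large list of trivially cheap computations creates one spark per element, and the bookkeeping costs more than the work it exposes.

import Control.Parallel (par, pseq)

-- One spark per list element, each guarding only a single addition.
sumPar :: [Int] -> Int
sumPar []       = 0
sumPar (x : xs) = let rest = sumPar xs
                  in rest `par` (x `pseq` x + rest)

main :: IO ()
main = print (sumPar [1 .. 1000000])

Running this with +RTS -N -s typically shows most of the million sparks overflowing the spark pool or fizzling rather than doing useful work.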

When you do something like this, you essentially end up with the opposite problem compared to an imperative language: instead of having a hard time breaking up huge sequential code into smaller parallel bits, you have a hard time combining tiny parallel bits into reasonably-sized sequential bits. While this is semantically easier, because you cannot break code by serializing pure parallel functions, it is still quite hard to get the degree of parallelism and the size of the sequential bits right.
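In GHC's case, one standard tool for this is evaluation strategies, which let you choose the grain size yourself. A minimal sketch (chunk size and workload chosen arbitrarily):

import Control.Parallel.Strategies (parListChunk, rseq, using)

main :: IO ()
main = print (sum doubled)
  where
    -- One spark per chunk of 10000 elements, instead of one per element.
    doubled = map (* 2) [1 .. 1000000 :: Int] `using` parListChunk 10000 rseq

The chunk size is exactly the knob the paragraph above is talking about: too small and you are back to drowning in sparks, too large and the cores sit idle.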
