Do compilers like Javac automatically detect pure functions and parallelize them?

functional programming

Pure functions are known to facilitate parallelization.
What is it about functional programming that makes it inherently suited to parallel execution?

Are compilers such as Javac smart enough to detect when a method is a pure function? One can always implement a class that implements a functional interface such as Function yet has side effects, for example:
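A minimal sketch of what I mean (class name made up for illustration): this type satisfies the Function interface, but its apply() is clearly not a pure function.

```java
import java.util.function.Function;

// Implements the Function interface, yet apply() mutates state and performs I/O.
class CountingDoubler implements Function<Integer, Integer> {
    private int calls = 0;                      // mutable state shared across calls

    @Override
    public Integer apply(Integer x) {
        calls++;                                // side effect: mutates the counter
        System.out.println("call #" + calls);   // side effect: console I/O
        return x * 2;
    }
}
```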

Best Answer

Are compilers such as Javac smart enough to detect when a method is a pure function?

It's not a question of "smart enough". This is called Purity Analysis and is provably impossible in the general case: it is equivalent to solving the Halting Problem.
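A minimal sketch of why (names made up; collatz is a side-effect-free loop whose termination for every positive input is an open problem): a perfect purity analyzer for mystery() would have to decide whether the loop ever returns, because the impure println is only reachable if it does.

```java
class Undecidable {
    // Side-effect-free loop; whether it terminates for every positive n is
    // the (unproven) Collatz conjecture.
    static void collatz(long n) {
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        }
    }

    static long mystery(long n) {
        collatz(n);                          // does this ever return for this n?
        System.out.println("side effect");   // impure, but only reachable if it does
        return n;
    }
}
```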

Now, of course, optimizers do "provably impossible" things all the time. "Provably impossible in the general case" doesn't mean that it never works; it only means that it cannot work in all cases. So, there are in fact algorithms to check whether a function is pure or not, it's just that more often than not the result will be "I don't know", which means that for reasons of safety and correctness you have to assume that this particular function might be impure.

And even in the cases where it does work, the algorithms are complex and expensive.

So, that is Problem #1: it only works for special cases.

Problem #2: Libraries. In order for a function to be pure, it can only ever call pure functions (and those functions can only call pure functions, and so on and so forth). Javac obviously only knows about Java, and it only knows about code it can see. So, if your function calls a function in another compilation unit, you cannot know whether it is pure or not. If it calls a function written in another language, you can't know. If it calls a function in a library which might not even be installed yet, you can't know. And so on.
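A hedged sketch of that boundary (hypothetical names; this compiles, but running it would require the native library to actually exist): javac only ever sees the declaration, never the C implementation behind it, so it cannot classify the caller as pure or impure.

```java
class NativeBoundary {
    // Body lives in a C library loaded at run time; invisible to javac.
    static native double fastTransform(double x);

    static double pipeline(double x) {
        return fastTransform(x);   // pure or not? javac has no way to tell
    }
}
```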

This only works when you have whole-program analysis: the entire program is written in the same language and compiled at once in one go, and you can't use any libraries.

Problem #3: Scheduling. Once you have figured out which parts are pure, you still have to schedule them to separate threads. Or not. Starting and stopping threads is very expensive (especially in Java). Even if you keep a thread pool and don't start or stop them, thread context switching is also expensive. You need to be sure that the computation will run significantly longer than the time it takes to schedule and context switch, otherwise you will lose performance, not gain it.
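A rough, hedged sketch of that trade-off (not a rigorous benchmark; the numbers vary wildly by JVM and hardware): handing a trivial computation to an already-running worker thread still costs far more than just doing it inline.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SchedulingCost {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);

        long t0 = System.nanoTime();
        int inlineResult = 21 * 2;                        // the "pure computation", done inline
        long inlineNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        Future<Integer> task = pool.submit(() -> 21 * 2); // same computation, scheduled elsewhere
        int offloadedResult = task.get();                 // block until the worker is done
        long offloadNs = System.nanoTime() - t1;

        System.out.printf("inline: %d ns, offloaded: %d ns (results %d / %d)%n",
                inlineNs, offloadNs, inlineResult, offloadedResult);
        pool.shutdown();
    }
}
```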

As you probably guessed by now, figuring out how long a computation will take is provably impossible in the general case (we cannot even figure out whether it will take a finite amount of time, let alone how much time), and it is hard and expensive even in the special cases where it can be done.

Aside: Javac and optimizations. Note that most implementations of javac don't actually perform many optimizations. Oracle's implementation of javac, for example, relies on the underlying execution engine to do optimizations. This leads to another set of problems: say javac decides that a particular function is pure and expensive enough, and so compiles it to be executed on a different thread. Then the platform's optimizer (for example, the HotSpot C2 JIT compiler) comes along and optimizes the entire function away. Now you have an empty thread doing nothing. Or imagine, again, that javac decides to schedule a function on a different thread, and the platform optimizer could have optimized it away completely, except that it cannot perform inlining across thread boundaries, and so a function that could have been optimized away completely is now needlessly executed.
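A hedged illustration of that interaction (hypothetical names; whether HotSpot actually eliminates the call depends on inlining thresholds and the JVM in use): after inlining, C2 can see that the result of square() is never used and drop the call entirely, but a call that a hypothetical auto-parallelizing javac had already moved onto another thread could not be inlined, so the useless work would actually run.

```java
class DeadWork {
    static int square(int x) {       // pure and tiny: a prime inlining candidate
        return x * x;
    }

    static void caller() {
        int unused = square(42);     // result never read; likely eliminated after inlining,
                                     // but not if the call had been moved to another thread
    }
}
```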

So, doing something like this only really makes sense if you have a single compiler making most of the optimizations in one go, so that the compiler knows about and can exploit all the different optimizations at different levels and their interactions with each other.

Note that, for example, the HotSpot C2 JIT compiler actually does perform some auto-vectorization, which is also a form of auto-parallelization.
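For a sense of what that looks like, here is a small sketch of the kind of loop HotSpot's C2 compiler may auto-vectorize (no guarantee; it depends on the JVM version, flags, and loop shape):

```java
// Independent iterations over primitive arrays: the classic shape that C2's
// superword optimization can turn into SIMD instructions.
static void add(float[] a, float[] b, float[] out) {
    for (int i = 0; i < out.length; i++) {
        out[i] = a[i] + b[i];
    }
}
```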