Performance Gap – What is the Ninja Performance Gap and How to Overcome It

concurrency, multithreading, performance

According to this abstract:

The "Ninja gap" […] is the
performance gap between naively written C/C++ code that is parallelism
unaware (often serial) and best-optimized code on modern
multi-/many-core processors.

It appears that they are saying that a programmer who can skilfully handle concurrency can write much more efficient code than a programmer who cannot. As I am beginning to write multithreaded programs, I am starting to gain an intuition for this myself.

On rereading the abstract, the authors appear to argue that applying well-known
algorithms can largely close the gap. I find the question intriguing.

So, in the interest of having this question asked here on Programmers:

  • What is the "ninja performance gap?" (and were these words chosen well? I think perhaps not.)
  • Why is it so large? (Or do we question the size of it? )
  • How can it be overcome? (How do we learn to overcome it? Should we learn skillful concurrency or learn algorithms?)

Best Answer

What is the "ninja performance gap?"

A naive approach to finishing many tasks might be to start one task, wait for it to finish, and only then start the next. Suppose one wanted to retrieve many networked resources. The naive approach would be to send a request for the first resource, wait for it to complete, and only then send the next.

A programmer skilled at concurrency could instead write a program that executes many actions at the same time, using all the resources of the computer on which it runs (processors, system memory, etc.). In the network-resource example above, the better approach is to request many resources at once.
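As a sketch of that difference, the two approaches might look like the following in C++. Note that `fetch` here is a hypothetical stand-in for a real network request (a real program would call into an HTTP library); the point is the serial-versus-concurrent structure around it.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a network request: sleeps briefly, then
// returns a fake payload. A real program would use an HTTP library here.
std::string fetch(const std::string& url) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return "contents of " + url;
}

// Naive approach: request each resource only after the previous one finishes.
std::vector<std::string> fetch_serial(const std::vector<std::string>& urls) {
    std::vector<std::string> results;
    for (const auto& url : urls)
        results.push_back(fetch(url));
    return results;
}

// Concurrent approach: launch every request at once, then collect the results.
// Total wall time is roughly one request, not the sum of all of them.
std::vector<std::string> fetch_concurrent(const std::vector<std::string>& urls) {
    std::vector<std::future<std::string>> futures;
    for (const auto& url : urls)
        futures.push_back(std::async(std::launch::async, fetch, url));
    std::vector<std::string> results;
    for (auto& f : futures)
        results.push_back(f.get());
    return results;
}
```

Both functions return the same results; only the elapsed time differs, and the gap grows with the number of resources.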

The gap between the naive approach and the best-optimized one is what these authors (and apparently others) call the "ninja performance gap."

Were these words chosen well?

They were likely chosen for promotional and novelty value. As the phenomenon appears to be a very real problem, I would have preferred a term that focused on the problem as opposed to the actors on one side of the gap.

Perhaps "optimized concurrency gap" would have been a better choice of words.

Why is it so large?

The naive approach to solving many problems with computers can be quite inefficient relative to known improvements.

My own experience with Project Euler demonstrated this to me: some of my solutions did not complete in any reasonable amount of time (minutes, hours), while other solutions to the same problem finished blazingly fast (within a few seconds).
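A classic illustration of that kind of algorithmic gap (my own example, not one of the paper's benchmarks) is computing Fibonacci numbers: the naive recursion takes exponential time, while a simple iterative rewrite of the same computation is linear.

```cpp
#include <cstdint>

// Naive recursion: exponential time, unusable for large n.
std::uint64_t fib_naive(unsigned n) {
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

// Same result via iteration: linear time.
std::uint64_t fib_fast(unsigned n) {
    std::uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

For n around 45, `fib_naive` already takes many seconds while `fib_fast` is instantaneous, which is exactly the "minutes versus seconds" pattern described above.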

Bubble sort is known to be quite inefficient relative to other sorting algorithms.
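For concreteness, here is bubble sort (O(n²) comparisons) next to the standard library's `std::sort` (typically an introsort, O(n log n)); both produce identical output, so the entire difference is performance.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bubble sort: O(n^2) comparisons even on modestly sized input.
// Takes the vector by value and returns the sorted copy.
std::vector<int> bubble_sort(std::vector<int> v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        for (std::size_t j = 0; j + 1 < v.size() - i; ++j)
            if (v[j] > v[j + 1])
                std::swap(v[j], v[j + 1]);
    return v;
}

// The library alternative: same result, O(n log n).
std::vector<int> library_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

On a million elements the difference is the gap in miniature: the naive version runs for minutes, the optimized one for milliseconds.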

Do we question the size of it?

Based on the above knowledge, I do not question the size as stated.

How can it be overcome?

The obvious solution would be for the programmer to study threading and all of the relevant low-level skills. Closing the gap in this manner would take a lot of time and effort, both to gain the skills and to write the programs.
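To give a flavor of what that hand-written threading looks like, here is a minimal sketch (my own illustration) that splits a summation across two threads; real "ninja" code must additionally worry about partitioning, synchronization, false sharing, and so on, which is where the time and effort go.

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Split a summation across two threads: a worker thread reduces the
// first half of the data while the calling thread reduces the second.
long long parallel_sum(const std::vector<int>& data) {
    auto mid = data.begin() + data.size() / 2;
    long long left = 0;
    std::thread worker([&] {
        left = std::accumulate(data.begin(), mid, 0LL);
    });
    long long right = std::accumulate(mid, data.end(), 0LL);
    worker.join();  // wait for the worker before combining results
    return left + right;
}
```

Even this toy version requires the programmer to reason about where the join must happen and which data each thread may touch.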

How do we learn to overcome it? Should we learn skillful concurrency or learn algorithms?

The paper's abstract states that better algorithms combined with better compilers can largely close the gap with much less effort. Without direct knowledge, I am inclined to believe them.
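The practical upshot, as I understand the paper's argument, is to write simple loops over contiguous data and let the compiler do the low-level work. The SAXPY-style kernel below is my own illustration of that shape, not one of the paper's benchmarks: a dependence-free loop over contiguous storage that a modern optimizing compiler (at `-O2`/`-O3`) can typically auto-vectorize without hand-written intrinsics.

```cpp
#include <cstddef>
#include <vector>

// y_out[i] = a * x[i] + y[i]: a simple, dependence-free loop over
// contiguous storage -- the kind of structure a compiler can vectorize
// automatically, no "ninja" intrinsics required.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        out[i] = a * x[i] + y[i];
    return out;
}
```

The gap-closing claim is precisely that code in this style, plus a good compiler, gets close to the hand-tuned version at a fraction of the effort.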