Why Build Times Vary on Different Hardware – A Performance Analysis

benchmarking, compile-time, cpu, performance

  • Why do the compile times not vary significantly between different era CPUs, even though disk (NVMe vs. HDD) and CPU benchmarks vary significantly in performance?
  • Why does disabling hyperthreading affect performance significantly with the Ryzen CPU?

Over the past few months I have worked with several Linux machines whose CPUs ranged from roughly nine years old to recently released. The newer CPUs received much higher benchmark scores from cpubenchmark.net. Details below, including compile times for the Linux 4.4.176 kernel under Ubuntu 18.04.

To put it simply, CPUs that scored several times faster at cpubenchmark.net most certainly did not decrease compile times by the same factor; in fact, sometimes only meager improvements were seen.

What would be the bottleneck to change or fix? The Ryzen machine has all the latest hardware gadgets. Or is this a case of synthetic benchmarks vs. reality?

This question touches on the topic of benchmarking by compile-time comparison, but does not explore the (lack of) variability in build times seen here.

  • Recent Ryzen 2950X system
    • Single Thread Rating: 2208
    • Disk: NVMe
    • Source: Linux kernel, 4.4.176, compiled with .config from Ubuntu Xenial
    • Invocation: make -j32
    • Compile time: 25997u 4910s 17:25 wall time
  • Same system, with hyperthreading (AMD calls it SMT) turned off
    • Source: Linux kernel 4.4.176, same .config file
    • Invocation: make -j16
    • Compile time: 10561u 1796s 13:44 wall time

That is a reduction of over 20% in wall-clock compile time, a difference of 3 minutes 41 seconds, from 17:25 down to 13:44, simply by disabling hyperthreading.
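The u and s figures printed by time(1) are cumulative user and system CPU seconds summed across all threads, so dividing their sum by the wall time gives the effective parallelism of each run. A quick sketch with the numbers above:

```shell
# Effective parallelism = total CPU seconds / wall-clock seconds.
# With SMT on:  25997u + 4910s over 17:25 of wall time
awk 'BEGIN { printf "%.1f\n", (25997 + 4910) / (17*60 + 25) }'   # ~29.6
# With SMT off: 10561u + 1796s over 13:44 of wall time
awk 'BEGIN { printf "%.1f\n", (10561 + 1796) / (13*60 + 44) }'   # ~15.0
```

So the SMT run kept nearly 30 hardware threads busy yet finished slower, consistent with the extra threads contending for shared resources rather than doing useful additional work.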

New Data, 2019-03-08

Compile time decreased monotonically, though not linearly, up to make -j12, which took 12:46 with a dozen jobs. Beyond that, compile time increased slowly and monotonically out to the last run at make -j24. Hyperthreading was off for these tests.

Whatever the bottleneck is, it is hit beyond 12 parallel jobs.
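A sweep like the one above can be scripted. The sketch below assumes an already-configured kernel source tree as the current directory (the job counts and log filename are illustrative):

```shell
# Sketch: time a clean kernel build at each of several -j values.
# Run sweep_jobs from inside a configured kernel source tree.
sweep_jobs() {
    for j in 1 2 4 8 12 16 20 24; do
        make -s clean
        start=$(date +%s)
        make -s -j"$j" > build.log 2>&1
        end=$(date +%s)
        echo "j=$j wall=$((end - start))s"
    done
}
```

Calling `sweep_jobs` prints one wall-time line per job count, which makes a knee like the one at -j12 easy to spot.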

New Data – 2019-03-27

After tuning the X.M.P. memory profile, but still with single-channel access, the times decreased until reaching a minimum of 10:08.5 at 16 threads, as many threads as cores (no HT enabled). With one exception, compile times then increased slowly to 10:41 at 32 threads.

Once dual-channel memory was enabled, times dropped to 6:47.5 at 17 threads. Beyond that point the timings wobbled up and down, with another minimum of 6:46.6 at 20 threads; that is probably a fluke rather than a definitive minimum. Only up to 24 threads were tested in this case.

It seems that memory is a huge factor and the bottleneck for this processor. Once memory is configured optimally, throwing more threads at the problem neither helps nor hurts much.
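As a back-of-envelope check of how much the memory configuration mattered, comparing the best single-channel time (10:08.5) against the best dual-channel time (6:47.5) works out to roughly a 1.5x speedup:

```shell
# Speedup from enabling dual-channel memory, using the best times above.
awk 'BEGIN { printf "%.2fx\n", (10*60 + 8.5) / (6*60 + 47.5) }'   # ~1.49x
```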

Conclusion

  • Memory is a major potential bottleneck for this CPU and must be configured properly to take advantage of the processor.

  • The old rule of as many threads as processors seems to hold.

  • Hyperthreading was not tested beyond the first tests, so no conclusions can be drawn in that area, other than that the default single-channel memory configuration probably starves the hyperthreaded cores for memory access.
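For anyone repeating these tests, recent Linux kernels expose SMT state through sysfs, so it can be checked (and, with root, toggled) without a trip to the BIOS. A sketch, hedged because the interface may be absent on older kernels:

```shell
# Report whether SMT is active; the sysfs knob exists on recent kernels.
if [ -r /sys/devices/system/cpu/smt/active ]; then
    smt_state=$(cat /sys/devices/system/cpu/smt/active)   # 1 = on, 0 = off
else
    smt_state="unknown (interface not available)"
fi
echo "SMT: $smt_state"
# Toggling requires root:  echo off > /sys/devices/system/cpu/smt/control
```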

Best Answer

Just off the top of my head:

  1. CPU performance is not the only thing that affects compile times.

  2. Benchmarks do not always take into account performance factors that affect real programs.

  3. Cache affinity can be a larger factor than CPU speed.

  4. Compilers don't always exploit concurrency mechanisms effectively.

  5. In general, programs like compilers can be written in such a way that they exploit (or fail to exploit) specific CPU characteristics.

  6. Concurrency mechanisms have overhead. If the software is not getting any benefit from the concurrency offered, the result is a net negative.

In short, it's complicated. A highly complex task like compilation is not always well-modeled by benchmarks. You have to consider other real-world factors that affect overall performance, like seek times on hard disks and the manner in which the code is written.

For a simple, but rather dramatic example of this, see here.