Why would more CPU cores on virtual machine slow compile times

Tags: asp.net, compiler, scalability, virtual machine, virtualization

[edit#2] If anyone from VMWare can hit me up with a copy of VMWare Fusion, I'd be more than happy to run the same tests as a VirtualBox vs VMWare comparison. Somehow I suspect the VMWare hypervisor will be better tuned for hyperthreading (see my answer too)

I'm seeing something curious. As I increase the number of cores on my Windows 7 x64 virtual machine, the overall compile time increases instead of decreasing. Compiling is usually very well suited for parallel processing: once the middle part (post dependency mapping) is reached, you can simply invoke a compiler instance on each of your .c/.cpp/.cs/whatever files to build partial objects for the linker to take over. So I would have imagined that compiling would scale very well with the # of cores.
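To illustrate the kind of per-file parallelism I mean, here's a rough Python sketch. The `compile_unit` function is a stand-in, not a real compiler invocation; a real build system would shell out to csc/cl here:

```python
from multiprocessing import Pool

def compile_unit(source_file):
    # Stand-in for compiling one translation unit; a real build
    # would invoke the compiler on this file and emit an object.
    return source_file.replace(".cs", ".obj")

if __name__ == "__main__":
    sources = ["a.cs", "b.cs", "c.cs", "d.cs"]
    # After dependency mapping, each unit is independent, so in
    # theory the work should scale with the number of cores.
    with Pool(processes=4) as pool:
        objects = pool.map(compile_unit, sources)
    print(objects)
```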

But what I'm seeing is:

  • 8 cores: 1.89 sec
  • 4 cores: 1.33 sec
  • 2 cores: 1.24 sec
  • 1 core: 1.15 sec

Is this simply a design artifact of a particular vendor's hypervisor implementation (type 2: VirtualBox in my case), or is it something more pervasive across VMs, done to keep hypervisor implementations simpler? With so many factors, I seem to be able to make arguments both for and against this behavior, so if someone knows more about this than me, I'd be curious to read your answer.

Thanks
Sid

[edit:addressing comments]

@MartinBeckett: Cold compiles were discarded.

@MonsterTruck: Couldn't find an open-source project to compile directly. Would be great, but I can't risk screwing up my dev environment right now.

@Mr Lister, @philosodad: Have 8 HW threads and am using VirtualBox, so it should be a 1:1 mapping without emulation

@Thorbjorn: I have 6.5 GB allocated to the VM and a smallish VS2012 project, so it's quite unlikely that I'm swapping in/out and thrashing the page file.

@All: If someone can point to an open source VS2010/VS2012 project, that might be a better community reference than my (proprietary) VS2012 project. Orchard and DNN seem to need environment tweaking to compile in VS2012. I really would like to see if someone with VMWare Fusion also sees this (for a VMWare vs VirtualBox comparison)

Test details:

  • Hardware: Macbook Pro Retina
    • CPU : Core i7 @ 2.3 GHz (quad core, hyper-threaded = 8 logical cores in Windows Task Manager)
    • Memory : 16 GB
    • Disk : 256GB SSD
  • Host OS: Mac OS X 10.8
  • VM type: VirtualBox 4.1.18 (type 2 hypervisor)
  • Guest OS: Windows 7 x64 SP1
  • Compiler: VS2012 compiling a solution with 3 C# Azure projects
    • Compile times measured by a VS2012 plugin called 'VSCommands'
    • All tests run 5 times, first 2 runs discarded, last 3 averaged
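The run-discard-average protocol above can be sketched as follows; the `build` callable is a placeholder for the actual VS rebuild, not anything VSCommands actually exposes:

```python
import time

def timed_runs(build, runs=5, discard=2):
    """Run `build` `runs` times, drop the first `discard` timings
    (cold caches / warm-up), and average the rest."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        build()
        timings.append(time.perf_counter() - start)
    kept = timings[discard:]
    return sum(kept) / len(kept)
```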

Best Answer

Answer: It isn't actually slowing down; compile times do scale with the # of CPU cores. The project used in the original question was simply 'too small' (it represents a ton of development work, but it's small/easy for a compiler) to reap the benefits of multiple cores. At this small scale, hammering at the work serially right off the bat beats the overhead of planning how to spread the work, spawning multiple compiler processes, etc.
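A minimal sketch of why tiny workloads lose to parallelism: spawning worker processes and shipping data to them costs a fixed overhead that a 1-2 second build can't amortize. The work function here is a trivial stand-in for per-file compilation:

```python
import time
from multiprocessing import Pool

def tiny_task(n):
    # Trivial per-item work; each "compile" of a small project
    # finishes in microseconds, far below the pool overhead.
    return n * n

if __name__ == "__main__":
    items = list(range(1000))

    start = time.perf_counter()
    serial = [tiny_task(n) for n in items]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=8) as pool:
        parallel = pool.map(tiny_task, items)
    parallel_time = time.perf_counter() - start

    # For work this small, process spawn + pickling overhead
    # typically dwarfs the computation, so serial usually wins.
    print(f"serial {serial_time:.4f}s  parallel {parallel_time:.4f}s")
```

The same results either way, but the parallel path pays a fixed startup tax; only a long build makes that tax worth paying.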

This is based on a new experiment I ran, prompted by the comments on the question (and my own curiosity). I used a larger VS project, Umbraco CMS's source code, since it's large, open source, and one can directly load up the solution file and rebuild (hint: load up umbraco_675b272bb0a3\src\umbraco.sln in VS2010/VS2012).

NOW, what I see is what I expected, i.e. compiles scale with cores!! Well, up to a certain point, since I find:

Table of results

Takeaways:

  • A new VM core results in a new OS X Thread within the VirtualBox process
  • Compile times improve with more cores, as expected (these compiles are long enough to amortize the parallelism overhead)
  • At 8 VM cores, core emulation might be kicking in within VirtualBox, as the penalty is massive (50% hit)
  • The above is likely because OS X is unable to present 4 hyper-threaded cores (8 h/w threads) as 8 full cores to VirtualBox

That last point caused me to monitor the CPU history across all the cores via 'Activity Monitor' (CPU history) and what I found was

OS X CPU history graph

Takeaways:

  • At one VM core, the activity seems to be hopping across the 4 HW cores. Makes sense: it distributes heat evenly across the cores.

  • Even at 4 virtual cores (and 27 VirtualBox OS X threads, ~800 OS X threads overall), only the even HW threads (0, 2, 4, 6) are almost saturated while the odd HW threads (1, 3, 5, 7) sit near 0%. More likely the scheduler works in terms of HW cores and NOT HW threads, so I speculate the OS X 64-bit kernel/scheduler isn't optimized for hyper-threaded CPUs. Or, looking at the 8-VM-core setup, perhaps it only starts using the odd threads at very high CPU utilization? Something funny is going on ... well, that's a separate question for some Darwin developers ...

[edit]: I'd love to try the same in VMWare Fusion. Chances are it won't be this bad. I wonder if they showcase this for their commercial product ...

Footer:

In case the images ever disappear, the compile time table is (text, ugly!)

Cores in VM   Avg compile time (sec)   Host/OS X threads   Host/OS X CPU consumption
     1                11.83                   24                  105-115%
     2                10.04                   25                  140-190%
     4                 9.59                   27                  180-270%
     8                14.18                   31                  240-430%