How much overhead does x86/x64 virtualization have?

Tags: 64-bit, performance, virtualization, x86

How much overhead does x86/x64 virtualization (I'll probably be using VirtualBox, possibly VMware, definitely not paravirtualization) have for each of the following operations, on a Win64 host with a Linux64 guest, using Intel hardware virtualization?

  • Purely CPU-bound, user mode 64-bit code

  • Purely CPU-bound, user mode 32-bit code

  • File I/O to the hard drive (I care mostly about throughput, not latency)

  • Network I/O

  • Thread synchronization primitives (mutexes, semaphores, condition variables)

  • Thread context switches

  • Atomic operations (using the lock prefix, things like compare-and-swap; see the sketch below)

I'm primarily interested in the hardware-assisted x64 case (both Intel and AMD) but wouldn't mind hearing about the unassisted binary translation and x86 (i.e. 32-bit host and guest) cases, too. I'm not interested in paravirtualization.
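For concreteness on the atomic-operations bullet, here is a minimal sketch (my illustration, not code from the question) of the kind of lock-prefixed compare-and-swap meant there, using C11 atomics:

    /* Hypothetical example: a lock-free counter increment via
     * compare-and-swap. On x86-64 this compiles down to a LOCK
     * CMPXCHG instruction, the kind of atomic the bullet refers to. */
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_long counter;

    static void cas_increment(void)
    {
        long old = atomic_load(&counter);
        /* On failure, 'old' is reloaded with the current value,
         * so the loop retries with fresh data. */
        while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
            ;
    }

    int main(void)
    {
        cas_increment();
        printf("counter = %ld\n", counter);
        return 0;
    }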

Best Answer

I found that there isn't a simple and absolute answer to questions like yours. Each virtualization solution behaves differently on specific performance tests. Also, tests like disk I/O throughput can be split into many different tests (read, write, rewrite, ...), and the results vary from solution to solution and from scenario to scenario. This is why it is not trivial to point to one solution as the fastest for disk I/O, and why there is no absolute answer for labels like overhead for disk I/O.

It gets more complex when trying to find a relation between the different benchmark tests. None of the solutions I tested performed well on micro-operation tests. For example: inside a VM, one single call to gettimeofday() took, on average, 11.5 times more clock cycles to complete than on hardware. The hypervisors are optimized for real-world applications and do not perform well on micro-operations. This may not be a problem for your application, which may fit better as a real-world application. By micro-operation I mean any operation that takes fewer than 1,000 clock cycles to finish (for a 2.6 GHz CPU, 1,000 clock cycles take 385 nanoseconds, or 3.85e-7 seconds).
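To make that measurement concrete: the cycle cost of a micro-operation like gettimeofday() can be read from the time-stamp counter. The following is only a sketch in the spirit of that test (not the actual rdtscbench code), assuming GCC/Clang on x86-64:

    /* Sketch: measure the cycle cost of one gettimeofday() call with
     * the time-stamp counter. Taking the minimum over many runs
     * filters out interrupts and scheduling noise. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <x86intrin.h>   /* __rdtsc() */

    int main(void)
    {
        struct timeval tv;
        unsigned long long best = ~0ULL;

        for (int i = 0; i < 100000; i++) {
            unsigned long long start = __rdtsc();
            gettimeofday(&tv, NULL);
            unsigned long long elapsed = __rdtsc() - start;
            if (elapsed < best)
                best = elapsed;
        }
        printf("gettimeofday: ~%llu cycles\n", best);
        return 0;
    }

Run once on bare hardware and once inside the VM; the ratio of the two numbers is the slowdown factor quoted above.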

I did extensive benchmark testing of the four main solutions for data center consolidation on the x86 architecture. I ran almost 3,000 tests comparing performance inside VMs against performance on bare hardware. I defined 'overhead' as the difference between the maximum performance measured inside the VM(s) and the maximum performance measured on hardware.
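Expressed as a formula (my reading of that definition), with P_hw and P_vm being the maximum performance measured on hardware and inside the VM respectively:

    overhead(%) = 100 × (P_hw − P_vm) / P_hw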

The solutions:

  • VMware ESXi 5
  • Microsoft Hyper-V (Windows Server 2008 R2 SP1)
  • Citrix XenServer 6
  • Red Hat Enterprise Virtualization 2.2

The guest OSs:

  • Microsoft Windows Server 2008 R2, 64-bit
  • Red Hat Enterprise Linux 6.1, 64-bit

Test Info:

  • Servers: 2x Sun Fire X4150, each with 8 GB of RAM, 2x Intel Xeon E5440 CPUs, and four Gigabit Ethernet ports
  • Disks: 6x 136 GB SAS disks over iSCSI over Gigabit Ethernet

Benchmark Software:

  • CPU and Memory: Linpack benchmark, both 32-bit and 64-bit. This is CPU- and memory-intensive.

  • Disk I/O and Latency: Bonnie++

  • Network I/O: Netperf: TCP_STREAM, TCP_RR, TCP_CRR, UDP_RR and UDP_STREAM

  • Micro-operations: rdtscbench: system calls, inter-process pipe communication (see the sketch after this list)
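As a sketch of what the pipe part of that test looks like (my illustration, not the actual rdtscbench code): parent and child ping-pong one byte through a pair of pipes, and the round-trip time is averaged:

    /* Inter-process pipe round-trip micro-benchmark, in the spirit of
     * rdtscbench's 2pipes test. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define ROUNDS 10000

    int main(void)
    {
        int p2c[2], c2p[2];              /* parent->child, child->parent */
        char byte = 'x';
        struct timeval t0, t1;

        if (pipe(p2c) != 0 || pipe(c2p) != 0) { perror("pipe"); return 1; }

        if (fork() == 0) {               /* child: echo every byte back */
            for (int i = 0; i < ROUNDS; i++) {
                read(p2c[0], &byte, 1);
                write(c2p[1], &byte, 1);
            }
            _exit(0);
        }

        gettimeofday(&t0, NULL);
        for (int i = 0; i < ROUNDS; i++) {
            write(p2c[1], &byte, 1);     /* ping */
            read(c2p[0], &byte, 1);      /* pong */
        }
        gettimeofday(&t1, NULL);
        wait(NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%.2f us per round trip\n", us / ROUNDS);
        return 0;
    }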

The averages are calculated with the parameters:

  • CPU and Memory: AVERAGE(HPL32, HPL64)

  • Disk I/O: AVERAGE(put_block, rewrite, get_block)

  • Network I/O: AVERAGE(tcp_crr, tcp_rr, tcp_stream, udp_rr, udp_stream)

  • Micro-operations: AVERAGE(getpid(), sysconf(), gettimeofday(), malloc[1M], malloc[1G], 2pipes[], simplemath[])

For my test scenario, using my metrics, the averages of the results of the four virtualization solutions are:

VM layer overhead, Linux guest:

  • CPU and Memory: 14.36%

  • Network I/O: 24.46%

  • Disk I/O: 8.84%

  • Disk latency for reading: 2.41 times slower

  • Micro-operations execution time: 10.84 times slower

VM layer overhead, Windows guest:

  • CPU and Memory, average for both 32-bit and 64-bit: 13.06%

  • Network I/O: 35.27%

  • Disk I/O: 15.20%

Please note that those values are generic and do not reflect specific-case scenarios.

Please take a look at the full article: http://petersenna.com/en/projects/81-performance-overhead-and-comparative-performance-of-4-virtualization-solutions