I found that there is no simple, absolute answer to questions like yours. Each virtualization solution behaves differently on specific performance tests. Also, a metric like disk I/O throughput can be split into many different sub-tests (read, write, rewrite, ...), and the results vary from solution to solution and from scenario to scenario. This is why it is not trivial to point to one solution as the fastest for disk I/O, and why there is no single number behind a label like "disk I/O overhead".
It gets more complex when you try to relate the different benchmark tests to each other. None of the solutions I tested performed well on micro-operation tests. For example: inside a VM, a single call to gettimeofday() took, on average, 11.5 times more clock cycles to complete than on hardware. Hypervisors are optimized for real-world workloads and do not perform well on micro-operations; this may not be a problem for you if your application behaves more like a real-world workload. By micro-operation I mean any operation that completes in fewer than 1,000 clock cycles (on a 2.6 GHz CPU, 1,000 clock cycles take about 385 nanoseconds, or 3.85e-7 seconds).
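To give an idea of how such a number is obtained, here is a minimal sketch of that kind of measurement: an illustrative stand-in I am including here, not the rdtscbench tool I actually used (listed below):

```c
/* Minimal sketch: time a "micro-operation" (one gettimeofday() call) in
   CPU timestamp-counter ticks. Illustrative only -- not rdtscbench itself. */
#include <stdio.h>
#include <sys/time.h>
#include <x86intrin.h>              /* __rdtsc() on GCC/Clang, x86 only */

int main(void)
{
    enum { N = 1000000 };
    struct timeval tv;

    unsigned long long start = __rdtsc();
    for (int i = 0; i < N; i++)
        gettimeofday(&tv, NULL);
    unsigned long long ticks = __rdtsc() - start;

    /* Note: on modern CPUs the TSC ticks at a constant reference rate,
       which is close to, but not exactly, the core clock frequency. */
    printf("gettimeofday(): ~%llu TSC ticks per call\n", ticks / N);
    return 0;
}
```

Run it on bare metal and again inside a VM; the ratio between the two numbers is the kind of 11.5x gap mentioned above.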
I did extensive benchmark testing on the four main data-center consolidation solutions for the x86 architecture. I ran almost 3,000 tests comparing performance inside VMs with performance on bare hardware. I called "overhead" the difference between the maximum performance measured inside the VM(s) and the maximum performance measured on hardware.
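In other words, for each individual test: overhead (%) = 100 × (perf_hardware − perf_VM) / perf_hardware, where perf is the best result observed for that test.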
The solutions:
- VMware ESXi 5
- Microsoft Hyper-V (Windows Server 2008 R2 SP1)
- Citrix XenServer 6
- Red Hat Enterprise Virtualization 2.2
The guest OSs:
- Microsoft Windows 2008 R2, 64-bit
- Red Hat Enterprise Linux 6.1, 64-bit
Test Info:
- Servers: 2× Sun Fire X4150, each with 8 GB of RAM, 2× Intel Xeon E5440 CPUs, and four Gigabit Ethernet ports
- Disks: 6× 136 GB SAS disks, accessed over iSCSI on Gigabit Ethernet
Benchmark Software:
- CPU and Memory: Linpack benchmark, 32-bit and 64-bit builds (CPU- and memory-intensive)
- Disk I/O and Latency: Bonnie++
- Network I/O: Netperf (TCP_STREAM, TCP_RR, TCP_CRR, UDP_RR and UDP_STREAM)
- Micro-operations: rdtscbench (system calls, inter-process pipe communication)
The averages are calculated from the following parameters:
- CPU and Memory: AVERAGE(HPL32, HPL64)
- Disk I/O: AVERAGE(put_block, rewrite, get_block)
- Network I/O: AVERAGE(tcp_crr, tcp_rr, tcp_stream, udp_rr, udp_stream)
- Micro-operations: AVERAGE(getpid(), sysconf(), gettimeofday(), malloc[1M], malloc[1G], 2pipes[], simplemath[])
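To make the arithmetic concrete, here is a small C sketch of how per-test overheads are computed and then averaged per category; the numbers are made-up placeholders, not my measured results:

```c
/* Illustrative sketch only: placeholder numbers, not the measured results. */
#include <stdio.h>

/* overhead (%) of a VM result relative to the bare-metal result */
static double overhead(double hw, double vm)
{
    return 100.0 * (hw - vm) / hw;
}

int main(void)
{
    /* hypothetical disk throughputs: put_block, rewrite, get_block */
    double hw[] = { 100.0, 55.0, 120.0 };   /* best on hardware  */
    double vm[] = {  92.0, 50.0, 111.0 };   /* best inside the VM */
    int n = sizeof hw / sizeof hw[0];

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += overhead(hw[i], vm[i]);

    /* Disk I/O: AVERAGE(put_block, rewrite, get_block) */
    printf("average disk I/O overhead: %.1f%%\n", sum / n);
    return 0;
}
```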
For my test scenario, using my metrics, the average results for the four virtualization solutions are:
VM layer overhead, Linux guest: (results chart not reproduced here; see the full article linked below)
VM layer overhead, Windows guest: (results chart not reproduced here; see the full article linked below)
Please note that those values are generic and do not reflect any specific case scenario.
Please take a look at the full article: http://petersenna.com/en/projects/81-performance-overhead-and-comparative-performance-of-4-virtualization-solutions
It's quite simple, really. For homogeneous clusters and single-host setups, use the host CPU option (qemu's -cpu host). For mixed clusters, use the lowest common CPU model: if one host is Penryn and the other is Nehalem, use Penryn on both.
If you are using RHEV or oVirt, this is already built in. VMware calls the same idea "EVC" and positions it as a major feature.
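To make the Penryn/Nehalem example concrete: Nehalem adds SSE4.2 on top of Penryn's SSE4.1, so a guest pinned to a Penryn model should not see SSE4.2 even when the host supports it. A quick hypothetical check you can run inside the guest:

```c
/* Check, from inside a guest, which SSE4 level the vCPU exposes.
   On a Nehalem host masked down to Penryn, SSE4.2 should read "no". */
#include <stdio.h>
#include <cpuid.h>                  /* GCC/Clang __get_cpuid() helper */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;                   /* CPUID leaf 1 not supported */

    /* CPUID leaf 1, ECX: bit 19 = SSE4.1 (Penryn), bit 20 = SSE4.2 (Nehalem) */
    printf("SSE4.1: %s\n", (ecx & (1u << 19)) ? "yes" : "no");
    printf("SSE4.2: %s\n", (ecx & (1u << 20)) ? "yes" : "no");
    return 0;
}
```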
Getting back to performance: you definitely want virtio (the paravirtualized disk and network drivers) everywhere you can use it. If you still hit performance bottlenecks after that, they can usually be addressed case by case, depending on where they occur.
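One quick way to confirm a Linux guest really is on virtio (a small sketch of my own, relying on the standard sysfs virtio bus) is to list /sys/bus/virtio/devices; an empty or missing directory usually means you are still on emulated IDE/e1000 devices:

```c
/* List the virtio devices a Linux guest sees via sysfs. Illustrative sketch. */
#include <stdio.h>
#include <dirent.h>

int main(void)
{
    DIR *d = opendir("/sys/bus/virtio/devices");
    if (!d) {
        puts("no virtio bus found -- guest likely uses emulated devices");
        return 1;
    }
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')        /* skip "." and ".." */
            printf("virtio device: %s\n", e->d_name);
    closedir(d);
    return 0;
}
```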
[offtop] I have already commented on your choice of distribution in another thread. [/offtop]
Best Answer
On a Windows host, qemu isn't acting as a hypervisor at all: it does full machine emulation with dynamic binary translation (its TCG mode), which is horrendously slow, and there is little that can be done to speed it up.
It may be useful as a demonstration or for debugging purposes, but for anything serious you will want an actual Windows hypervisor such as Hyper-V, or some other actual hypervisor entirely (e.g. KVM on Linux).
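If you do go the KVM route, a minimal sanity check (my sketch, using the standard KVM API) is to open /dev/kvm and ask for the API version; if this fails, qemu will run in its slow emulation (TCG) mode instead:

```c
/* Verify hardware-assisted virtualization (KVM) is available on a Linux host. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        perror("open /dev/kvm");    /* module not loaded, or no VT-x/AMD-V */
        return 1;
    }
    /* KVM_GET_API_VERSION has returned 12 on all mainline kernels for years */
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));
    close(kvm);
    return 0;
}
```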