KVM: Which CPU features make VMs run better

Tags: central-processing-unit, kvm-virtualization, qemu, virtualization

We are using Ubuntu 12.04 with the following setup:

  • Dell R910
  • Kernel 3.2.0-25-generic #40-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
  • kvm 1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu13
  • qemu-kvm 1.0+noroms-0ubuntu13
  • qemu-common 1.0+noroms-0ubuntu13
  • 4 x Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz (each with 10 physical
    cores, HT and Intel VT enabled)
  • The Windows guests currently have no VirtIO, but that will change
    soon
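Since the question depends on Intel VT being available to KVM, here is a minimal sketch for checking that from the host. It assumes /proc/cpuinfo-style text as input: "vmx" in the flags line means Intel VT-x, "svm" means AMD-V. The flag shows that the CPU supports the extension; on many kernels it does not prove the extension is enabled in the BIOS, so also check that the kvm module loaded (e.g. that /dev/kvm exists). The function name is hypothetical.

```python
def virt_extension(cpuinfo_text):
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V) or None, judging only by the
    'flags' lines of /proc/cpuinfo-style text. CPU support is not the same
    as the extension being enabled in the BIOS."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Format is "flags<TAB><TAB>: fpu vme ... vmx ..."
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "vmx"
            if "svm" in flags:
                return "svm"
    return None

# Usage on the host (not executed here):
#   with open("/proc/cpuinfo") as f:
#       print(virt_extension(f.read()))
```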

We are running several Windows guests on this machine: one of them is Windows Server 2003 (32-bit), another one Windows Server 2008 (64-bit). We are currently struggling with performance issues and have been experimenting with the CPU models.

We usually use "qemu-system-x86_64" for our 32-bit Windows guest, e.g.:

/usr/bin/qemu-system-x86_64 -S -M pc-1.0 -cpu qemu32 -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 [...] 
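To compare CPU models on the same guest, it helps to generate the command line with the model as the only variable. The following is a minimal sketch with hypothetical helper names. It reproduces only the options shown above; the remainder elided as "[...]" is not reproduced. It also leaves out -S, which tells QEMU not to start the guest CPU until a monitor "cont" command (libvirt uses it). Note that in the -smp value, the vCPU count equals sockets × cores × threads.

```python
def smp_arg(sockets, cores, threads):
    """Build a QEMU -smp value; the vCPU count is sockets * cores * threads."""
    total = sockets * cores * threads
    return f"{total},sockets={sockets},cores={cores},threads={threads}"

def qemu_argv(cpu_model, mem_mb, sockets, cores, threads):
    """Hypothetical helper: argv for the command above with a swappable
    -cpu model (e.g. "qemu32", "Nehalem" or "host")."""
    return [
        "/usr/bin/qemu-system-x86_64",
        "-M", "pc-1.0",
        "-cpu", cpu_model,
        "-enable-kvm",
        "-m", str(mem_mb),
        "-smp", smp_arg(sockets, cores, threads),
    ]

# Same guest, two CPU models, everything else identical:
baseline = qemu_argv("qemu32", 4096, sockets=4, cores=1, threads=1)
candidate = qemu_argv("Nehalem", 4096, sockets=4, cores=1, threads=1)
```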

The performance of this guest turned out to be rather low. We haven't run any benchmarks yet, but copying a large amount of data (files) from one directory to another inside the VM is much faster when we switch the CPU model from "-cpu qemu32" to "-cpu Nehalem": files that took around 2h40m to copy now copy within 40 minutes.
Of course this is not a rigorous test, and there is plenty of room for a more professional approach. But it is a clear indicator that the choice of CPU model can heavily affect the guest's performance.
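For scale, the single copy test above works out to roughly a 4× speedup (160 minutes down to 40). A trivial sketch of that arithmetic, assuming the "H:MM" durations quoted in the question. It is one unscientific measurement, not a benchmark.

```python
def to_minutes(hmm):
    """Convert an 'H:MM' duration string to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)

before = to_minutes("2:40")  # copy time with -cpu qemu32
after = to_minutes("0:40")   # copy time with -cpu Nehalem
speedup = before / after     # ratio of old to new copy time
```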

Now I got curious and ran:

qemu-x86_64 -cpu ?
x86           [n270]
x86         [athlon]
x86       [pentium3]
x86       [pentium2]
x86        [pentium]
x86            [486]
x86        [coreduo]
x86          [kvm32]
x86         [qemu32]
x86          [kvm64]
x86       [core2duo]
x86         [phenom]
x86         [qemu64]

And:

kvm -cpu ?model
 x86       Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)
 x86       Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)
 x86       Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)
 x86          Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)
 x86           Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)
 x86           Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)
 x86           [n270]  Intel(R) Atom(TM) CPU N270   @ 1.60GHz
 x86         [athlon]  QEMU Virtual CPU version 1.0
 x86       [pentium3]
 x86       [pentium2]
 x86        [pentium]
 x86            [486]
 x86        [coreduo]  Genuine Intel(R) CPU           T2600  @ 2.16GHz
 x86          [kvm32]  Common 32-bit KVM processor
 x86         [qemu32]  QEMU Virtual CPU version 1.0
 x86          [kvm64]  Common KVM processor
 x86       [core2duo]  Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
 x86         [phenom]  AMD Phenom(tm) 9550 Quad-Core Processor
 x86         [qemu64]  QEMU Virtual CPU version 1.0
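If you want to compare the models programmatically, the listing above can be turned into a name → description table. This is a minimal sketch that assumes the exact line layout shown here ("x86", optional square brackets around the name, optional description). It strips the brackets without interpreting them, and other QEMU versions may format the list differently. The function name is hypothetical.

```python
import re

# Matches e.g. " x86   Nehalem  Intel Core i7 9xx (...)" and " x86   [pentium3]"
_MODEL_LINE = re.compile(r"^\s*x86\s+\[?([\w-]+)\]?\s*(.*?)\s*$")

def parse_cpu_models(output):
    """Map each CPU model name (brackets removed) to its description,
    which is an empty string when the listing gives none."""
    models = {}
    for line in output.splitlines():
        match = _MODEL_LINE.match(line)
        if match:
            models[match.group(1)] = match.group(2)
    return models
```

One way to feed it is `subprocess.run(["kvm", "-cpu", "?model"], capture_output=True, text=True).stdout`. Passing the arguments as a list avoids the shell treating `?` as a glob character.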

With all these different models, it's hard to guess which one to pick. "Nehalem" seems to be the most performant one on that list. So how do I tell which CPU model is best for my guest? Browsing the Internet, I found the following resources:

If I read those sites correctly, they claim that "-cpu host" might bring the best performance. I'm not worried about migration yet, since both KVM hosts are equipped identically (exactly the same hardware).

So, what do experienced KVM admins recommend? Is there a golden rule or even a matrix, like "this model is the best for that guest OS"?

My apologies if I could have found this information on my own. I ran various Google searches and browsed many websites, but I was not able to find anything that answers my question.

Best Answer

It's quite simple, really. For homogeneous clusters and single-host setups, use the host option (-cpu host). For mixed clusters, use the lowest CPU model common to all hosts: if one host is Penryn and the other Nehalem, use Penryn on both.
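The rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions. You already know which QEMU model best matches each host. The ordering only covers the three Intel models from the list in the question (Conroe, then Penryn, then Nehalem, oldest first). The AMD Opteron_G1–G3 models would need their own ordering, and mixing vendors is not handled. The function and constant names are hypothetical.

```python
# Assumed ordering of the Intel models from the list above, oldest first.
INTEL_GENERATIONS = ["Conroe", "Penryn", "Nehalem"]

def pick_cpu_model(host_models):
    """Return 'host' for identical hosts, otherwise the oldest model in the
    list. Raises ValueError for an empty list or for a model missing from
    INTEL_GENERATIONS."""
    unique = set(host_models)
    if not unique:
        raise ValueError("no hosts given")
    if len(unique) == 1:
        # Homogeneous cluster or single host: expose the real CPU.
        return "host"
    # Mixed cluster: the lowest common model keeps live migration possible.
    return min(unique, key=INTEL_GENERATIONS.index)
```

With the two identical hosts from the question, `pick_cpu_model(["Nehalem", "Nehalem"])` gives "host". A Penryn/Nehalem mix gives "Penryn".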

If you are using RHEV or oVirt, this is already built in. VMware calls this "EVC" (Enhanced vMotion Compatibility) and markets it as a major feature.

Getting back to performance: you definitely need virtio everywhere you can use it. If you still hit performance bottlenecks after that, they can usually be addressed case by case, depending on where they occur.
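On the plain QEMU command line, "virtio everywhere" mostly means attaching the disk and the NIC through virtio devices. The following minimal sketch, using a hypothetical helper name and image path, builds those options. It assumes a tap network backend with QEMU's default setup scripts. For Windows guests, the virtio drivers (e.g. from the virtio-win package) must be installed before the boot disk is switched to virtio, or the guest will not find its disk. If the guests are managed by libvirt, the equivalent is set in the domain XML (disk bus "virtio", interface model "virtio") rather than by hand.

```python
def virtio_args(disk_image, netdev_id="net0"):
    """Hypothetical helper: QEMU options that attach one disk and one NIC
    through virtio. Assumes a tap backend with default ifup/ifdown scripts."""
    return [
        "-drive", f"file={disk_image},if=virtio",       # virtio block device
        "-netdev", f"tap,id={netdev_id}",               # host-side network backend
        "-device", f"virtio-net-pci,netdev={netdev_id}",  # guest-side virtio NIC
    ]
```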

(Off-topic: I have already commented on your choice of distribution in another thread.)