We have a virtualization environment that currently runs 4 VMs (2 x Linux, 1 x W2K3, 1 x Win7).
On the host system (Debian Jessie), top always shows a CPU load of 30-70% (or more) for the qemu process of the Win7 guest, even though Task Manager inside the guest shows zero CPU load.
top - 11:12:08 up 6 days, 1:47, 1 user, load average: 0,70, 0,62, 0,55
Tasks: 216 total, 2 running, 214 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5,0 us, 3,7 sy, 0,0 ni, 91,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem: 24776900 total, 21591188 used, 3185712 free, 122680 buffers
KiB Swap: 3905532 total, 60748 used, 3844784 free. 399364 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
11138 libvirt+  20   0 10,804g 8,243g  18536 R  70,1 34,9   2137:30 qemu-system-x86
12134 libvirt+  20   0 7309216 6,046g  18792 S   3,7 25,6 139:13.88 qemu-system-x86
12055 libvirt+  20   0 8900940 4,057g  18500 S   2,3 17,2 109:41.87 qemu-system-x86
12041 libvirt+  20   0 2956240 1,388g  18292 S   2,0  5,9  61:38.55 qemu-system-x86
 5569 root      20   0 1007924  23456  11012 S   1,0  0,1   1:16.86 libvirtd
Inside the guest, an MSSQL 2008 R2 Express instance is running. Trace flag -T8038 is set for it (as recommended in the Proxmox performance tweaks). The tablet device has also been removed from the configuration, and the ballooning device is disabled inside the guest (as I don't know how to disable it in the VM configuration).
The guest also runs a Pervasive SQL 8 server to serve an old Btrieve database.
The strange thing is that the CPU load in top drops to a reasonable level (1-3%) if I completely remove all NICs from the guest. Currently the guest's NIC is one of the physical NICs (an Intel I350), passed through from the host, but the behaviour is the same with virtualized NICs.
All of this was tested without any clients connected.
Current guest configuration:
<domain type='kvm'>
  <name>win7</name>
  <uuid>4b62c825-07ce-49b9-be8c-63f1f51ec28c</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
    <topology sockets='1' cores='2' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vg_vm/lv_win7Pro'/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>
Any tips on what could be causing this and how to improve it?
Best Answer
I had a similar problem in the past, with an IRQ storm in the guest and high load on the host. You must isolate what, in the guest, is hammering the CPU. The prime candidates are the MSSQL instance and the hal.dll library.
To debug, follow these steps:
EDIT: OK, it seems that neither MSSQL nor HAL is the root cause of your host load. Go on to the second debug phase: use the powertop utility to monitor the host's CPU activity. There you should see which software routine or interrupt is serviced the most. Run it for 30 seconds and report back here.
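As a side note on the balloon device, which you said you could only disable inside the guest: a minimal sketch, assuming your libvirt version accepts model='none' for the memballoon element, is to replace the existing <memballoon> block in the domain XML (for example via virsh edit win7) with the following:

```xml
<!-- Assumption: this replaces the whole <memballoon model='virtio'> element
     (including its <address> child) inside <devices>. With model='none',
     libvirt does not add a balloon device to the guest at all. -->
<memballoon model='none'/>
```

The change only takes effect after the guest has been fully shut down and started again, not after a reboot from inside the guest.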