How to fix a frozen qemu VM

libvirt, qemu

I have a qemu hypervisor on RHEL 6.4 hosting VMs that quite often lose their network connection. When a VM has lost its connection I can still view and interact with it using virt-manager, but nothing I do reestablishes the network connection. Pinging external hosts doesn't work, refreshing the DHCP address doesn't work, and restarting the networking service doesn't work.

At this point I can restart the VM (either using shutdown -r now or using the virt-manager UI). The VM will appear to shut down correctly, getting to the point where it says "Halting System".

From there on the VM is completely unresponsive. I cannot access it via virt-manager, virsh shows the VM state as "in shutdown", and I cannot destroy it via virsh.

virsh # destroy vmname
error: Failed to destroy domain vmname
error: operation failed: failed to kill qemu process with SIGTERM

This has happened a couple of times now, each time with the same symptoms: a lost network connection, then a frozen VM after rebooting. Unfortunately I don't have control of the hypervisor, so I can't access the log files and can only glean a limited amount of information from virsh.

Has anyone seen this bug? Is it caused by the configuration of the hypervisor or the VMs?

Best Answer

If you don't have control of the hypervisor, contact the sysadmin who does, ask them to investigate the event, and request the log sections related to your VM. Under no circumstances should a guest be able to stall a destroy command from libvirt; destroy is supposed to be a hard power-off, like yanking the power plug. So this is either a bug in the host's setup or a bug in libvirt. Either way, the admin should know about it.
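
When you write to the admin, it may help to attach whatever virsh will still tell you about the stuck domain. The following is only a rough sketch: the function name collect_vm_report is made up for illustration, vmname is the placeholder from the question, and it assumes your virsh is already pointed at the right hypervisor. The --reason flag on domstate may not exist on older libvirt versions, in which case that command just prints an error into the report.

```shell
#!/bin/sh
# Sketch: gather the limited information virsh exposes about one domain.
# collect_vm_report is a hypothetical helper name, not part of libvirt.
collect_vm_report() {
    dom="$1"

    # Current state and, where supported, the reason libvirt records for it.
    echo "== domstate =="
    virsh domstate --reason "$dom" 2>&1

    # Basic domain summary (id, state, memory, vCPUs).
    echo "== dominfo =="
    virsh dominfo "$dom" 2>&1

    # Virtual network interfaces attached to the domain.
    echo "== domiflist =="
    virsh domiflist "$dom" 2>&1
}
```

For example, collect_vm_report vmname > vmname-report.txt writes everything to one file you can send along. Errors are captured too (2>&1), since a failing virsh call against a hung domain is itself useful evidence.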