After destroy_domain, XenServer VM can’t Migrate within Pool

citrix live-migration xenserver

I have three XenServer 6.1 servers in a pool. HA is normally used, but is currently turned off for this operation.

Recently, I had to force a shutdown of a VM by destroying the domain via these instructions:
http://support.citrix.com/article/CTX131421

The one exception was that my command line didn't seem to have the destroy_domain command. A different article pointed me to the full invocation: /opt/xensource/debug/xenops destroy_domain -domid x
(http://gimpland.org/now/2013/01/citrix-xenserver-how-to-force-shutdown-virtual-machines/)

It worked, and I was able to start up the VM without issue. However, I have found that the VM can no longer be migrated to any other server in the pool. Attempting the migration via XenCenter produces the following error after about 30-40 seconds:

    Migrating VM 'Cleanup 7' from XenBlade5 to XenBlade 6: 
    Error: Internal error: file "xapi_xenops.ml", line 1740, characters 3-9: Assertion Failed.

The VM is now paused. Attempting to resume it (still on the original server, since it couldn't move) produces the following error in the server event log:

    There were no servers available to complete the specified operation.

In addition, XenCenter pops up a dialog saying "Error starting VM", with an error for each server in the pool saying "Object has been deleted.VDI:OpaqueRef:NULL". The VM will not resume.

If I force a shutdown of the VM, I can then restart it. Unfortunately, the VM still cannot be migrated and produces the same errors as above.

I discovered this issue on one of our production VM servers, but these tests are being performed on a throw-away Windows 7 Enterprise VM. The production VM is CentOS, so I don't think I'm experiencing anything operating system specific. It looks like an issue with the destroy_domain command I issued above.

Other VMs that were not shut down with destroy_domain can move freely to and from this server.

I'm not a XenServer veteran, so any help, correction, or request for clarification is greatly appreciated. A huge thank you in advance for any help!

Best Answer

From https://github.com/xapi-project/xen-api/blob/fe28d3e3254b1c9928dfb99d75e94e949504dcf7/ocaml/xapi/xapi_xenops.ml, which looks to be the source of v6.1 E017, has line #1739:

    (* XXX: if the guest crashed or shutdown immediately then it may be offline now *)
    assert (Db.VM.get_power_state ~__context ~self = (if paused then `Paused else `Running))

I don't do OCaml, nor am I a XenServer expert, but this assert appears to be in a method called start, and it looks like it makes sure that the VM has started by checking the power state in the internal database. I would guess that, because you forced the VM to shut down, the database wasn't updated properly.

Maybe the command from your second link will clean up the database:

    xe vm-reset-powerstate uuid=<UUID of VM> force=true
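To see whether the reset actually changes anything, it may help to read the power state that xapi has recorded before and after running it. The following is only a sketch: the function names and the `XE` override variable are my own inventions, and I am assuming `xe vm-param-get` with `param-name=power-state` reports the same value the assert reads. As far as I know the reset only changes the database record, so it should only be run while the VM's domain is really gone.

```shell
#!/bin/sh
# Sketch: inspect and reset the power state xapi has recorded for a VM.
# XE is a hypothetical override hook; it defaults to the real xe CLI.
XE="${XE:-xe}"

vm_power_state() {
    # $1 = VM UUID; prints the recorded state (e.g. running, paused, halted)
    "$XE" vm-param-get uuid="$1" param-name=power-state
}

reset_vm_power_state() {
    # $1 = VM UUID; same command as above, wrapped for reuse.
    # Run this only when the VM is not actually running on any host.
    "$XE" vm-reset-powerstate uuid="$1" force=true
}
```

After sourcing this on the pool master, calling `vm_power_state`, then `reset_vm_power_state`, then `vm_power_state` again with the VM's UUID would show whether the recorded state changed.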

Otherwise, you are going to have to trawl the logs to find which operations failed before this one, maybe in /var/log/xensource.log?
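For the log trawl, a small helper that prints each match together with the lines leading up to it can save some scrolling. This is a rough sketch: `log_context` and the `XAPI_LOG` variable are names I made up, and what to search for (the VM's UUID, or part of the error text) is a guess rather than something I have verified against a real log.

```shell
#!/bin/sh
# Sketch: print lines of the xapi log that contain a string, with leading context.
# XAPI_LOG is a hypothetical override; it defaults to the path mentioned above.
XAPI_LOG="${XAPI_LOG:-/var/log/xensource.log}"

log_context() {
    # $1 = fixed string to search for (e.g. the VM UUID)
    # $2 = number of lines to show before each match (default 20)
    grep -n -F -B "${2:-20}" -- "$1" "$XAPI_LOG"
}
```

For example, `log_context "<UUID of VM>" 40` would show each line mentioning the VM together with the 40 lines before it, which is where an earlier failed operation would most likely appear.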