oVirt – Confirm Reboot for Unresponsive Host in oVirt

Problem:

After a low-memory condition, a node started to appear as non-responsive. Most of its virtual machines were still working and even accessible from ovirt-engine, but some VMs were shown as UP while actually being down.

I decided to fence the failing node and restart it. I pressed "Restart" in the UI and then rebooted the node. After it came back up, it was still in the Non-responsive state, and the virtual machines that had been running on this host were in the "Unknown" state.

When I clicked "Confirm Host has been rebooted", I got the following error: "Another power management action is already in progress." Putting the host into maintenance did not work either, because of the "non responsive status" and "this node has running vm's" messages.

How can I manually fence the host and get my virtual machines to run on the other working hosts?

Environment:

  • oVirt Node 4.3.5.2
  • Ovirt-engine: 4.3.5.5-1.el7

Best Answer

Restarting the management engine solved the problem, and I was then able to use "Confirm Host has been rebooted".

The steps required to restart the hosted engine on the same node, as described in https://www.ovirt.org/documentation/self-hosted/chap-Troubleshooting.html, are run while connected to the node that has the engine running:

  1. Set global maintenance mode:

    hosted-engine --set-maintenance --mode=global
    
  2. Shut down the hosted engine VM:

    hosted-engine --vm-shutdown
    
  3. Start the VM again once it has shut down:

    hosted-engine --vm-start
    

After these steps, "Confirm Host has been rebooted" in the engine UI starts working.
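
The three commands above could be chained in a small script, so that the start is only issued after the engine VM has actually gone down. This is a rough sketch, not a tested procedure: the variable names (HOSTED_ENGINE, POLL_SECONDS, MAX_POLLS) and function names are made up for illustration, and the check relies on the assumption that `hosted-engine --vm-status` prints `"vm": "up"` in its engine status line while the VM is running, so verify that against the output on your own version first.

```shell
#!/bin/sh
# Sketch: restart the hosted-engine VM with the three commands from the answer.
# HOSTED_ENGINE, POLL_SECONDS and MAX_POLLS are illustrative names, not oVirt settings.
HOSTED_ENGINE="${HOSTED_ENGINE:-hosted-engine}"
POLL_SECONDS="${POLL_SECONDS:-10}"
MAX_POLLS="${MAX_POLLS:-30}"

# Succeeds while the status output still reports the engine VM as up.
# ASSUMPTION: the status line contains the text "vm": "up" for a running VM.
engine_vm_is_up() {
    "$HOSTED_ENGINE" --vm-status 2>/dev/null | grep -q '"vm": "up"'
}

restart_engine_vm() {
    # Step 1: global maintenance, so the HA agents leave the engine VM alone.
    "$HOSTED_ENGINE" --set-maintenance --mode=global || return 1

    # Step 2: ask the engine VM to shut down.
    "$HOSTED_ENGINE" --vm-shutdown || return 1

    # Wait until the VM is no longer reported as up, giving up after MAX_POLLS checks.
    polls=0
    while engine_vm_is_up; do
        polls=$((polls + 1))
        if [ "$polls" -ge "$MAX_POLLS" ]; then
            echo "engine VM still up after $polls checks; not starting it" >&2
            return 1
        fi
        sleep "$POLL_SECONDS"
    done

    # Step 3: start the engine VM again.
    "$HOSTED_ENGINE" --vm-start
}
```

Run it on the node that hosts the engine by calling `restart_engine_vm` at the end of the script. Like the steps above, the sketch leaves global maintenance enabled; `hosted-engine --set-maintenance --mode=none` switches it off again once the engine is back.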