Recover from an oVirt Hosted Engine Crash

ovirt virtualization

I have an oVirt setup, and yum recently updated all packages on the hosts and on the hosted engine.

The problem is that I can't start the hosted engine. After a while, if you issue the command:

hosted-engine --vm-status

You get:

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : hyper1.sarmiento.dmsn
Host ID                            : 1
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
Local maintenance                  : False
Host timestamp                     : 4129
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=4129 (Tue May  5 13:15:28 2015)
    host-id=1
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Wed Dec 31 22:14:34 1969


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : hyper2.sarmiento.dmsn
Host ID                            : 2
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                              : 0
Local maintenance                  : False
Host timestamp                     : 3900
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3900 (Tue May  5 13:15:19 2015)
    host-id=2
    score=0
    maintenance=False
    state=EngineUnexpectedlyDown
    timeout=Wed Dec 31 22:11:48 1969

I've been searching through the logs a lot, and the problem seems to show up in

/var/log/libvirt/qemu/HostedEngine.log

I can see

2015-05-05T15:18:29.928875Z qemu-kvm: -drive file=/var/run/vdsm/storage/fa0ae001-ccaf-46ed-940a-a3bb1f147f18/c1cd16d1-068d-467b-88fd-6a4910099d27/51e3c614-7725-429d-b1b6-99dbe4eb3b7c,if=none,id=drive-virtio-disk0,format=raw,serial=c1cd16d1-068d-467b-88fd-6a4910099d27,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /var/run/vdsm/storage/fa0ae001-ccaf-46ed-940a-a3bb1f147f18/c1cd16d1-068d-467b-88fd-6a4910099d27/51e3c614-7725-429d-b1b6-99dbe4eb3b7c: Could not refresh total sector count: Operation not permitted
2015-05-05 15:18:30.183+0000: shutting down
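
A minimal sketch for pulling the image path out of that kind of log line, so the file can be inspected directly on the host. The function name extract_image_path is made up for illustration, and the commented usage lines assume the log location shown above.

```shell
# Read qemu log lines on stdin and print the path from every
# "could not open disk image <path>: <reason>" message.
extract_image_path() {
  sed -n 's/.*could not open disk image \([^:]*\): .*/\1/p'
}

# Usage on a host (illustrative, not executed here):
#   img=$(extract_image_path < /var/log/libvirt/qemu/HostedEngine.log | tail -n 1)
#   ls -lL "$img"                            # follow symlinks, show owner and mode
#   dd if="$img" of=/dev/null bs=1M count=1  # try to read the first block
```

If the read fails outside of qemu as well, the problem is more likely in the storage layer than in the VM definition.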

It says that it cannot open the image file, but it does not say why. Any idea how to debug this and get the engine up and running again?

Thanks a lot!!

Edit: oVirt Version is 3.5

Best Answer

OK, so it was a storage issue. The hosted engine image was stored on a Gluster volume, and the image was in a split-brain state.
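
For anyone hitting the same symptom, a rough sketch of how a split-brain could be checked from one of the Gluster nodes. The volume name "engine" is an assumption (use your own), the helper count_split_brain_entries is made up for illustration, and it assumes the heal report prints one "Number of entries..." line per brick, which may differ between Gluster versions.

```shell
# Sum the per-brick "Number of entries..." counters from the output of
# `gluster volume heal <VOLNAME> info split-brain`, read on stdin.
count_split_brain_entries() {
  awk -F': ' '/^Number of entries/ { sum += $2 } END { print sum + 0 }'
}

# Usage on a Gluster node (illustrative, not executed here; "engine" is a
# hypothetical volume name):
#   gluster volume heal engine info split-brain | count_split_brain_entries
```

A result greater than zero means at least one file is reported in split-brain on that volume and has to be resolved before the image can be opened again.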

Thanks, dyasny, for your help!