Oracle VM 2.2 nodes rebooting for no obvious reason

Tags: ocfs2, oraclevm, xen

I have a simple four-node Oracle VM environment: a management server running in VMware, an NFS server for shared storage, and two Oracle VM servers running the actual hypervisor.

Every so often, the node running the pool master service suddenly reboots for no obvious reason. I'm fairly sure it's a software issue, possibly a cluster watchdog of some sort. To be clear, it's the VM server (the hypervisor) that reboots, not the guest machines.

Has anyone seen similar issues, or does anyone have suggestions as to where I should start looking for the root cause?

I don't see anything suspicious in the /var/log/ovs*/ logs. Is there any other place I should look?

The documentation from Oracle leaves a little something to be desired.

Best Answer

I'm not sure whether you have the nice graphs that come with the VM Manager. If you do, they provide a decent amount of insight into what the memory, CPU and disks are doing, and there may be some correlation with the reboots. From there you can start using top and ps to see exactly what is running, and what is in use, when the server bounces.
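Since the box goes down abruptly, one way to see what was running "when the server bounces" is to write a ps snapshot to a local log at a fixed interval and force each write to disk with fsync, so the last record before the reboot survives. This is a minimal sketch, assuming Python 2.7 or 3 is available on the VM server and that ps is the procps version (for the `-eo` and `--sort` options). The script name, log path and 10-second interval are hypothetical choices, not anything Oracle VM provides.

```python
import os
import subprocess
import sys
import time


def format_snapshot(timestamp, ps_text):
    """Build one log record: a timestamp header line followed by the ps output."""
    header = "=== %s ===" % time.strftime("%Y-%m-%d %H:%M:%S",
                                          time.localtime(timestamp))
    return header + "\n" + ps_text.rstrip("\n") + "\n"


def append_durably(path, text):
    """Append text and fsync it, so it is on disk even if the node resets."""
    with open(path, "a") as f:
        f.write(text)
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # push the OS page cache to the disk


def capture_ps():
    """Return the process list, busiest CPU users first (procps ps syntax)."""
    out = subprocess.check_output(
        ["ps", "-eo", "pid,user,pcpu,pmem,rss,etime,args", "--sort=-pcpu"])
    return out.decode("utf-8", "replace")


def main(log_path, interval_seconds=10):
    """Loop forever, appending one snapshot every interval_seconds (assumed value)."""
    while True:
        append_durably(log_path, format_snapshot(time.time(), capture_ps()))
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical usage: python ps_snapshot.py /var/log/ps-snapshots.log
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: %s LOGFILE\n" % sys.argv[0])
```

Run it in the background (for example with `nohup ... &`) and, after the next unexpected reboot, look at the last few records in the file. Writing to a local disk rather than the NFS share matters in case the shared storage turns out to be involved. The file grows without bound, so delete or rotate it once you have your answer.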

Also, can you put the servers into debug mode? Do they support that?

I hope this helps get you started at the very least.