Windows Cluster Fails after Power Outage


First, the setup: we have a two-node Windows Server 2008 R2 cluster running highly available (HA) Hyper-V and DHCP. We use a back-end Dell MD3000i iSCSI SAN for storage. All of the networking runs through redundant switches with MPIO drivers, and the data network is on a different VLAN from the primary network.

Here is the scenario we keep encountering:

We have occasional power outages. We have dual UPS units in the cabinet that last about 15 minutes, but if power doesn't come back within that time, everything goes down: cluster nodes, SAN, and all.

Eventually the power comes back, and all of the devices are configured to boot when AC returns. However, after a complete outage like this, the cluster never comes back online properly. We get the usual errors, such as the quorum disk being unavailable. In addition, our two primary domain controllers are virtual machines running on the Hyper-V cluster itself. We do have a physical server running as a third domain controller, on the assumption that this would help when things come back online.
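The chicken-and-egg problem with virtualized DCs can be sketched as a dependency graph. This is a hypothetical model, not a description of how Windows actually orders its services: the component names are made up, and it assumes the cluster service needs to reach a domain controller when it starts. Python's `graphlib.TopologicalSorter` raises `CycleError` when no valid startup order exists.

```python
from graphlib import TopologicalSorter, CycleError

def boot_order(deps):
    """Return a valid startup order, or None if the dependencies form a cycle.

    deps maps each (hypothetical) component to the set of components
    that must already be up before it can start.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

# ASSUMPTION: the cluster service needs a DC at startup.
# Only virtual DCs: the cluster needs a DC, but the DCs run on the cluster.
virtual_dcs_only = {
    "iscsi_initiator": {"san"},
    "cluster_service": {"iscsi_initiator", "dc_vm"},
    "dc_vm": {"cluster_service"},
}

# A physical DC gives the cluster a DC that does not depend on the cluster.
with_physical_dc = {
    "iscsi_initiator": {"san"},
    "cluster_service": {"iscsi_initiator", "physical_dc"},
    "dc_vm": {"cluster_service"},
}

print(boot_order(virtual_dcs_only))  # None
print(boot_order(with_physical_dc))
```

The physical DC does break the cycle in this model, but a valid order only says what must come first, not how long to wait. A component can still start before the thing it depends on is actually ready, which is why a storage timing problem can remain even with a physical DC in place.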

What we don't understand is why the system can't recover on its own when it boots. A DC is eventually available for authentication, and the iSCSI network comes back online. Is there something else we are missing?

I suspect it may be related to the iSCSI Initiator service not starting quickly enough, so that storage isn't available by the time the Cluster service is ready to go.

Any ideas, or anything I can post that would help?

Thanks,
Brent

Best Answer

We had the same problem with our cluster not coming back up cleanly after a power failure. Like yours, our shared storage is on iSCSI SANs. The fix for us was to delay VM host and guest startup long enough for the SANs to be back online FIRST. We found that if we didn't do this, the shared volumes would reconnect but remain in an offline state, causing the cluster to fail...
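One way to implement "wait for the SAN first" instead of a fixed delay is to poll the SAN's iSCSI portal until it accepts connections. This is a minimal sketch, not what the answerer describes using: it assumes the portal listens on the default iSCSI TCP port 3260, and `wait_for_portal` is a hypothetical helper name.

```python
import socket
import time

ISCSI_PORT = 3260  # default iSCSI target port (assumed; check your SAN config)

def wait_for_portal(host, port=ISCSI_PORT, timeout_s=600.0, interval_s=5.0):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.

    Returns True if the portal accepted a connection, False if the
    deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection; success means
            # something is listening on the portal address.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: SAN not ready yet
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```

A host startup script could call `wait_for_portal("192.0.2.10")` (a placeholder address) before bringing up cluster resources and guests. A TCP connection only shows that the controller's network stack is up, not that the LUNs are ready, so some extra margin after the portal answers may still be needed.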
