How to manage a global VM startup order across the whole datacenter

Tags: datacenter, startup, vmware-esxi, vmware-vcenter, vmware-vsphere

Suppose you have a fully virtualized VMware infrastructure: ESXi, vCenter, vMotion, HA, DRS, the whole package.

Inside, you have lots of VMs, which at any given time may reside on one host or another (that's the whole point of clustering, isn't it?).

You experience a power loss and, one way or another, you manage to gracefully shut down all VMs and all hosts; let's not delve into this for now, let's just assume your UPS software can handle it. Or let's assume the shutdown was not so graceful, but everything is still able to come up again once power is restored.

Power comes back, and your hosts restart.

Your environment is quite complex, and it has natural dependencies between VMs: domain controllers should start first, an application server can't start unless its back-end DB server is already up and running, and so on.

We all know (or should hopefully know) how to configure automatic VM startup and how to specify a VM startup order and delay on a single ESX/ESXi host.

But how do you do this across a whole datacenter?

Is there any way to tell vSphere "start these VMs in this global order, regardless of the physical host they are running on"?

Bonus points: if vCenter itself is running on a virtual machine, how does this change things?

Best Answer

There doesn't seem to be a clean way to fully manage a cold start of a virtual infrastructure once HA is configured on the individual hosts. Enabling HA and DRS seems to disable the Virtual Machine Startup and Shutdown options on the host servers. However, any ordering set before the host is moved into the cluster seems to stick. If the number of hosts is small or manageable, it's possible to set startup priority in the vSphere client by connecting to the hosts individually. Put your rules there. This actually works in the situation you describe.
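As a rough illustration of what "put your rules there" means in practice, the sketch below splits one global priority list into the per-host orderings you would then enter by hand on each host. It does not talk to vSphere at all; the VM names, host names and the placement snapshot are made-up assumptions, and with DRS enabled the placement can change after you take the snapshot.

```python
from collections import defaultdict

# Hypothetical global startup priority, most critical VM first.
GLOBAL_ORDER = ["dc01", "dns01", "db01", "app01", "ts01"]

# Hypothetical snapshot of which host each VM currently lives on.
# With DRS enabled this mapping can change, so treat it as a point-in-time view.
PLACEMENT = {
    "dc01": "esx-a",
    "dns01": "esx-b",
    "db01": "esx-a",
    "app01": "esx-b",
    "ts01": "esx-a",
}


def per_host_order(global_order, placement):
    """Split a global startup order into one ordered list per host.

    VMs keep their relative global order within each host.
    VMs with no known placement are skipped.
    """
    hosts = defaultdict(list)
    for vm in global_order:
        host = placement.get(vm)
        if host is not None:
            hosts[host].append(vm)
    return dict(hosts)


if __name__ == "__main__":
    for host, vms in sorted(per_host_order(GLOBAL_ORDER, PLACEMENT).items()):
        print(f"{host}: {', '.join(vms)}")
```

Note that this only preserves the order within each host. Nothing here makes one host wait for a VM on another host, which is exactly the gap the question is asking about.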


Storage comes first!

Once the shared storage is up, I work on the hosts... I've had partial outages where vCenter was virtualized as well. What I do in this case is set automatic boot and ordering options for the most critical systems; typically a domain controller and DNS/DHCP. Remember, vCenter is not likely to be available in the cold-start scenario. If I can fit it in, then I will... otherwise it gets started manually.
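To decide which systems count as "most critical" and in what order the rest should follow, it can help to write the dependencies down and sort them. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the VM names and the dependency map are purely hypothetical, and it only computes an order on paper, it does not start anything.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM lists the VMs that must be up before it.
DEPENDS_ON = {
    "dc01": set(),
    "dns01": {"dc01"},
    "dhcp01": {"dc01"},
    "db01": {"dns01"},
    "app01": {"db01"},
}


def startup_waves(depends_on):
    """Group VMs into waves; every VM in a wave depends only on earlier waves.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

The first wave is the set of VMs worth configuring for automatic boot on the hosts; later waves are candidates for manual or delayed startup.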

From there, I make sure HA and DRS rules are intact. I usually have anti-affinity rules set for terminal servers, application servers and domain controllers. Once vCenter comes up, most of this gets sorted out.

I had a lightning strike a few weeks ago that took part of my server room down, including the switch blade containing the storage network. VMware HA brought everything back once the storage switch ports were relocated and reprogrammed.

So, this type of issue falls under a real emergency or a manual effort. I wouldn't expect a hands-off startup of the system environment in the scenario you describe.

Edit:

Two weeks ago, I had a brownout that tripped a UPS. It took down two hosts, the vCenter server and a SAN/NAS device. Everything came back on its own and I didn't have to intervene (I was actually on a plane and got the messages upon landing).
