The correct way to remediate an HA Cluster

high-availability, virtual-machines, virtualization, vmware-esxi, vmware-vcenter

Background / Goal

  • I have a VMware HA cluster with two hosts running production machines.
  • It is currently configured to tolerate the failure of one host. It does not use DRS.
  • I need to remediate both of these servers to apply patches. I would like to do this with zero downtime.

Questions

  • Can I vMotion the VMs to the other host in the cluster and then take the emptied host down?
  • What is the best / recommended way to remediate servers in an HA configuration to avoid downtime?

Best Answer

If you're not using DRS, you'll have to manually evacuate your powered-on VMs to another host in the cluster before Update Manager (VUM) will remediate the host. It's also recommended that you disable HA Admission Control, Distributed Power Management, and Fault Tolerance, if you're using any of them, before you remediate the host.

In short: migrate (vMotion) your powered-on VMs to another host in the cluster, remediate the now-empty host, then migrate the VMs back. Repeat for the other host.
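
To make the ordering explicit, here is a minimal sketch in plain Python that prints the sequence of steps for a cluster patched one host at a time. It does not talk to vCenter at all. The function name `rolling_remediation_plan` and the host and VM names are made up for illustration, and it assumes the host receiving the VMs has enough spare capacity to run everything while its partner is being patched.

```python
from typing import Dict, List


def rolling_remediation_plan(placement: Dict[str, List[str]]) -> List[str]:
    """Return the ordered steps for patching every host, one at a time.

    `placement` maps each host name to the powered-on VMs running on it.
    This is a hypothetical planning helper, not a vSphere API call.
    """
    hosts = list(placement)
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to evacuate VMs without downtime")

    steps: List[str] = []
    for index, host in enumerate(hosts):
        # Use the next host in the cluster as the temporary home for the VMs.
        target = hosts[(index + 1) % len(hosts)]
        vms = placement[host]

        # 1. Evacuate every powered-on VM from the host about to be patched.
        for vm in vms:
            steps.append(f"vMotion {vm}: {host} -> {target}")

        # 2. Patch the now-empty host.
        steps.append(f"remediate: {host}")

        # 3. Move the VMs back to where they started.
        for vm in vms:
            steps.append(f"vMotion {vm}: {target} -> {host}")
    return steps


if __name__ == "__main__":
    # Example two-host cluster; all names are placeholders.
    plan = rolling_remediation_plan({"esx01": ["web01", "db01"], "esx02": ["app01"]})
    for number, step in enumerate(plan, start=1):
        print(f"{number}. {step}")
```

Only one host is ever empty at a time, so every VM always has somewhere to run. While that host is down the cluster has no failover capacity, which is why the Admission Control setting matters during the window.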