vSphere 6.5 HA agent cannot be correctly installed or configured

cluster, failovercluster, high-availability, vmware-esxi

Last week we encountered the following issue: we had to shut down our entire infrastructure for a UPS replacement. Once the electrical work was finished, we restarted:

  1. network
  2. SANs
  3. vCenter
  4. ESXi hosts (2 in the cluster)

After waiting for the ESXi hosts to start up, we discovered that the cluster reported an error: Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster.

We then discovered that vCenter could not reach the ESXi hosts over the network: a switch's PDU had been unplugged during the work.

Once the PDU was plugged back in, the ESXi hosts could communicate with vCenter again, but the following alarm appeared on each host: vSphere HA agent cannot be correctly installed or configured.

We decided to restart both ESXi hosts, with no luck: the errors remained.

Because of maintenance window constraints, we decided to remove both hosts from the cluster so that we could start our VMs, at the cost of losing automatic failover if one host fails.

After a lot of googling and reading many VMware KBs, we tried (in no particular order):

Still no result…

During our investigation, we found only one error in /var/log/fdm.log, on both hosts:

2018-06-25T09:05:54.232Z error fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterPersistence::DoFetchDataSync] Open of file /etc/opt/vmware/fdm/kvstore failed: No such file or directory
2018-06-25T09:05:54.232Z warning fdm[47A8940] [Originator@6876 sub=Cluster] [ClusterManagerImpl::ReadPersistentObject] Couldn't open kvstore
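
For anyone comparing their own fdm.log against the two lines above, here is a minimal Python sketch that pulls error and warning entries out of the log. The regular expression is inferred only from the two lines quoted in this post, so it is an assumption that other FDM log lines share the same layout. The names `FdmEntry`, `parse_fdm_lines`, and `find_problems` are made up for illustration and are not part of any VMware tool.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Layout inferred from the two log lines quoted in the question (assumption):
# <timestamp> <level> fdm[<thread>] [Originator@<id> sub=<subsystem>] <message>
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>\w+)\s+"
    r"fdm\[(?P<thread>[0-9A-Fa-f]+)\]\s+"
    r"\[Originator@\d+ sub=(?P<sub>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


class FdmEntry(NamedTuple):
    """One parsed log line (hypothetical helper type)."""
    timestamp: str
    level: str
    sub: str
    message: str


def parse_fdm_lines(lines: Iterable[str]) -> Iterator[FdmEntry]:
    """Yield an entry for every line matching the assumed layout; skip the rest."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield FdmEntry(
                match["timestamp"], match["level"], match["sub"], match["message"]
            )


def find_problems(
    lines: Iterable[str], levels: Tuple[str, ...] = ("error", "warning")
) -> List[FdmEntry]:
    """Return only the entries whose level is in `levels`."""
    return [entry for entry in parse_fdm_lines(lines) if entry.level in levels]


# The two lines from the question, used as sample input.
SAMPLE = [
    "2018-06-25T09:05:54.232Z error fdm[47A8940] [Originator@6876 sub=Cluster] "
    "[ClusterPersistence::DoFetchDataSync] Open of file "
    "/etc/opt/vmware/fdm/kvstore failed: No such file or directory",
    "2018-06-25T09:05:54.232Z warning fdm[47A8940] [Originator@6876 sub=Cluster] "
    "[ClusterManagerImpl::ReadPersistentObject] Couldn't open kvstore",
]

if __name__ == "__main__":
    for entry in find_problems(SAMPLE):
        print(f"{entry.timestamp} {entry.level.upper():7} {entry.message}")
```

To run it against a real log, copy /var/log/fdm.log from the host to a workstation and pass the open file object to `find_problems` instead of `SAMPLE`.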

Googling this kvstore thing led me nowhere; maybe I need to work on my Google-fu…

Best Answer

I know you mentioned you already tried it, but in case it helps someone else: the solution for us was to disable (remove) the HA configuration for the entire cluster and then enable it again. I also couldn't find any information on this with Google, except your post.

We had the exact same issue. We had just finished an update to 6.5 on server #3 out of 5. The first two updates went fine, with no HA issues. The third one also went fine, but HA wouldn't come back on. Same error, and the same message in the fdm.log file (Open of file /etc/opt/vmware/fdm/kvstore failed: No such file or directory).