Virtual IP failover issue with VMware ESXi IP hash-based teaming

arp failover icmp linux vmware

We are deploying an HA database cluster on ESXi that relies on a virtual IP (VIP) for failover: if one node fails, the reserved VIP is assigned to another node.

We have two ESXi hosts and two virtual machines on those hosts, acting as master and slave.

db_master = 192.168.60.10
db_slave = 192.168.60.11
reserved_vip = 192.168.60.12

gateway = 192.168.60.1

Each ESXi host has two interfaces. NIC teaming is configured with IP hash-based routing, with a static LAG configured on the switch side (no LACP).

The LAG is configured as a trunk port, and I terminate VLANs inside the ESXi port groups.

All of my database servers are on VLAN 60, and they are the only VMs on these hosts at the moment.

I configured a virtual port group and assigned VLAN 60. The network is up and running: I can ping nodes outside my setup, and the LAG keeps working when I pull one cable.

The issue comes with database failover. Normally the VIP 192.168.60.12 is assigned to db_master; when I switch off db_master, the VIP is assigned to db_slave.

The failover software itself works perfectly fine, but once the virtual IP (192.168.60.12) is assigned to db_slave I can no longer ping it from the gateway.

It seems to be an ARP issue, but I would like to confirm that ESXi's IP hash-based routing and the LAG are not interfering with the failover.

Moreover, I have only checked this with ICMP, not at the application level.

Best Answer

The ESXi vSwitch doesn't care about IP addresses, just MACs. "IP hash-based routing" is about L2 load balancing - choosing which egress uplink a frame uses - and has no relevance for IP routing.
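To illustrate why the hash policy can't break the failover: VMware documents the IP-hash policy as roughly an XOR of source and destination IPv4 addresses, taken modulo the number of uplinks. A minimal Python sketch of that idea (function name and exact formula are my own approximation, not ESXi code):

```python
import ipaddress

def ip_hash_uplink(src_ip: str, dst_ip: str, uplink_count: int) -> int:
    """Pick an uplink index by XOR-ing the src and dst IPv4 addresses,
    then taking the result modulo the number of uplinks."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplink_count

# The VIP/gateway pair from the question hashes to the same uplink no
# matter which VM currently holds the VIP - the hash only sees IPs:
print(ip_hash_uplink("192.168.60.12", "192.168.60.1", 2))  # -> 1
```

Note the hash is computed per src/dst IP pair, so moving the VIP between VMs changes nothing from the hash's point of view; what changes is the MAC behind that IP, which is exactly the ARP problem below.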

You need to make sure that a gratuitous ARP (GARP) is sent by the failover server and that this GARP is properly processed by all relevant nodes in the segment - they all need to update their ARP caches. Alternatively, the stale entry can be deleted on the gateway (so the IP gets re-ARPed). If neither can be done, that failover concept won't work.
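To make the ARP mechanics concrete, here is a sketch (not your failover software's actual code) that builds a gratuitous ARP reply frame for the VIP by hand - sender and target IP are both the VIP, destination is broadcast, so every host on the segment updates its cache. The MAC is a placeholder; actually sending the frame would need a raw socket and root, so this only constructs the bytes:

```python
import struct

def build_garp(vip: str, mac: bytes) -> bytes:
    """Build a gratuitous ARP reply frame (14-byte Ethernet II header
    plus 28-byte ARP payload = 42 bytes)."""
    ip = bytes(int(o) for o in vip.split("."))
    bcast = b"\xff" * 6
    eth = bcast + mac + struct.pack("!H", 0x0806)     # dst, src, EtherType=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)   # htype, ptype, hlen, plen, op=reply
    arp += mac + ip + bcast + ip                      # sender MAC/IP, target MAC/IP
    return eth + arp

# Placeholder MAC of the VM that just took over the VIP:
frame = build_garp("192.168.60.12", b"\x00\x0c\x29\xaa\xbb\xcc")
assert len(frame) == 42
```

In practice you wouldn't hand-roll this: most VIP failover tools (keepalived and friends) send GARPs on transition, and you can send one manually from the new VIP holder with iputils' `arping -U -I <iface> 192.168.60.12`, or clear the stale entry on the Linux gateway with `ip neigh del 192.168.60.12 dev <iface>`.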

Without a proper ARP update (IP-to-MAC mapping), IP packets are still sent to the failed node's MAC - the switches forward the frames to the failed device (or flood them to all switch ports, but with the wrong destination MAC that makes no difference).

Also, you shouldn't use IP hashing and a static LAG on the physical switch opposite the ESXi unless your workload is suited for it. While this does work, the potentially unwanted effect of the LAG is that the switch decides which port is used to send to the host for a certain vNIC (MAC). With simple, independent ports instead of a LAG trunk, and routing by virtual port ID or MAC hash, the switch uses the port that the destination vNIC/MAC is logically associated with (because it was used for egress and learned) - so the host decides which port is used from switch to host. Generally, this is preferred.

In other words, without a static LAG, traffic from or to a given vNIC always uses the same physical port in both directions (controlled by the host). With a static LAG, traffic to a vNIC might use a different port (chosen by the switch) than traffic from that vNIC (chosen by the host).

Since a vSwitch never forwards a frame back out through a physical port, you don't need a LAG trunk or STP to prevent a bridge loop. ESXi behaves more predictably when you don't use a LAG. You can also set up a port group for each physical NIC and fail over to the other port group's NIC - that gives you full control over the vNIC-to-physical-NIC mapping.