KEMP load balancers using UCARP (VRRP) – multicast MAC address not being picked up

Tags: cisco-catalyst, igmp, multicast, powerconnect, vrrp

Alright – I've been battling this for at least 20 consecutive hours. Sorry if this reads like a long rant or a blog post, but I'm getting to the point of exhaustion.

So, here's the deal. We're using KEMP load balancers, which use UCARP (a Linux clone of CARP, which is itself a VRRP clone) for HA heartbeat and persistent state. We want to use IGMP in our environment to prevent flooding across the datacenter.

We have two Dell PowerConnect 8124F switches running SW 5.1.1.7 acting as top-of-rack. These two are connected to a stacked pair of Cisco 3750-X, which is our core.

The problems began when we upgraded the PowerConnects to firmware 5.1.x, which apparently leaves IGMP snooping on by default unless you tell it otherwise. And behold – our load balancers went into split-brain, causing all sorts of warm fuzzy fun.

  • If I disable IGMP snooping on the VLAN where the load balancers do their multicast, nothing changes; multicast is still dead.
  • If I set up IP PIM on our core, the PowerConnect switches see it on the same VLAN, but there is still no multicast traffic.
  • If I enable flooding of all unregistered multicast traffic, it still does nothing at all.
  • If I disable IGMP snooping globally on the PowerConnect switches, all multicast traffic works. It works so well that multicast traffic gets flooded to every single port that has the same VLAN tagged. Wonderful.

I noticed some strange MAC address entries on the VLAN at our core:

coresw#sh mac address-table vlan 367 | include 5e00
 367    0000.5e00.0101    DYNAMIC     Po13   seq_no:0

And I thought: isn't that the multicast address? Why doesn't it show up in "sh mac address-table multicast"?

coresw#sh mac address-table multicast vlan 367
Vlan    Mac Address       Type        Ports
----    -----------       ----        -----
coresw#

And then I read this in the PowerConnect CLI guide:

Multicast traffic is traffic that is destined to a host group. Host
groups are identified by the destination MAC address, i.e. the range
01:00:5e:00:00:00-01:00:5e:7f:ff:ff:ff for IPv4 multicast traffic or
33:33:xx:xx:xx:xx for IPv6 multicast traffic.

Seems like we are missing a "01" at the beginning of the MAC address, no? The dynamic MAC entry above starts with "00". At this point I was thinking about calling KEMP and letting them know that their product is horribly misconfigured. But then I went and read the RFC for VRRP – and behold:

The virtual router MAC address associated with a virtual router is an
IEEE 802 MAC Address in the following format:

IPv4 case: 00-00-5E-00-01-{VRID} (in hex, in Internet-standard bit-
order)
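
To make the difference between the two address formats concrete, here is a minimal Python sketch. The helper names (parse_mac, is_group_address, is_ipv4_multicast_mac) are made up for illustration and are not part of any switch or KEMP tooling. It relies on two facts: the least significant bit of the first octet marks an Ethernet address as group (multicast) or individual (unicast), and IPv4 multicast groups map into 01:00:5e:00:00:00 through 01:00:5e:7f:ff:ff. The example address 01:00:5e:00:00:12 corresponds to 224.0.0.18, the VRRP advertisement group; that UCARP on the KEMP sends to the same group is an assumption here.

```python
def parse_mac(text: str) -> bytes:
    """Parse a MAC written with colons, dashes, or Cisco dots into 6 bytes."""
    digits = "".join(ch for ch in text if ch not in ":-.")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {text!r}")
    return bytes.fromhex(digits)


def is_group_address(mac: bytes) -> bool:
    """True if the I/G bit (lowest bit of the first octet) is set."""
    return bool(mac[0] & 0x01)


def is_ipv4_multicast_mac(mac: bytes) -> bool:
    """True if the MAC falls in 01:00:5e:00:00:00 - 01:00:5e:7f:ff:ff."""
    return mac[:3] == b"\x01\x00\x5e" and mac[3] < 0x80


if __name__ == "__main__":
    # The entry learned on the core vs. the assumed advertisement destination.
    for label, text in [
        ("virtual router MAC", "0000.5e00.0101"),
        ("224.0.0.18 group MAC", "01:00:5e:00:00:12"),
    ]:
        mac = parse_mac(text)
        print(label, text, is_group_address(mac), is_ipv4_multicast_mac(mac))
```

Under these rules 0000.5e00.0101 is an individual (unicast) address, which would explain why it is learned as a DYNAMIC entry instead of appearing in the multicast table.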

Alright – so switches don't normally pick up the VRRP MAC address as multicast, since it falls outside the multicast MAC range. Fine, let's configure a static host group on the Dell switches. Nope.

Invalid input: Multicast MAC Address must be of format 01XX:XXXX:XXXX

OK then. Next step: try to add a static multicast MAC entry:

osl-sys-swrack03(config)#mac address-table multicast ?

forbidden                forbid adding specific multicast addresses to
                         specific ports.

osl-sys-swrack03(config)#

So – no way to configure a static multicast MAC entry. If I try to do the same with a regular static MAC entry, I can only bind it to one port – this load balancing cluster runs across 4 different 10gig ports.

Update: There seems to be some confusion regarding MAC addresses.
172.30.1.0/24 is the front-facing loadbalancer network. 172.30.1.6 is the default shared VIP for the cluster, .7 is the management IP for the first load balancer and .8 is for the second load balancer.
All other addresses (30, 40, 70, 80 etc.) are VIPs with different services on them.
When a failover occurs, all the VIPs change their MAC address to the second LB's physical MAC address.
The multicast address in the bottom table does not change.

coresw#sh ip arp vlan 367
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  172.30.1.6             78   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.40           204   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.80           167   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.70            38   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.66            12   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.35           185   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.60            97   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.30            80   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.61            33   0050.56b4.5004  ARPA   Vlan367    <- VIP - Loadbalancer1 physical MAC
Internet  172.30.1.7             27   0050.56b4.5004  ARPA   Vlan367    <- Management - Loadbalancer1 physical MAC
Internet  172.30.1.8             21   0050.56b4.08c2  ARPA   Vlan367    <- Management - Loadbalancer2 physical MAC

osl-sys-coresw#sh mac address-table dynamic vlan 367
          Mac Address Table
-------------------------------------------

Vlan    Mac Address       Type        Ports
----    -----------       --------    -----
 367    0000.5e00.0101    DYNAMIC     Po13   seq_no:0   <- multicast HA mac (UCARP)
 367    0050.56b4.08c2    DYNAMIC     Po13   seq_no:0   <- Loadbalancer2 physical MAC
 367    0050.56b4.5004    DYNAMIC     Po13   seq_no:0   <- Loadbalancer1 physical MAC

And that's the story. What on earth am I going to do with this?

Best Answer

I was able to resolve the issue. On the Kemp (in an HA pair) you have the option of using a "Virtual MAC Address". If this box isn't checked, the MAC of a load balancer VIP is that of the physical interface of the active Kemp unit. If it is checked, the MAC address of the VIP is a VRRP MAC. As you mentioned above, the VRRP RFC specifies a MAC of the form 00-00-5E-00-01-{VRID}, with the last octet being the router ID. The default Kemp HA router ID is 01.

I'm not using VRRP on my PowerConnects (firmware 5.1.x.x), but I ran some traces and determined that the PowerConnect drops a VRRP frame whose router ID matches its own. It does this EVEN if VRRP isn't configured, and in that state its router ID defaults to 01. Changing the Kemp HA router ID to something else, such as 22 (0x16), resulted in everything working.
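
As a quick check of what the router ID change does to the address on the wire, here is a small Python sketch of the RFC's IPv4 format 00-00-5E-00-01-{VRID}. The function name vrrp_virtual_mac is hypothetical, and the sketch assumes the Kemp follows the RFC format exactly when "Virtual MAC Address" is enabled.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Build the IPv4 VRRP virtual router MAC for a given router ID (VRID)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # The VRID is the final octet, written in hex.
    return f"00:00:5e:00:01:{vrid:02x}"


if __name__ == "__main__":
    print(vrrp_virtual_mac(1))   # default Kemp HA router ID
    print(vrrp_virtual_mac(22))  # router ID 22 decimal is 0x16
```

With the default router ID the result is 00:00:5e:00:01:01, the same address seen in the core's MAC table; router ID 22 gives 00:00:5e:00:01:16.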