Best Practices for Combining HSRP and ECMP


The combination of ECMP (or other causes of asymmetric paths) and HSRP is broken by default in Cisco IOS; the default behaviour with this design floods unicast traffic excessively.

What is the best practice for using HSRP with ECMP to prevent unknown unicast flooding?

Details / Background

We have an HSRP topology similar to the first diagram below for many of our facilities. Our Cisco WAN routers have equal-cost routes to all other sites; thus we see asymmetric routing effects all the time. Normally we assign R1 to be the HSRP primary, but ECMP allows return traffic through either R1 or R2.

The issue is that when PC1 mounts a remote iSCSI drive across the WAN, the traffic leaves the site via R1, but could return via R2. As long as the iSCSI traffic returns via R1, there are no issues.

HSRP_Broken_00

The problem occurs when PC1's traffic returns via R2. Assume the iSCSI session starts at 8:00:00, and both routers and both switches learn PC1's MAC address simultaneously. Between 8:00:00 and 8:05:00, there are no flooding problems because both switches still have PC1's MAC address in their CAM tables.

HSRP_Broken_01

Five minutes after the iSCSI session starts, S2's entry for PC1's MAC address ages out of the CAM table and S2 floods traffic destined for PC1 out all ports (in this case to Po1, Gi0/3 and Gi0/4). If PC1's iSCSI session consumes a lot of bandwidth, this unknown unicast flooding can suck non-trivial capacity from the links to PC3 and PC4.

Cisco IOS switches have a default CAM timer of 300 seconds…

S2# show mac address-table aging-time
Vlan Aging Time
---- ----------
1    300
17   300

However, Cisco IOS' default interface ARP timer is 4 hours…

R2# show interface gi0/0
GigabitEthernet0/0 is up, line protocol is up 
  Hardware is AmdP2, address is 000a.dead.beef (bia 000a.dead.beef)
  Internet address is 172.17.1.252/24
  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  ARP type: ARPA, ARP Timeout 04:00:00       <--------------

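R2 keeps its ARP entry for PC1 for up to four hours, so it keeps sending unicast frames to PC1's MAC address long after S2 has aged that address out of its CAM table. Therefore, S2 starts flooding PC1's iSCSI traffic after five minutes.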
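One way to confirm this state on live equipment is to compare the two tables directly. This is only a sketch: the MAC address 0011.2233.4455 and the IP address 172.17.1.10 are hypothetical stand-ins for PC1, so substitute your own values.

```
! On S2: check whether PC1's MAC is still in the CAM table (hypothetical MAC)
S2# show mac address-table address 0011.2233.4455

! On R2: check whether PC1 is still in the ARP cache (hypothetical IP)
R2# show ip arp 172.17.1.10
```

If R2 still lists an ARP entry for PC1 while S2 has no CAM entry for the same MAC address, S2 will flood the frames that R2 sends to PC1.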

HSRP_Broken_02

Best Answer

The simple answer is to make the CAM timer equal to or slightly longer than the corresponding interface ARP timer, but there are at least three different options to select from...

Option 1: Lower all interface ARP Timers

This option works best if you have a decent-sized layer2 switched network, a reasonable number of ARP entries and few routed interfaces. This method also is preferable if you like to see PC mac entries age out of the topology quickly.

  • On all IOS ethernet interfaces facing an ethernet switch: arp timeout 240
  • On all IOS ethernet interfaces facing an ethernet switch: hold-queue 200 in and hold-queue 200 out to avoid dropping ARP packets during periodic ARP refreshes (these limits could be higher or lower, depending on how many ARP refreshes you think you'll need to handle at once). If you are adjusting Selective Packet Discard values, then you should follow the guidelines in the paper I linked.
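Put together, the two bullets above might look like the following on each router. This is a minimal sketch: GigabitEthernet0/0 is assumed only because it is the interface shown in the R2 output earlier, and the hold-queue depth of 200 is the starting value suggested above, not a tuned figure.

```
! R1 and R2: repeat on every interface that faces an Ethernet switch
interface GigabitEthernet0/0
 ! Refresh ARP entries every 240 seconds, inside the 300-second CAM timer
 arp timeout 240
 ! Deeper queues so bursts of periodic ARP refreshes are not dropped
 hold-queue 200 in
 hold-queue 200 out
```

After applying it, the "ARP Timeout" field in show interface should read 00:04:00 instead of 04:00:00.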

This forces Cisco IOS to refresh the ARP table within four minutes, if it hasn't happened otherwise for a given ARP entry. The obvious disadvantage is that this doesn't scale well if you have lots of ARP entries... the limits vary by platform. I have used this with a few hundred ARPs per router on Catalyst 4500 / 6500 (the Layer3 SVIs) without any issues.

Option 2: Increase the switch CAM Timers

This option works best if you have a large number of ARP entries (i.e. thousands, as a dense VMware environment might see).

  • On all IOS switches: mac address-table aging-time 14400, or mac address-table aging-time 14400 vlan <vlan-id> for any Vlan that is of concern.
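As a sketch, the switch-side change could be entered in either of the two forms below. VLAN 17 is used only because it appears in the aging-time output earlier; pick whichever VLANs carry the affected traffic.

```
! S1 and S2: raise the CAM aging time to match the 4-hour ARP timeout
! Form 1: all VLANs
mac address-table aging-time 14400
! Form 2: a single VLAN of concern (17 is an example)
mac address-table aging-time 14400 vlan 17
```

Re-running show mac address-table aging-time afterwards is a quick way to check that the new value took effect on the VLANs you intended.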

This change adjusts timers that most people assume are fixed at 300 seconds (on Cisco IOS), so be sure to include this in continuity docs. The side-effect of this is that CAM table entries linger for 4 hours after the PC is removed (which can be either good or bad, depending on your PoV). If 4 hours is too long, see the next option...

Option 3: Change both the interface ARP timers, and the switch CAM Timers

This option avoids the hideously long CAM timers of Option 2 at the expense of more configuration. You can choose whether you need 900 seconds, 1800 seconds, or whatever... just make sure your CAM and ARP timers match; thus, you will need to configure both Option 1 and Option 2 in your topologies, using the same value for both timers.
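For illustration, a matched pair of timers at 900 seconds might look like this. The value 900 is just one of the examples mentioned above, and GigabitEthernet0/0 is assumed as the switch-facing router interface; the hold-queue lines from Option 1 still apply if you expect bursts of ARP refreshes.

```
! Switches (S1 and S2): CAM entries age out after 900 seconds
mac address-table aging-time 900

! Routers (R1 and R2): ARP entries refresh every 900 seconds
interface GigabitEthernet0/0
 arp timeout 900
 hold-queue 200 in
 hold-queue 200 out
```

Whatever value you choose, keep the CAM timer equal to or slightly longer than the ARP timer so that a switch never forgets a MAC address the router still believes is valid.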