Best Practices for Combining HSRP and ECMP


The combination of ECMP (or other causes of asymmetric paths) and HSRP is broken by default in Cisco IOS; the default behaviour with this design floods unicast traffic excessively.

What is the best practice for using HSRP with ECMP to prevent unknown unicast flooding?

Details / Background

We have an HSRP topology similar to the first diagram below at many of our facilities. Our Cisco WAN routers have equal-cost routes to all other sites, so we see asymmetric routing effects all the time. Normally we assign R1 to be the HSRP primary, but ECMP allows return traffic through either R1 or R2.

The issue is that when PC1 mounts a remote iSCSI drive across the WAN, the traffic leaves the site via R1, but could return via R2. As long as the iSCSI traffic returns via R1, there are no issues.

HSRP_Broken_00

The problem occurs when PC1's traffic returns via R2. Assume the iSCSI session starts at 8:00:00, and both routers and both switches learn PC1's mac simultaneously. Between 8:00:00 and 8:05:00, there are no flooding problems because both switches still have PC1's mac-address in their CAM table.

HSRP_Broken_01

Five minutes after the iSCSI session starts, S2's CAM entry for PC1's mac expires out of the CAM table (PC1's outbound frames leave via R1, so nothing refreshes the entry on S2). S2 then floods traffic destined for PC1 out all ports (in this case to Po1, Gi0/3 and Gi0/4). If PC1's iSCSI session consumes a lot of bandwidth, this unknown unicast flooding can suck non-trivial capacity from the links to PC3 and PC4.

Cisco IOS switches have a default CAM timer of 300 seconds…

S2# show mac address-table aging-time
Vlan Aging Time
---- ----------
1    300
17   300

However, Cisco IOS' default interface ARP timer is 4 hours…

R2# show interface gi0/0
GigabitEthernet0/0 is up, line protocol is up 
  Hardware is AmdP2, address is 000a.dead.beef (bia 000a.dead.beef)
  Internet address is 172.17.1.252/24
  MTU 1500 bytes, BW 10000 Kbit, DLY 1000 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  ARP type: ARPA, ARP Timeout 04:00:00       <--------------

Therefore, S2 starts flooding PC1's iSCSI traffic after five minutes.
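One way to confirm this state on a live network is to compare the router's ARP table with the switch's CAM table for the same host. This is only a sketch: <pc1-ip> and <pc1-mac> are placeholders for PC1's addresses (they are not given above), and older switch images may spell the second command show mac-address-table.

```text
R2# show ip arp <pc1-ip>
S2# show mac address-table address <pc1-mac>
```

If R2 still lists an ARP entry for PC1 while S2 returns no CAM entry, S2 has to flood any frame R2 sends to PC1.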

HSRP_Broken_02

Best Answer

The simple answer is to make the CAM timer equal to or slightly longer than the corresponding interface ARP timer, but there are at least three different options to select from...

Option 1: Lower all interface ARP Timers

This option works best if you have a decent-sized layer2 switched network, a reasonable number of ARP entries and few routed interfaces. This method also is preferable if you like to see PC mac entries age out of the topology quickly.

On all IOS ethernet interfaces facing an ethernet switch: arp timeout 240
On all IOS ethernet interfaces facing an ethernet switch: hold-queue 200 in and hold-queue 200 out to avoid dropping ARP packets during periodic ARP-refreshes (these limits could be higher or lower, depending on how many ARP refreshes you expect to handle at once). If you are adjusting Selective Packet Discard values, then you should follow the guidelines in the paper I linked.

This forces Cisco IOS to refresh the ARP table within four minutes, if it hasn't happened otherwise for a given ARP entry. The obvious disadvantage is that this doesn't scale well if you have lots of ARP entries... the limits vary by platform. I have used this with a few hundred ARPs per router on Catalyst 4500 / 6500 (the Layer3 SVIs) without any issues.
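Put together, Option 1 might look like the fragment below on each router. This is a sketch rather than a tested config: the interface name is borrowed from the R2 output earlier and is an assumption for R1, and the hold-queue depths are the example values from above.

```text
! R1 and R2: LAN interface facing the switches (interface name assumed)
interface GigabitEthernet0/0
 ! refresh ARP entries before the 300-second CAM timer expires
 arp timeout 240
 ! extra queue depth to absorb bursts of ARP refreshes
 hold-queue 200 in
 hold-queue 200 out
```

With a 240-second ARP timeout, each refresh prompts a reply from the host, which should re-populate the switches' CAM entries before the 300-second aging timer expires.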

Option 2: Increase the switch CAM Timers

This option works best if you have a large number of ARP entries (i.e. thousands, such as an intense VMware environment could see).

On all IOS switches: mac address-table aging-time 14400, or mac address-table aging-time 14400 vlan <vlan-id> for any Vlan that is of concern.

This change adjusts timers that most people assume are fixed at 300 seconds (on Cisco IOS), so be sure to include this in continuity docs. The side-effect of this is that CAM table entries linger for 4 hours after the PC is removed (which can be either good or bad, depending on your PoV). If 4 hours is too long, see the next option...
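As a sketch, Option 2 on each switch could be entered in global configuration mode as shown below. VLAN 17 is taken from the show output earlier and stands in for whichever VLAN carries the HSRP group; use either the global form or the per-VLAN form, depending on how widely you want the change to apply.

```text
! S1 and S2: match the routers' default 4-hour (14400 s) ARP timeout
mac address-table aging-time 14400
! or limit the change to the VLAN of concern
mac address-table aging-time 14400 vlan 17
```

Afterwards, show mac address-table aging-time should report the new value for the affected VLANs.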

Option 3: Change both the interface ARP timers, and the switch CAM Timers

This option avoids the hideously long CAM timers of Option 2 at the expense of more configuration. You can choose whether you need 900 seconds, 1800 seconds, or whatever... just make sure your CAM and ARP timers match; thus, you will need to configure both Option 1 and Option 2 in your topologies.
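For illustration, a 900-second version of Option 3 might look like this. The interface name and VLAN 17 are carried over from the outputs earlier and are assumptions about where the HSRP group lives; the hold-queue values from Option 1 still apply on the routers.

```text
! R1 and R2: LAN interface facing the switches (interface name assumed)
interface GigabitEthernet0/0
 arp timeout 900
 hold-queue 200 in
 hold-queue 200 out

! S1 and S2: global configuration
mac address-table aging-time 900 vlan 17
```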

Related Solutions

Ethernet – Monitoring best practice for thresholding errors on an interface

The Ethernet standard officially allows a 10^-12 bit-error rate, while in practice hardware achieves a much better BER than the standard demands.

You should also be able to search for 'SQA' (Service Quality Assurance) or 'SLA' (Service Level Agreement). Some companies publish them, so you can check what your competitors are offering and offer something at that level.

Our SQA states to customers that 0.02% packet loss is a minor fault (we will fix it if a ticket is opened), which I think is quite a large packet loss for a fibre connection, but the same SQA also covers DSL, so we didn't want to be too aggressive with it. So far this has been sufficient for customers, but we are prepared to reduce the number if it is hurting sales.

There are several tools online where you can check how much packet loss hurts TCP, which can be useful information when deciding what is acceptable loss for your application/product.

Cisco – Is ‘switchport protected’ supposed to block unicast flooding

The information that I'm seeing conflicts -- the wikipedia page on unicast flooding cites protected mode as a mechanism to block flooding, while Cisco's documentation says that switchport protected doesn't matter, and that switchport block unicast would still be needed to prevent flooding.

switchport protected is used to enforce privacy within a vlan... the command prevents ports from talking to other ports configured with switchport protected. This command reduces flooding as a side-effect of using it on all ports in a Vlan, but it does much more than "just" remove flooding from a switchport. Honestly, I think there are better ways to accomplish your goals.

switchport protected is useful if you're aggregating colocation customers in the same vlan; this command is one way to offer privacy between the customers without the complications of private vlans. The wikipedia article you mentioned says you can "bounce" traffic off the default gateway (which should not be on a protected switchport) to reach those other destinations...

switchport block unicast does stop unknown unicast flooding; however, see below for why I think you shouldn't need this command.

However, I recently ran into an issue where on a 2950G running some relatively ancient 12.1(22) code, unicast flooding seemed to be completely broken for a protected port -- the aging time on the switch was 5 minutes, while the router's ARP timeout was 30 minutes, and the one TCP connection utilizing this interface had a tendency to sit dormant for 10 minutes at a time - and be non-functional when waking up after 10 minutes in this case.

As I mentioned in my comment, if there is any potential for an asymmetric routed path in this network, you either need unknown unicast flooding, or you need to match the CAM and ARP timers to ensure that CAM entries don't age out before the ARP entries.

In most cases, matching the ARP and CAM timers is the right way to fix the situation, but the choice is yours...

EDIT to respond to the comments:

Setting the timers to match is working great as a workaround - I just don't understand why the flooding isn't happening as expected.

Quoting from "CCIE Practical Studies, Volume 2", page 115 by Karl Solie, Leah Lynch, Charles Ragan:

If unknown unicast and multicast traffic is forwarded to a protected port, there could be security issues. To prevent unknown unicast or multicast traffic from being forwarded from one port to another, you can configure a port (protected or nonprotected) to block unknown unicast and multicast traffic.

3550_switch(config-if)#switchport block unicast
3550_switch(config-if)#switchport block multicast
