Routing – ECMP between next-hops with different route prefix length

ecmp routing switch

I have a basic question about ECMP. Say I have a /20 route and also a /16 route to a host, and both are reachable. Can ECMP load-balance between routes of different prefix lengths, or does the hardware first determine the route through LPM and only then apply ECMP?
This is not just about hardware: is there something wrong with this approach in principle? What are the general hardware capabilities, for example in ASICs from Broadcom? Do they support this, if it is reasonable at all?

There is no hidden agenda. I work at a networking company, I am just learning about routing, and I wanted to understand. My question is simply: what prevents ECMP in this situation? Why do you think that it's random? I asked because my company uses these chips.

Best Answer

Question 1: I have a basic question about ECMP. Say I have a /20 route and also a /16 route to a host, and both are reachable. Can ECMP load-balance between routes of different prefix lengths, or does the hardware first determine the route through LPM and only then apply ECMP?

Quoting RFC 1812, Requirements for IPv4 Routers, Section 2.2.5.2:

Routers must use the most specific matching route (the longest matching network prefix) when forwarding traffic.

Once the router has picked the longest-match prefix for the destination (using what the RFC calls "pruning rules"), it must pick the next hop to forward the traffic to. Only at this point does ECMP (or "load-splitting") occur.

Quoting RFC 1812, Requirements for IPv4 Routers, Section 5.2.4.3 (Next Hop Address):

Conceptually, any route lookup algorithm starts out with a set of candidate routes that consists of the entire contents of the FIB. The algorithm consists of a series of steps that discard routes from the set. These steps are referred to as Pruning Rules. Normally, when the algorithm terminates there is exactly one route remaining in the set. If the set ever becomes empty, the packet is discarded because the destination is unreachable. It is also possible for the algorithm to terminate when more than one route remains in the set. In this case, the router may arbitrarily discard all but one of them, or may perform "load-splitting" by choosing whichever of the routes has been least recently used.
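
The order of operations described above can be sketched in a few lines of Python. This is only an illustration, not how any particular router or ASIC implements it: the `FIB` contents, the next-hop names, and the `lookup` function are hypothetical, and the hash-based choice among the remaining next hops is an assumption (the RFC text quoted above mentions least-recently-used instead).

```python
import ipaddress
import zlib

# Hypothetical FIB: prefix -> list of equal-cost next hops (names are illustrative).
FIB = {
    ipaddress.ip_network("10.1.0.0/20"): ["R1-a", "R1-b"],
    ipaddress.ip_network("10.1.0.0/16"): ["R3"],
}

def lookup(dst, flow_key):
    """Return the next hop for dst, or None if the destination is unreachable."""
    addr = ipaddress.ip_address(dst)
    # Pruning step 1: keep only routes whose prefix contains the destination.
    candidates = [net for net in FIB if addr in net]
    if not candidates:
        return None  # empty candidate set: the packet would be discarded
    # Pruning step 2: keep only the longest matching prefix.
    best = max(candidates, key=lambda net: net.prefixlen)
    next_hops = FIB[best]
    # Load-splitting happens only among next hops of the surviving prefix.
    # Hashing a flow key is an assumption made for this sketch.
    return next_hops[zlib.crc32(flow_key.encode()) % len(next_hops)]
```

In this sketch a packet to 10.1.15.1 can only ever be split across the next hops of the /20; the /16 next hop is considered only for destinations outside the /20, such as 10.1.16.1.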


Question 2: How would it cause loops? Could you please explain in detail?

Assumptions...

  • Suppose R1, R2, and R3 below all run iBGP with each other in a full-mesh of peers, and they run OSPF to cover next-hop routing for their iBGP prefixes
  • R1 announces an iBGP route to 10.1.0.0/20 with an OSPF E2 next-hop via S1/0
  • R3 announces an iBGP route to 10.1.0.0/16 with an OSPF E2 next-hop via S3/0
  • Assume the OSPF E2 metrics for R1:S1/0 and R3:S3/0 are the same
  • R2 receives a packet for 10.1.15.1 on R2:S2/0

Diagram...

                S1/0       S1/2   S2/1     S2/3   S3/2    S3/0
Static Route to    +------+         +------+        +------+       Static Route to
10.1.0.0/20 <------|  R1  |---------|  R2  |--------|  R3  |-----> 10.1.0.0/16
                   +------+         +------+        +------+
                                        | S2/0
                                        |
                                        
                                        ^
                                        ^
                                        | Ingress packet to 10.1.15.1
                                        |

  • If R1, R2 and R3 do not choose the longest-prefix match before applying ECMP, then R2 could randomly forward 10.1.15.1 out either R2:S2/1 or R2:S2/3. When R3:S3/2 receives the packet for 10.1.15.1, R3 faces the same choice between the /16 and the /20, so it could forward the packet back to R2:S2/3, creating a loop.
  • If R1, R2 and R3 choose the longest-prefix match first, then R2 will always forward 10.1.15.1 out R2:S2/1.
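
The two cases above can be simulated with a small Python sketch. The `ROUTES` table is a simplified, hypothetical model of the topology in the diagram (each router's candidate routes reduced to "prefix, where the packet goes next"), and `forward` is an illustrative helper, not a model of real BGP/OSPF behavior.

```python
import ipaddress
import random

# Per-router candidate routes: (prefix, where the packet goes next).
# "EXIT-..." means the packet leaves via that router's static route.
ROUTES = {
    "R1": [("10.1.0.0/20", "EXIT-R1"), ("10.1.0.0/16", "R2")],
    "R2": [("10.1.0.0/20", "R1"), ("10.1.0.0/16", "R3")],
    "R3": [("10.1.0.0/20", "R2"), ("10.1.0.0/16", "EXIT-R3")],
}

def forward(dst, start, lpm_first, rng, max_hops=16):
    """Return the list of nodes the packet visits, starting at `start`."""
    addr = ipaddress.ip_address(dst)
    node, path = start, [start]
    while not node.startswith("EXIT") and len(path) <= max_hops:
        matches = [(ipaddress.ip_network(p), nh) for p, nh in ROUTES[node]
                   if addr in ipaddress.ip_network(p)]
        if not matches:
            path.append("DROP")  # no matching route: packet is discarded
            break
        if lpm_first:
            # Keep only the longest matching prefix before choosing a next hop.
            longest = max(net.prefixlen for net, _ in matches)
            matches = [m for m in matches if m[0].prefixlen == longest]
        # Pick randomly among whatever candidates remain.
        node = rng.choice(matches)[1]
        path.append(node)
    return path

# Longest-prefix match first: the path is always R2 -> R1 -> EXIT-R1.
print(forward("10.1.15.1", "R2", True, random.Random(0)))
# No longest-prefix match: the packet may bounce between routers.
print(forward("10.1.15.1", "R2", False, random.Random(0)))
```

With `lpm_first=False` the outcome depends on the random choices: some runs exit cleanly, some exit through the less specific /16 at R3, and some revisit a router one or more times before leaving or hitting the hop limit.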

Question 3: So I see you are saying that hardware really cannot skip the longest-prefix match. Does this apply to all kinds of chips?

This applies to all kinds of chips that handle IP forwarding. Compliance with the relevant portions of the RFCs is required if the silicon vendor wants any hope of selling their chips.