Routing protocol for large-scale hub-and-spoke VPN

architecture design routing vpn

I'm looking at implementing a Linux-based VPN setup with approximately 1200 remote sites connecting to two redundant data centers (DCs). Each remote site has 4 networks, and each DC has ~10. The VPN itself is GRE over IPsec, but what I'm struggling with at present is choosing the most appropriate routing protocol.

  • Each remote site only needs to know which networks are at the data centers; it does not need to know about the networks in use by other remote sites.
  • Networks may move from one data center to the other from time to time, and new networks may be added (I don't want to update static IPsec configs for 1200+ sites each time this happens).
  • The DCs need to know which networks are in use by every remote site.
  • When the primary DC dies, traffic should route automatically to the backup DC.
  • Most of these remote sites are on dynamic IPs/3G so it's common for sites to come up/down a couple times during the day, and across 1200 sites that equates to a lot of topology changes.
  • The required capacity at present is around 1200 sites, but could grow to 1800-1900 by the end of next year.
  • Cannot use closed solutions (Cisco etc, also EIGRP even though it's technically free now).

From my reading on OSPF, it does not appear to support this number of sites.

My questions are:

  • What routing protocol is best suited for this setup?
  • How should I structure it to handle the large number of sites?

Best Answer

I think BGP is probably your best bet in this situation. With this many tail sites, you are quickly past the point where most IGPs remain a good fit. BGP would let you meet all of your requirements: limiting what the tail sites learn to the primary/backup DC networks, using a standards-based protocol, efficiently handling over 5,000 different routes, and so on.
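To make the "tail sites only learn the DC networks" and "fail over to the backup DC" requirements concrete, here is a minimal Python sketch of the policy logic. It is a toy model, not router configuration: the prefixes, the hub names `dc1`/`dc2`, and the use of local preference to favor the primary DC are all assumptions for illustration, and a real BGP best-path decision has more steps than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str          # hub that advertised the route (hypothetical names)
    local_pref: int   # assumed tie-breaker: higher wins


def export_to_spoke(hub_table, via, local_pref):
    """Outbound policy on a hub: advertise only DC-originated prefixes."""
    return [
        Route(prefix, via, local_pref)
        for prefix, origin in hub_table.items()
        if origin == "dc"
    ]


def best_paths(received):
    """Per prefix, keep the route with the highest local preference."""
    best = {}
    for route in received:
        current = best.get(route.prefix)
        if current is None or route.local_pref > current.local_pref:
            best[route.prefix] = route
    return best


# Hub routing table: prefix -> where it was learned (made-up example data).
hub_table = {
    "10.0.1.0/24": "dc",
    "10.0.2.0/24": "dc",
    "172.16.5.0/24": "spoke",  # another remote site's network
}

from_primary = export_to_spoke(hub_table, "dc1", local_pref=200)
from_backup = export_to_spoke(hub_table, "dc2", local_pref=100)

normal = best_paths(from_primary + from_backup)  # both DCs reachable
failover = best_paths(from_backup)               # primary DC session lost

for prefix, route in sorted(normal.items()):
    print(f"normal:   {prefix} via {route.via}")
for prefix, route in sorted(failover.items()):
    print(f"failover: {prefix} via {route.via}")
```

In this model the spoke never sees `172.16.5.0/24`, and when the primary's routes disappear the same prefixes are still reachable through the backup without any change at the spoke.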

OSPF is still an option; I just don't think it's the best option. You could always set up each leg of your network as a totally stubby area so inter-area routes aren't propagated down to each site. Then you could have separate routing instances at each site to handle their networks independently of your head end.

Each SPF recalculation is going to cost you a pretty penny in resources. How often that happens depends on how volatile these connections are and how aggressive your timers are; it could be often.
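For a rough sense of how often that could be, here is a back-of-the-envelope estimate in Python. It assumes each site flaps twice a day (one reading of "a couple times" in the question) and that every flap produces two topology events, a down and an up, spread evenly over the day. Real flaps tend to cluster, and SPF throttling timers would batch some of them, so treat this only as an order-of-magnitude figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def change_rate(sites, flaps_per_site_per_day=2):
    """Return (topology events per day, average seconds between events)."""
    # Each flap is counted as two events: link down, then link up.
    events = sites * flaps_per_site_per_day * 2
    return events, SECONDS_PER_DAY / events


for sites in (1200, 1900):
    events, gap = change_rate(sites)
    print(f"{sites} sites: {events} events/day, one every {gap:.1f} s on average")
```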

Whatever your decision, BGP was designed to scale. And when anyone says they are looking for a way to scale to 7,000+ networks [1], BGP is a no-brainer.

[1] Forecast based on growth to 1,900 sites x 4 networks at each site.
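The arithmetic behind that forecast, using only the figures from the question (4 networks per site, 2 DCs with roughly 10 networks each), can be checked with a short Python snippet. The function name and defaults are just for illustration.

```python
def route_count(sites, nets_per_site=4, dcs=2, nets_per_dc=10):
    """Return (spoke prefixes, DC prefixes, total) for a given site count."""
    spoke = sites * nets_per_site
    dc = dcs * nets_per_dc
    return spoke, dc, spoke + dc


for sites in (1200, 1900):
    spoke, dc, total = route_count(sites)
    print(f"{sites} sites: {spoke} spoke prefixes + {dc} DC prefixes = {total}")
```

Note the asymmetry: the hubs carry the full total, while each remote site only needs the roughly 20 DC prefixes if the hubs filter what they advertise.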
