VPN – High Availability for ISR 4431 IPsec VPNs

Tags: hsrp, ipsec, ospf, redundancy, vpn

I have the following lab setup in GNS3. R1 and R2 are VPN endpoints at my datacenters, and R3 simulates a remote site. What I'm trying to accomplish is a single tunnel from R3 to the datacenter that leverages both R1 and R2 using HSRP and stateful failover. The configurations I'm attaching work; however, failover is slow (30+ seconds), and it also fails when the setup is returned to normal. Are there timers I can adjust, or something else I can leverage, to speed up this process? Or would it be best to scrap the HSRP solution, use two tunnels, and let OSPF handle the failover?

[Image: lab topology]

Here is a picture of what happens when I shut down R1 to test.

[Image: result of shutting down R1]

R1:

!
hostname r1
!
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5000
    local-ip 172.16.1.2
   remote-port 5000
    remote-ip 172.16.1.3
!
redundancy inter-device
 scheme standby HA-Out
 security ipsec TUNNEL-PROFILE-SITE
!
redundancy
!
crypto ikev2 proposal IKEv2-PROPOSAL
 encryption aes-gcm-256
 prf sha384
 group 20
!
crypto ikev2 policy IKEv2-POLICY
 match fvrf any
 proposal IKEv2-PROPOSAL
!
crypto ikev2 keyring IKEv2-KEYRING
 peer TO-SITE
  address 172.16.2.1
  pre-shared-key cisco123
 !
!
crypto ikev2 profile IKEv2-PROFILE-SITE
 match identity remote any
 authentication local pre-share
 authentication remote pre-share
 keyring local IKEv2-KEYRING
!
crypto ipsec transform-set MYSET esp-gcm 256
 mode tunnel
!
crypto ipsec profile TUNNEL-PROFILE-SITE
 set transform-set MYSET
 set ikev2-profile IKEv2-PROFILE-SITE
 redundancy HA-Out stateful
!
interface Loopback0
 ip address 10.1.1.1 255.255.255.255
!
interface Tunnel208
 description <== Datacenter Connection to SITE ==>
 ip unnumbered Loopback0
 tunnel source 172.16.1.1
 tunnel mode ipsec ipv4
 tunnel destination 172.16.2.1
 tunnel protection ipsec profile TUNNEL-PROFILE-SITE
!
interface GigabitEthernet0/0
 ip address 172.16.1.2 255.255.255.240
 standby 1 ip 172.16.1.1
 standby 1 priority 110
 standby 1 preempt
 standby 1 name HA-Out
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1
 ip address 10.1.200.2 255.255.255.0
 standby 0 name HA-In
 standby 1 ip 10.1.200.1
 standby 1 priority 110
 standby 1 preempt
 duplex auto
 speed auto
 media-type rj45
 !
router ospf 1
 passive-interface default
 no passive-interface Tunnel208
 network 10.1.1.1 0.0.0.0 area 0
 network 10.1.200.0 0.0.0.255 area 0
 network 10.254.2.32 0.0.0.3 area 0
!
ip route 0.0.0.0 0.0.0.0 172.16.1.10
!

R2:

!
hostname r2
!
ipc zone default
 association 1
  no shutdown
  protocol sctp
   local-port 5000
    local-ip 172.16.1.3
   remote-port 5000
    remote-ip 172.16.1.2
!
redundancy inter-device
 scheme standby HA-Out
 security ipsec TUNNEL-PROFILE-SITE
!
redundancy
!
crypto ikev2 proposal IKEv2-PROPOSAL
 encryption aes-gcm-256
 prf sha384
 group 20
!
crypto ikev2 policy IKEv2-POLICY
 match fvrf any
 proposal IKEv2-PROPOSAL
!
crypto ikev2 keyring IKEv2-KEYRING
 peer TO-SITE
  address 172.16.2.1
  pre-shared-key cisco123
 !
!
crypto ikev2 profile IKEv2-PROFILE-SITE
 match identity remote any
 authentication local pre-share
 authentication remote pre-share
 keyring local IKEv2-KEYRING
!
crypto ipsec transform-set MYSET esp-gcm 256
 mode tunnel
!
crypto ipsec profile TUNNEL-PROFILE-SITE
 set transform-set MYSET
 set ikev2-profile IKEv2-PROFILE-SITE
 redundancy HA-Out stateful
!
interface Loopback0
 ip address 10.1.1.2 255.255.255.255
!
interface Tunnel208
 description <== Datacenter Connection to SITE ==>
 ip unnumbered Loopback0
 tunnel source 172.16.1.1
 tunnel mode ipsec ipv4
 tunnel destination 172.16.2.1
 tunnel protection ipsec profile TUNNEL-PROFILE-SITE
!
interface GigabitEthernet0/0
 ip address 172.16.1.3 255.255.255.240
 standby 1 ip 172.16.1.1
 standby 1 priority 105
 standby 1 name HA-Out
 duplex auto
 speed auto
 media-type rj45
!
interface GigabitEthernet0/1
 ip address 10.1.200.3 255.255.255.0
 standby 0 name HA-In
 standby 1 ip 10.1.200.1
 standby 1 priority 105
 duplex auto
 speed auto
 media-type rj45
!
router ospf 1
 passive-interface default
 no passive-interface Tunnel208
 network 10.1.1.2 0.0.0.0 area 0
 network 10.1.200.0 0.0.0.255 area 0
 network 10.254.2.32 0.0.0.3 area 0
!
ip route 0.0.0.0 0.0.0.0 172.16.1.10
!

R3:

!
hostname r3
!
crypto ikev2 proposal IKEv2-PROPOSAL
 encryption aes-gcm-256
 prf sha384
 group 20
!
crypto ikev2 policy IKEv2-POLICY
 match fvrf any
 proposal IKEv2-PROPOSAL
!
crypto ikev2 keyring IKEv2-KEYRING
 peer TO-DC01
  address 172.16.1.1
  pre-shared-key cisco123
 !
!
crypto ikev2 profile IKEv2-PROFILE-DC01
 match identity remote any
 authentication local pre-share
 authentication remote pre-share
 keyring local IKEv2-KEYRING
!
crypto ipsec transform-set MYSET esp-gcm 256
 mode tunnel
!
crypto ipsec profile TUNNEL-PROFILE-DC01
 set transform-set MYSET
 set ikev2-profile IKEv2-PROFILE-DC01
!
interface Loopback0
 ip address 10.10.1.1 255.255.255.255
!
interface Tunnel208
 description <== Datacenter Connection ==>
 ip unnumbered Loopback0
 tunnel source 172.16.2.1
 tunnel mode ipsec ipv4
 tunnel destination 172.16.1.1
 tunnel protection ipsec profile TUNNEL-PROFILE-DC01
!
interface GigabitEthernet0/0
 ip address 172.16.2.1 255.255.255.240
 duplex auto
 speed auto
 media-type rj45
 bfd template sample
 no cdp enable
!
!
router ospf 1
 passive-interface default
 no passive-interface Tunnel208
 network 10.10.1.1 0.0.0.0 area 0
 network 10.254.2.32 0.0.0.3 area 0
!
ip route 0.0.0.0 0.0.0.0 172.16.2.10
!

Best Answer

Would there be some timers I can adjust or perhaps leverage something else to speed up this process? Or would it just be best to scrap the HSRP solution and use two tunnels and let OSPF failover?

Yes, there are HSRP timers that you can adjust, and you could use sub-second timers, but that will drive up the CPU utilization to possibly unacceptable levels. Cisco has a lot of documents describing HSRP features and configurations:

HSRP Timers

Each router uses only three timers in HSRP. The timers time hello messages. How quickly HSRP converges when a failure occurs depends on how the HSRP hello and hold timers are configured. By default, these timers are set to 3 and 10 seconds, respectively, which means that a hello packet is sent between the HSRP standby group devices every 3 seconds, and the standby device becomes active when a hello packet has not been received for 10 seconds. You can lower these timer settings to speed up the failover or preemption, but, to avoid increased CPU usage and unnecessary standby state flapping, do not set the hello timer below 1 second or the hold timer below 4 seconds. Note that, if you use the HSRP tracking mechanism and the tracked link fails, the failover or preemption occurs immediately, regardless of the hello and hold timers. When a timer expires, the router transitions to a new HSRP state. The timers can be changed with this command: standby [group-number] timers hellotime holdtime. For example, standby 1 timers 5 15.
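
As a rough sketch of what that looks like on the routers in the question, the following applies the lowest values the quoted text recommends (1-second hello, 4-second hold) to HSRP group 1 on both interfaces. The values are illustrative, not tested against this lab; the commented-out millisecond form is the sub-second option mentioned above and is shown only for comparison. The same timers would normally be configured on both R1 and R2.

```
! R1 and R2: outside interface (HSRP group 1, named HA-Out)
interface GigabitEthernet0/0
 standby 1 timers 1 4
 ! Sub-second alternative, at the cost of more CPU load:
 ! standby 1 timers msec 250 msec 750
!
! R1 and R2: inside interface (HSRP group 1)
interface GigabitEthernet0/1
 standby 1 timers 1 4
```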

A routing protocol can fail over much faster than HSRP. The idea of an FHRP like HSRP is that hosts usually have only a single default gateway, and the FHRP creates a virtual gateway to which the hosts on a LAN can send traffic destined for a different network. The routers communicate with each other to determine which one is the active router for the virtual gateway, and to check whether the active router is still functioning. The standby router takes over if the active router fails, but determining that the active router has failed takes time. It also takes time for a routing protocol to detect a path failure and reroute; you will never get instantaneous failover between separate devices.
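
Note that OSPF is only faster if its own timers are tuned: on a point-to-point tunnel interface the defaults are a 10-second hello and a 40-second dead interval. A minimal sketch, assuming failover is left to OSPF over the tunnel and using illustrative values of 1 and 4 seconds (the timers must match on both ends of the tunnel or the adjacency will not form):

```
! Datacenter routers and R3: tunnel interface carrying OSPF
interface Tunnel208
 ip ospf hello-interval 1
 ip ospf dead-interval 4
```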


There are also other things you would want to do for a true HA scenario. For example, you should have a link between R1 and R2, and you should track the router interfaces to SW1, decrementing the HSRP priority when the link fails but the router is still up.
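
A minimal sketch of that tracking, under a few assumptions: the track object number (10) and the decrement (10) are arbitrary, and GigabitEthernet0/1 is assumed to be the link toward the switch, since the topology diagram is not reproduced here. With R1 at priority 110 and R2 at 105, a decrement of 10 drops R1 to 100, and R2 needs preempt enabled to take over while R1 is still up.

```
! R1: lower the outside HSRP priority if the tracked link goes down
track 10 interface GigabitEthernet0/1 line-protocol
!
interface GigabitEthernet0/0
 standby 1 track 10 decrement 10
!
! R2: allow takeover when R1's priority falls below 105
interface GigabitEthernet0/0
 standby 1 preempt
```

How this interacts with the stateful IPsec failover configured under redundancy inter-device should be verified in the lab before relying on it.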

Personally, I would set your layer-3 switch, AS01, as the gateway, then use a routing protocol between it and the two routers (with a link between the two routers) to let the routing protocol determine how to route the traffic.