Vpn – Timeouts for ASA VPN peers

cisco-asafailoveripsecvpn

I have configured a remote site with two IP addresses for data centre VPN peers – one primary (1.1.1.1), one backup (2.2.2.2). When the primary peer fails, the remote site detects the failure using DPD (after about 15 seconds). It tears down the SAs, then proceeds to try to connect again to the primary peer! After about 30 seconds of no response, it finally tries the backup peer and connects immediately. Has anyone else seen this, and is there any way to avoid this needless 30 second wait?!

(Code version is 8.2(5), configs below)

REMOTE SITE FIREWALL:

crypto ipsec transform-set L2L-VPN-TRANSFORM esp-3des esp-sha-hmac 
crypto ipsec security-association lifetime seconds 28800
crypto ipsec security-association lifetime kilobytes 4608000
!
tunnel-group 1.1.1.1 type ipsec-l2l
tunnel-group 1.1.1.1 general-attributes
 default-group-policy L2L-VPN-POLICY
tunnel-group 1.1.1.1 ipsec-attributes
 pre-shared-key *****
tunnel-group 2.2.2.2 type ipsec-l2l
tunnel-group 2.2.2.2 general-attributes
 default-group-policy L2L-VPN-POLICY
tunnel-group 2.2.2.2 ipsec-attributes
 pre-shared-key *****
!
crypto map OUTSIDE-MAP 1 match address ATOS-DC-ENCRYPTION-DOMAIN
crypto map OUTSIDE-MAP 1 set pfs 
crypto map OUTSIDE-MAP 1 set peer 1.1.1.1 2.2.2.2 
crypto map OUTSIDE-MAP 1 set transform-set L2L-VPN-TRANSFORM
crypto map OUTSIDE-MAP 1 set security-association lifetime seconds 3600
crypto map OUTSIDE-MAP 1 set security-association lifetime kilobytes 4608000
crypto map OUTSIDE-MAP interface OUTSIDE
crypto isakmp enable OUTSIDE
crypto isakmp policy 1
 authentication pre-share
 encryption 3des
 hash sha
 group 2
 lifetime 86400

DC FIREWALLS (BOTH THE SAME):

crypto ipsec transform-set L2L-VPN-TRANSFORM esp-3des esp-sha-hmac 
crypto ipsec security-association lifetime seconds 28800
crypto ipsec security-association lifetime kilobytes 4608000
!
tunnel-group DefaultL2LGroup general-attributes
 default-group-policy L2L-VPN-POLICY
tunnel-group DefaultL2LGroup ipsec-attributes
 pre-shared-key *****
!
crypto dynamic-map REMOTE-DYNMAP 1 set pfs 
crypto dynamic-map REMOTE-DYNMAP 1 set transform-set L2L-VPN-TRANSFORM
crypto dynamic-map REMOTE-DYNMAP 1 set security-association lifetime seconds 3600
crypto dynamic-map REMOTE-DYNMAP 1 set security-association lifetime kilobytes 4608000
crypto dynamic-map REMOTE-DYNMAP 1 set reverse-route
crypto map OUTSIDE-MAP 1 ipsec-isakmp dynamic REMOTE-DYNMAP
crypto map OUTSIDE-MAP interface OUTSIDE
crypto isakmp enable OUTSIDE
crypto isakmp policy 1
 authentication pre-share
 encryption 3des
 hash sha
 group 2
 lifetime 86400

Logs showing the failure and the delay:

Jun 18 2013 00:52:46: %ASA-5-111008: User 'enable_15' executed the 'clear logging buffer' command.
Jun 18 2013 00:54:37: %ASA-3-713123: Group = 1.1.1.1, IP = 1.1.1.1, IKE lost contact with remote peer, deleting connection (keepalive type: DPD)
Jun 18 2013 00:54:37: %ASA-5-713259: Group = 1.1.1.1, IP = 1.1.1.1, Session is being torn down. Reason: Lost Service
Jun 18 2013 00:54:37: %ASA-4-113019: Group = 1.1.1.1, Username = 1.1.1.1, IP = 1.1.1.1, Session disconnected. Session Type: IPsec, Duration: 0h:03m:00s, Bytes xmt: 480192, Bytes rcv: 478992, Reason: Lost Service
Jun 18 2013 00:54:37: %ASA-5-713041: IP = 1.1.1.1, IKE Initiator: New Phase 1, Intf OUTSIDE, IKE Peer 1.1.1.1  local Proxy Address 10.233.224.4, remote Proxy Address 1.1.1.1,  Crypto map (OUTSIDE-MAP)
Jun 18 2013 00:54:39: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:41: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:43: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:45: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:47: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:48: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:49: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:51: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:53: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:55: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:57: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:54:59: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:55:01: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:55:03: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:55:05: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:55:07: %ASA-3-713042: IKE Initiator unable to find policy: Intf INSIDE-TRANSIT, Src: 10.60.1.1, Dst: 10.250.1.1
Jun 18 2013 00:55:09: %ASA-5-713041: IP = 2.2.2.2, IKE Initiator: New Phase 1, Intf OUTSIDE, IKE Peer 2.2.2.2  local Proxy Address 10.233.224.4, remote Proxy Address 2.2.2.2,  Crypto map (OUTSIDE-MAP)
Jun 18 2013 00:55:09: %ASA-5-713119: Group = 2.2.2.2, IP = 2.2.2.2, PHASE 1 COMPLETED
Jun 18 2013 00:55:09: %ASA-5-713049: Group = 2.2.2.2, IP = 2.2.2.2, Security negotiation complete for LAN-to-LAN Group (2.2.2.2)  Initiator, Inbound SPI = 0xd21ad657, Outbound SPI = 0xd7d9c25a
Jun 18 2013 00:55:09: %ASA-5-713120: Group = 2.2.2.2, IP = 2.2.2.2, PHASE 2 COMPLETED (msgid=1949f878)
Jun 18 2013 00:55:09: %ASA-5-713041: Group = 2.2.2.2, IP = 2.2.2.2, IKE Initiator: New Phase 2, Intf INSIDE-TRANSIT, IKE Peer 2.2.2.2  local Proxy Address 10.60.0.0, remote Proxy Address 10.0.0.0,  Crypto map (OUTSIDE-MAP)
Jun 18 2013 00:55:09: %ASA-5-713049: Group = 2.2.2.2, IP = 2.2.2.2, Security negotiation complete for LAN-to-LAN Group (2.2.2.2)  Initiator, Inbound SPI = 0xd4218cd3, Outbound SPI = 0xf9a8108b
Jun 18 2013 00:55:09: %ASA-5-713120: Group = 2.2.2.2, IP = 2.2.2.2, PHASE 2 COMPLETED (msgid=c0a82858)

Best Answer

The ASA does not have a mechanism to mark peers as up or down. When DPD detects a peer is no longer available, any SA with that peer is torn down. When "interesting traffic" requires a new SA, the ASA goes through its normal phase 1 process, which means starting with the first peer in your crypto map and if a connection cannot be established, trying the second.

The 30 seconds of delay is a timing issue while the ASA "sorts things out" so to speak. You have traffic triggering a new SA, while DPD is tearing down the existing SA to the same peer, which creates a bit of a race condition that eventually sorts out.

If faster failover is desired, it usually entails maintaining both tunnels simultaneously and using GRE and dynamic routing to move traffic when a tunnel fails. However, that would require using routers instead of ASAs, since ASAs do not support GRE, and have some shortcomings with dynamic routing.