OpenVPN on LEDE (openwrt) stoppes working after one hour

domain-name-systemopenvpnopenwrtvpn

I have a Linksys WRT1900ACS running LEDE Reboot 17.01.1 and I followed this guide to setup openvpn against my vpn provider vpntunnel.com.

I also used the up/down scripts from this guide to handle DNS.

Everything works for one hour, after one hour I first lose DNS then shortly connectivity.

The flow of this problem seems to be this; first I see this in the syslog:

Fri Jun  9 14:06:21 2017 daemon.notice netifd: wan (1404): udhcpc: sending renew
Fri Jun  9 14:06:22 2017 daemon.notice netifd: wan (1404): udhcpc: lease of {MY WAN IP} obtained, lease time 3600

Then the openvpn client logs this:

Fri Jun  9 14:06:43 2017 VERIFY OK: depth=1, C=GB, ST=LN, L=London, O=vpnsvc, OU=vpnsvc, CN=vpnsvc.com, name=vpnsvc, emailAddress=noc@vpnsvc.com
Fri Jun  9 14:06:43 2017 VERIFY KU OK
Fri Jun  9 14:06:43 2017 Validating certificate extended key usage
Fri Jun  9 14:06:43 2017 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Fri Jun  9 14:06:43 2017 VERIFY EKU OK
Fri Jun  9 14:06:43 2017 VERIFY OK: depth=0, C=GB, ST=LN, L=London, O=vpnsvc, OU=vpnsvc, CN=vpnsvc, name=vpnsvc, emailAddress=noc@vpnsvc.com
Fri Jun  9 14:06:44 2017 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Fri Jun  9 14:06:44 2017 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Fri Jun  9 14:06:44 2017 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA

Then at 14:09:13 I've lost connectivity. I can't reach any host outside of my network. Then openvpn picks up the problem and loops in error:

Fri Jun  9 14:10:51 2017 [vpnsvc] Inactivity timeout (--ping-restart), restarting
Fri Jun  9 14:10:51 2017 SIGUSR1[soft,ping-restart] received, process restarting
Fri Jun  9 14:10:56 2017 TCP/UDP: Preserving recently used remote address: [AF_INET]194.242.11.163:443
Fri Jun  9 14:10:56 2017 TCP/UDP: Preserving recently used remote address: [AF_INET]194.242.11.163:443
Fri Jun  9 14:10:56 2017 UDP link local: (not bound)
Fri Jun  9 14:10:56 2017 UDP link remote: [AF_INET]194.242.11.163:443
Fri Jun  9 14:11:56 2017 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Fri Jun  9 14:11:56 2017 TLS Error: TLS handshake failed
Fri Jun  9 14:11:56 2017 SIGUSR1[soft,tls-error] received, process restarting

To fix the situation I need to reconnect the vpn interface and reconnect the wan interface. After that everything works again for about one hour.

I've tried setting auth-nocache to avoid renegotiating the auth, hoping that my problem was similar to this, but that did not help. I reset the router, and followed the instructions again, same problem.

At this point I'm lost as to what to try next. Any pointers to how I can debug this further?

Update:

Okay, so I set my wan to a static IP to rule out a potential problem with the wan asking for a new ip every hour. After one hour the problem still occurred.

The openvpn tries to reauthenticate several times and fail:

Sat Jun 10 21:21:33 2017 us=62938 TLS: soft reset sec=0 bytes=2838281/-1 pkts=11352/0
Sat Jun 10 21:21:33 2017 us=153598 VERIFY OK: depth=1, C=GB, ST=LN, L=London, O=vpnsvc, OU=vpnsvc, CN=vpnsvc.com, name=vpnsvc, emailAddress=noc@vpnsvc.com
Sat Jun 10 21:21:33 2017 us=154667 VERIFY KU OK
Sat Jun 10 21:21:33 2017 us=154722 Validating certificate extended key usage
Sat Jun 10 21:21:33 2017 us=154763 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
Sat Jun 10 21:21:33 2017 us=154799 VERIFY EKU OK
Sat Jun 10 21:21:33 2017 us=154833 VERIFY OK: depth=0, C=GB, ST=LN, L=London, O=vpnsvc, OU=vpnsvc, CN=vpnsvc, name=vpnsvc, emailAddress=noc@vpnsvc.com
Sat Jun 10 21:21:35 2017 us=911573 Data Channel Encrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Sat Jun 10 21:21:35 2017 us=911659 Data Channel Decrypt: Cipher 'AES-256-GCM' initialized with 256 bit key
Sat Jun 10 21:21:35 2017 us=911759 Control Channel: TLSv1.2, cipher TLSv1/SSLv3 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
Sat Jun 10 21:33:21 2017 us=932911 [vpnsvc] Inactivity timeout (--ping-restart), restarting
Sat Jun 10 21:33:21 2017 us=933375 TCP/UDP: Closing socket

When I reconnect the wan interface it succeeds with the re auth but before the initialization sequence is completed I get this:

Sat Jun 10 21:38:56 2017 us=700658 /sbin/route del -net 194.242.11.163 netmask 255.255.255.255 route: SIOCDELRT: No such process
Sat Jun 10 21:38:56 2017 us=702079 ERROR: Linux route delete command failed: external program exited with error status: 1
Sat Jun 10 21:38:56 2017 us=702174 /sbin/route del -net 0.0.0.0 netmask 128.0.0.0 route: SIOCDELRT: No such process
Sat Jun 10 21:38:56 2017 us=703501 ERROR: Linux route delete command failed: external program exited with error status: 1
Sat Jun 10 21:38:56 2017 us=703583 /sbin/route del -net 128.0.0.0 netmask 128.0.0.0 route: SIOCDELRT: No such process
Sat Jun 10 21:38:56 2017 us=704883 ERROR: Linux route delete command failed: external program exited with error status: 1

Update 2
So I finally seem to be getting somewhere, it seems as my DNS settings are getting reset to the ones I get from DHCP. Those DNS servers can't be reached through the vpn, so when the re-auth to the vpn servers occurs I've lost DNS and can't reach the vpn server. So the active vpn connection times out and since all traffic is routed through the vpn interface I lose connectivity completely. Reconnecting the wan interface somehow makes traffic flow through the wan again and a new vpn connection can be established.

So now I just need to figure out why my DNS settings are reset. And verify my theory.

Best Answer

I had similar issues when my time on the machine connecting to VPN and time on VPN server were not in sync. I've solved it by using external NTP.