Linux – Strange OpenVPN behavior – disconnects after one minute

linux, networking, openvpn, ubuntu, vpn

I'm using OpenVPN to connect two private networks, and I've run into a problem that I'm not able to solve.
The servers are connected with a simple UDP configuration using a static key. I've already checked iptables for rate limits or similar rules and found nothing. Both servers are directly on public IPs, with no router or NAT in between. Server A is listening and server B is the client.
When the VPN starts, the client connects and everything works perfectly, but ONLY for the first minute.
Then it stops working. Traffic through the tunnel (a ping from one endpoint to the other) still works from server A to server B, but not in the other direction. After another minute, the watchdog on server B notices that the connection is down and restarts the tunnel. Then it works for one minute again, and this repeats forever…

Both servers run 64-bit Ubuntu:

Server A:

root@server:/etc/openvpn# uname -an
Linux server 2.6.38-13-virtual #52~lucid1-Ubuntu SMP Thu Nov 10 19:46:44 UTC 2011 x86_64 GNU/Linux
root@server:/etc/openvpn# openvpn --version
OpenVPN 2.1.0 x86_64-pc-linux-gnu [SSL] [LZO2] [EPOLL] [PKCS11] [MH] [PF_INET6] [eurephia] built on Jul 20 2010
Originally developed by James Yonan
Copyright (C) 2002-2009 OpenVPN Technologies, Inc. 

Server B:

root@gw2:~# uname -an
Linux gw2 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@gw2:~# openvpn --version
OpenVPN 2.2.1 x86_64-linux-gnu [SSL] [LZO2] [EPOLL] [PKCS11] [eurephia] [MH] [PF_INET6] [IPv6 payload 20110424-2 (2.2RC2)] built on Feb 27 2013
Originally developed by James Yonan
Copyright (C) 2002-2010 OpenVPN Technologies, Inc. 

  $ ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libexecdir=${prefix}/lib/openvpn --disable-maintainer-mode --disable-dependency-tracking CFLAGS=-g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security CPPFLAGS=-D_FORTIFY_SOURCE=2 CXXFLAGS=-g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security FFLAGS=-g -O2 LDFLAGS=-Wl,-Bsymbolic-functions -fPIE -pie -Wl,-z,relro -Wl,-z,now --enable-password-save --host=x86_64-linux-gnu --build=x86_64-linux-gnu --prefix=/usr --mandir=${prefix}/share/man --with-ifconfig-path=/sbin/ifconfig --with-route-path=/sbin/route

Compile time defines:  ENABLE_CLIENT_SERVER ENABLE_DEBUG ENABLE_EUREPHIA ENABLE_FRAGMENT ENABLE_HTTP_PROXY ENABLE_MANAGEMENT ENABLE_MULTIHOME ENABLE_PASSWORD_SAVE ENABLE_PORT_SHARE ENABLE_SOCKS USE_CRYPTO USE_LIBDL USE_LZO USE_PF_INET6 USE_PKCS11 USE_SSL

Server A ovpn config:

daemon vpn-conn
writepid /var/run/openvpn-vpn.pid
dev tun3
proto udp
port 1859
comp-lzo
keepalive 10 30
persist-tun
persist-key
ifconfig 10.9.0.1 10.9.0.2
route 10.10.10.0 255.255.255.0
secret my-key.key
log-append vpn.log
verb 5

Server B:

daemon vpn
writepid /var/run/openvpn-vpn.pid
remote 4.3.2.1
dev tun0
proto udp
port 1859
comp-lzo
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
ifconfig 10.9.0.2 10.9.0.1
route 192.168.0.0 255.255.252.0
secret my-key.key
log-append vpn.log
mtu-test
verb 5
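
Since the two files are meant to mirror each other, it can help to list only the directives that differ. The following is a minimal Python sketch, not part of the original setup: the function names `parse_directives` and `diff_configs` are made up for illustration, and the parser only handles the simple one-directive-per-line form used above.

```python
# Minimal sketch: compare two OpenVPN config files directive by directive.
# parse_directives / diff_configs are hypothetical helper names.

def parse_directives(text):
    """Map each directive name to its argument list (last occurrence wins)."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (OpenVPN accepts '#' and ';').
        if not line or line[0] in "#;":
            continue
        name, *args = line.split()
        directives[name] = args
    return directives


def diff_configs(a_text, b_text):
    """Return {directive: (args_in_a, args_in_b)} for every mismatch.

    A directive missing from one side is reported as None on that side.
    """
    a, b = parse_directives(a_text), parse_directives(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


SERVER_A = """
daemon vpn-conn
writepid /var/run/openvpn-vpn.pid
dev tun3
proto udp
port 1859
comp-lzo
keepalive 10 30
persist-tun
persist-key
ifconfig 10.9.0.1 10.9.0.2
route 10.10.10.0 255.255.255.0
secret my-key.key
log-append vpn.log
verb 5
"""

SERVER_B = """
daemon vpn
writepid /var/run/openvpn-vpn.pid
remote 4.3.2.1
dev tun0
proto udp
port 1859
comp-lzo
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
ifconfig 10.9.0.2 10.9.0.1
route 192.168.0.0 255.255.252.0
secret my-key.key
log-append vpn.log
mtu-test
verb 5
"""

if __name__ == "__main__":
    for name, (a_args, b_args) in diff_configs(SERVER_A, SERVER_B).items():
        print(f"{name}: A={a_args} B={b_args}")
```

Besides the expected differences (daemon, dev, ifconfig, route, remote), this reports that the keepalive timeouts differ (30 vs. 60 seconds) and that ping-timer-rem and mtu-test are set on server B only.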

I did some research and tried adding and removing "ping-timer-rem", "mtu-test" and "float" in the client and server configurations, but the problem remains.

Server A keeps logging strange messages (I think this COULD be the source of the problem, but I don't know how to solve it. The time on both servers is the same):

Wed Sep  4 10:25:44 2013 us=125832 Authenticate/Decrypt packet error: bad packet ID (may be a replay): [ #100 / time = (1378283056) Wed Sep  4 10:24:16 2013 ] -- see the man page entry for --no-replay and --replay-window for more info or silence this warning with --mute-replay-warnings
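
The warning carries two timestamps: the moment server A logged it and the time embedded in the rejected packet ID. A small Python sketch can pull both out of the line and show how far apart they are. The helper name `packet_age_seconds` is hypothetical, and the code assumes both timestamps are printed in the same local time zone, so no zone conversion is attempted.

```python
import re
from datetime import datetime

# Matches timestamps such as "Sep  4 10:25:44 2013" (the weekday is skipped).
STAMP = re.compile(r"([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) (\d{4})")


def packet_age_seconds(log_line):
    """Seconds between the log timestamp and the packet-ID timestamp."""
    stamps = [
        datetime.strptime(" ".join(parts), "%b %d %H:%M:%S %Y")
        for parts in STAMP.findall(log_line)
    ]
    if len(stamps) < 2:
        raise ValueError("expected a log timestamp and a packet-ID timestamp")
    logged_at, packet_time = stamps[0], stamps[1]
    return int((logged_at - packet_time).total_seconds())


LOG_LINE = (
    "Wed Sep  4 10:25:44 2013 us=125832 Authenticate/Decrypt packet error: "
    "bad packet ID (may be a replay): [ #100 / time = (1378283056) "
    "Wed Sep  4 10:24:16 2013 ]"
)

if __name__ == "__main__":
    print(packet_age_seconds(LOG_LINE), "seconds")
```

For the line above the gap is 88 seconds, so the rejected packet ID is much older than any ordinary network delay would explain.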

Another strange thing on server A is that server B seems to be connecting from two sockets! I've checked server B and there is only ONE openvpn instance running, no others. When I kill it, the connection attempts from both sockets stop.

Server A log details:

Wed Sep  4 09:56:12 2013 us=544282 Peer Connection Initiated with [AF_INET]1.2.3.4:1859
Wed Sep  4 09:57:06 2013 us=661505 Peer Connection Initiated with [AF_INET]1.2.3.4:1194

Server B log details:

Wed Sep  4 10:28:16 2013 us=98524 SIGUSR1[soft,ping-restart] received, process restarting
Wed Sep  4 10:28:16 2013 us=98562 Restart pause, 2 second(s)
Wed Sep  4 10:28:18 2013 us=98688 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Sep  4 10:28:18 2013 us=98871 Re-using pre-shared static key
Wed Sep  4 10:28:18 2013 us=98905 LZO compression initialized
Wed Sep  4 10:28:18 2013 us=98981 Socket Buffers: R=[229376->131072] S=[229376->131072]
Wed Sep  4 10:28:18 2013 us=99043 Preserving previous TUN/TAP instance: tun0
Wed Sep  4 10:28:18 2013 us=99075 Data Channel MTU parms [ L:1545 D:1450 EF:45 EB:135 ET:0 EL:0 AF:3/1 ]
Wed Sep  4 10:28:18 2013 us=99144 Local Options String: 'V4,dev-type tun,link-mtu 1545,tun-mtu 1500,proto UDPv4,ifconfig 10.9.0.1 10.9.0.2,comp-lzo,cipher BF-CBC,auth SHA1,keysize 128,secret'
Wed Sep  4 10:28:18 2013 us=99167 Expected Remote Options String: 'V4,dev-type tun,link-mtu 1545,tun-mtu 1500,proto UDPv4,ifconfig 10.9.0.2 10.9.0.1,comp-lzo,cipher BF-CBC,auth SHA1,keysize 128,secret'
Wed Sep  4 10:28:18 2013 us=99215 Local Options hash (VER=V4): '184f07f3'
Wed Sep  4 10:28:18 2013 us=99255 Expected Remote Options hash (VER=V4): 'de9a476a'
Wed Sep  4 10:28:18 2013 us=99291 UDPv4 link local (bound): [undef]
Wed Sep  4 10:28:18 2013 us=99321 UDPv4 link remote: [AF_INET]4.3.2.1:1859
WrWrWRWed Sep  4 10:28:21 2013 us=987011 Peer Connection Initiated with [AF_INET]4.3.2.1:1859
wrWrWed Sep  4 10:28:22 2013 us=847036 Initialization Sequence Completed
WrWRwrWRwrWWed Sep  4 10:28:24 2013 us=931728 NOTE: Beginning empirical MTU test -- results should be available in 3 to 4 minutes.
WRwrWRRwrWRwrWrWWrWRwrWRWwrWRRwrWRWwrWRRwrWRwrWRWwrWRwrWRwrWRWwrWRRwrWRwrWRWwrWRwrWRwrWRWwrWRRwrWRwrWRWwrWRwrWRwWrWRRwrWRwrWRwrWRWwrWRwrWRwrWRWwrWRRwrWRWwrWRwrWRwrWRwrWRWwrWRRwrWRwrWRwrWRWwrWRwrWRWwrWRRwrWRwrWRWwrWRwrWRwrWRwrWrWWrWRRwrWR
wrWWRwrWRwrWRwrWrWrWRWwrWRWrWrWWrWWrWWWWWWWWWWWWWWWrWrWWWrWrWWed Sep  4 10:30:19 2013 us=505037 Inactivity timeout (--ping-restart), restarting
Wed Sep  4 10:30:19 2013 us=505153 TCP/UDP: Closing socket

On server B there is NO "1194" string in the log, but the port shows up when I run tcpdump on the traffic between the servers (1.2.3.4 = client, 4.3.2.1 = server):

root@gw2:/etc/openvpn# tcpdump -ni eth0 host 4.3.2.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:34:43.534596 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:43.535359 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100
10:34:44.468608 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:44.481441 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100
10:34:45.476109 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:45.476510 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 60
10:34:45.477085 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100
HERE -->10:34:45.496917 IP 1.2.3.4.1194 > 4.3.2.1.1859: UDP, length 60
10:34:45.537356 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 540
10:34:46.540260 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:46.540955 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100
10:34:47.526090 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:47.526793 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100

It seems that the client sometimes tries to reconnect FROM UDP port 1194 (instead of the correct 1859), while the other connection, which is already established on 1859, is left open.
So server A keeps sending packets to the 1859 connection (and can ping), but the client switches its traffic to 1194, which is not initialized and does not work (and the connection attempt from the 1194 socket generates the "decryption error" on server A). As I said, there is no other configuration or instance of openvpn on the client (server B) than the one I posted above.
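
To spot the stray source port without reading the capture by eye, the tcpdump text can be tallied per source port. This is only a sketch in Python: the names `source_ports` and `unexpected_ports` are invented for illustration, the regular expression assumes the `tcpdump -n` line format shown above, and the sample is a short excerpt of that capture.

```python
import re
from collections import Counter

# Matches lines like "IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100".
PACKET = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)


def source_ports(capture, src_ip):
    """Count packets sent by src_ip, grouped by UDP source port."""
    ports = Counter()
    for match in PACKET.finditer(capture):
        if match.group(1) == src_ip:
            ports[int(match.group(2))] += 1
    return ports


def unexpected_ports(capture, src_ip, expected_port):
    """Source ports used by src_ip other than the configured one."""
    return sorted(p for p in source_ports(capture, src_ip) if p != expected_port)


CAPTURE = """\
10:34:45.476109 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 100
10:34:45.476510 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 60
10:34:45.477085 IP 1.2.3.4.1859 > 4.3.2.1.1859: UDP, length 100
10:34:45.496917 IP 1.2.3.4.1194 > 4.3.2.1.1859: UDP, length 60
10:34:45.537356 IP 4.3.2.1.1859 > 1.2.3.4.1859: UDP, length 540
"""

if __name__ == "__main__":
    print(dict(source_ports(CAPTURE, "1.2.3.4")))
    print(unexpected_ports(CAPTURE, "1.2.3.4", 1859))
```

Run against a longer capture, the same tally would show how often the client falls back to 1194 and whether that lines up with the one-minute cycle.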

Could somebody tell me what might be wrong in my configuration? I'm at my wits' end.

Thank you.

J+

PS: Sorry for my bad English.

Best Answer

My setup isn't exactly the same as described, but the symptoms are the same. The problem in my case was trying to use the same certificate on two separate computers at the same time. It toggled the connection back and forth between the two computers. Once I created a separate certificate, both computers stayed on the VPN solidly.