Ssh – How to prevent SSH freezes over an openvpn client to client connection

mtu openvpn scp ssh

I have exactly the same issue as the one described in the question linked below, but as a new user I cannot comment on it to ask the author for clarification, so I am posting a new question. (I first posted this as an answer on that thread for reference, and it was deleted because it did not provide an answer.)

How do I prevent TCP connection freezes over an OpenVPN network?

Question: Does anyone have any recommendations for how to troubleshoot and/or determine the root cause of the TCP issue described on that thread? It's as if the remote end isn't accepting the ACK messages sent by the VPN client.

My setup is exactly the same as in the original question: a CentOS server (topology subnet) and two clients, one CentOS and one Ubuntu 14.03. When I run an 'ssh cat abc.txt' from the ubuntu-client to the centos-client, the VPN connection of the centos-client stalls. The only way to get it back up is to restart both the openvpn server (on a CentOS box) and the openvpn client on the centos-client; restarting only the centos-client connection does not make it operational (tun0 comes back up after ~1-2 minutes, but I can no longer ping or ssh the box via the VPN). I also tried all the MTU adjustment suggestions found in other threads (tun-mtu 1300 / fragment 1100 / mssfix etc.) and none of them helps.

What makes this even weirder is that if I do the same ssh-cat from the ubuntu-client to the public IP address of the centos-client, using the CentOS server VPN for internet access (thus bypassing the centos-client<->centos-server VPN leg), everything works fine (no stalls, ever).

UPDATE 1: I found a workaround for this, but it is a very ugly one. I am posting it here in case it gives anyone other ideas or hints. When I set the verbosity level to 9 on the openvpn server (server only, not on the client), the issue never occurs again. Verb 9 causes the openvpn server to log lots of data and use up 100% of the CPU it is running on. This limits the transfer speed and lets the scp complete successfully with no stalls; scp now copies at 40-50Kb/sec, while before it was stalling after going above 100Kb/sec.
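If the stall really depends on transfer rate, capping scp itself may be a less intrusive way to check that than verb 9. The following is only a sketch: it assumes the rates quoted above are kilobytes per second (scp's `-l` option takes Kbit/s), and the function names and the target address in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: throttle scp instead of slowing the server down with verb 9.
# Assumption: the rates quoted in the question are KB/s; scp -l expects Kbit/s.
SCP=${SCP:-scp}

# Convert a rate in KB/s to the Kbit/s unit used by scp -l.
kbytes_to_kbits() {
    echo "$(( $1 * 8 ))"
}

# Copy one file with scp capped at the given rate.
# $1 = rate in KB/s, $2 = local file, $3 = remote target
throttled_copy() {
    "$SCP" -l "$(kbytes_to_kbits "$1")" "$2" "$3"
}

# Usage (hypothetical VPN address of the centos-client):
#   throttled_copy 50 abc.txt user@10.8.0.6:/tmp/
```

If a capped copy always completes while an uncapped one stalls, that would support the idea that the problem is tied to throughput rather than to the logging itself.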

UPDATE 2: I believe this is a buffering problem. The size of the file transferred (via scp or ssh cat) matters a lot. If I scp a file of 700KB or smaller, it always succeeds, no matter how many times I try. If I try an 800KB file instead, it always fails/stalls after 7xxKb+.
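To narrow down the size at which the stall starts, the test could be repeated with files of known sizes. This is a minimal sketch, not a definitive procedure: the helper names, the 60 second timeout, and the target address in the usage comment are all assumptions, and `timeout` is taken from GNU coreutils.

```shell
#!/bin/sh
# Sketch: create random files of given sizes and copy them with a time limit.

# Create one file of random data per size (sizes in KB).
# $1 = directory, remaining arguments = sizes in KB
make_test_files() {
    dir=$1
    shift
    mkdir -p "$dir"
    for kb in "$@"; do
        head -c "$(( kb * 1024 ))" /dev/urandom > "$dir/test_${kb}k.bin"
    done
}

# Copy one file; a non-zero status means an error or a stall
# (timeout exits with 124 when the 60 second limit is hit).
# $1 = local file, $2 = remote target
try_copy() {
    timeout 60 scp -q "$1" "$2"
}

# Usage (hypothetical VPN address of the centos-client):
#   make_test_files /tmp/vpn-stall-test 600 700 750 800
#   for f in /tmp/vpn-stall-test/*.bin; do
#       try_copy "$f" user@10.8.0.6:/tmp/ || echo "failed or stalled: $f"
#   done
```

Random data is used so that compression along the path cannot shrink the transfer and hide the threshold.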

Best Answer

I've seen similar issues and been able to work around them by disabling TCP window scaling:

sysctl -w net.ipv4.tcp_window_scaling=0

Maybe this will point you in the right direction of where the problem may be.
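Since the setting is system-wide, it may be safer to disable it only for the duration of a test and then restore the previous value. A rough sketch, assuming it is run as root; the function name and the `SYSCTL` override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: run one command with TCP window scaling disabled, then restore it.
SYSCTL=${SYSCTL:-sysctl}
KEY=net.ipv4.tcp_window_scaling

# Usage: run_without_window_scaling <command> [args...]
run_without_window_scaling() {
    old=$("$SYSCTL" -n "$KEY") || return 1        # remember the current value
    "$SYSCTL" -w "$KEY=0" >/dev/null || return 1  # disable window scaling
    "$@"                                          # run the test command
    status=$?
    "$SYSCTL" -w "$KEY=$old" >/dev/null           # restore the previous value
    return "$status"
}

# Usage (hypothetical target address):
#   run_without_window_scaling scp abc.txt user@10.8.0.6:/tmp/
```

Window scaling is negotiated when a TCP connection is opened, so the change only affects connections started after it is applied; an ssh session that is already open keeps its original setting.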