Linux – Workarounds for probable TIME_WAIT issue preventing reestablishing broken SSH tunnels

I did not intend to cross-post, but after sending this question to the OpenSSH list on SecurityFocus I noticed that the list is rather low-traffic (the previous post was approximately 5 months earlier). I am therefore reposting here, where the issue will probably get more eyeballs and stands a better chance of being useful to others if answered:

The issue: I have a reverse SSH tunnel from an internal machine to a host in my DMZ, which is set to launch at system boot and relaunch if the tunnel fails. However, when the tunnel is interrupted (for example, by a network outage), it cannot be reestablished because the port on the DMZ host is still in use. From my reading of the OpenSSH mailing list archives and elsewhere, this appears to be because the port is in a TIME_WAIT state. This is fine: I can put a sleep statement in the script that sets up the tunnel. However, this leads to two questions:

1) Any idea how to determine what the TIME_WAIT interval is defined as on a particular Linux (or other) system? While I could just sleep for 5 minutes and be fine, it would be good to shave off as much time as possible.

2) While OpenSSH doesn't appear to support the "ClearAllForwardings" option, is there similar functionality whereby an authenticated connection can automatically tear down and recreate a forwarding it had previously established?

A long sleep would probably be "good enough" but I'd prefer to handle the TIME_WAIT condition in a more efficient manner if possible.

I appreciate any guidance or suggestions!

Best Answer

You could experiment with SSH settings such as TCPKeepAlive, ServerAliveInterval, and ServerAliveCountMax so that when the connection goes down, ssh notices and shuts everything down. I have a similar setup, and I have tuned both sshd and ssh on both sides to get the behavior I want. I also have a cron job that runs every 5 minutes and restarts the tunnel if it is not running:

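#!/bin/bash
# Restart the tunnel unless an ssh process with these flags is already running.
if pgrep -f "ssh -fnNTx" > /dev/null
then
    echo "Already Running"
else
    echo "Starting now"
    # -f background, -n no stdin, -N no remote command, -T no tty, -x no X11
    ssh -fnNTx -L 1514:127.0.0.1:514 user1@X.X.X.X
fi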
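As a variation on the script above, the keepalive settings mentioned at the start of this answer could be passed with -o on the same command line. The following is a minimal sketch rather than the configuration actually used in the setup described here: the helper name tunnel_cmd and the values 30 and 3 are assumptions chosen for illustration. It only prints the command so that it can be inspected first. ExitOnForwardFailure=yes makes ssh exit if the forwarding cannot be set up (for example, because the port is still in use), so the cron job gets another chance on its next run. On the sshd side, ClientAliveInterval and ClientAliveCountMax in sshd_config play the corresponding role.

```shell
#!/bin/sh
# Sketch: print the ssh invocation from the cron script with keepalive
# options added. tunnel_cmd is a hypothetical helper; 30 and 3 are
# illustrative values (roughly 90 seconds until a dead link is dropped).
tunnel_cmd() {
    # $1 = forwarding spec (port:host:hostport), $2 = user@host
    printf 'ssh -fnNTx'
    # printf reuses the format once per argument, yielding one -o per option
    printf ' -o %s' \
        ServerAliveInterval=30 \
        ServerAliveCountMax=3 \
        ExitOnForwardFailure=yes
    printf ' -L %s %s\n' "$1" "$2"
}

tunnel_cmd 1514:127.0.0.1:514 user1@X.X.X.X
```

With these options, ssh should give up on an unresponsive server after three unanswered probes instead of lingering, which is what allows the process check in the cron job to notice that the tunnel is gone.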

So far the cron-based approach has worked fine for me. You can also set up a Nagios check or another script that tests whether the tunnel is open and, if it is not, kills the ssh process so that the cron job can restart it.
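Such a check could look roughly like the sketch below. It assumes bash and the coreutils timeout command are installed; the function names port_open and tunnel_status and the 3-second timeout are hypothetical choices, and port 1514 is simply the local end of the example tunnel. Note that a successful connection only shows that something is listening locally, not that the far end of the tunnel is healthy.

```shell
#!/bin/sh
# Sketch: succeed if something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection; the 3-second limit is an assumption.
port_open() {
    # $1 = host, $2 = port
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print "up" or "down" and return a matching exit status.
tunnel_status() {
    if port_open "$1" "$2"; then
        echo "up"
    else
        echo "down"
        return 1
    fi
}

# Example: tunnel_status 127.0.0.1 1514
```

A monitoring script might then run something like `tunnel_status 127.0.0.1 1514 || pkill -f "ssh -fnNTx"`, leaving the actual restart to the cron job.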

Edit:

A previous question discusses TIME_WAIT issues at length: "How to forcibly close a socket in TIME_WAIT?"
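On the question of how long TIME_WAIT lasts: on Linux the interval is a compile-time kernel constant (TCP_TIMEWAIT_LEN, 60 seconds) rather than a sysctl, and net.ipv4.tcp_fin_timeout, which is often mistaken for it, governs FIN_WAIT_2 instead. To see whether TIME_WAIT is really what is holding the port, the socket table can be inspected directly. The sketch below is one way to do that, assuming a Linux /proc filesystem; count_time_wait is a hypothetical helper that reads /proc/net/tcp-formatted text on stdin, where state 06 means TIME_WAIT and ports are written in hexadecimal.

```shell
#!/bin/sh
# Sketch: count sockets in TIME_WAIT (state 06) on a given local port,
# reading /proc/net/tcp-formatted text from stdin.
count_time_wait() {
    # $1 = local port in decimal; the table stores it as 4 uppercase hex digits
    port_hex=$(printf '%04X' "$1")
    awk -v p="$port_hex" '
        NR > 1 {
            split($2, addr, ":")   # local_address is HEXIP:HEXPORT
            if (addr[2] == p && $4 == "06") n++
        }
        END { print n + 0 }'
}

# Usage on a live system (IPv4 only; /proc/net/tcp6 has the same layout):
# count_time_wait 1514 < /proc/net/tcp
```

If the count is zero while the port is still reported as in use, a listener left over from the old session (state 0A, LISTEN) may be the cause instead, in which case waiting out TIME_WAIT would not help.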
