Linux – Why is SSH/SFTP failing for commands with larger returns

linux, networking, sftp, ssh

We have an SFTP server that was working fine until we added another ISP. Connections to the SFTP server do not go through the new ISP; I confirmed this with tracert. No change was made on the server either. Since then, however, some users' SFTP or SSH connections hang or time out whenever the executed command returns a larger amount of output. Here's the scenario:

  1. I can keep pinging the server, and the ping always returns, even
    while SSH/SFTP times out.
  2. I can connect to the server; it asks for authentication and lets me
    log in.
  3. If the ls command for my root directory returns a small number of
    files or folders, the listing is displayed.
  4. If the ls command for my root directory returns more than about 5
    or 6 files or folders, it hangs or times out.
  5. While trying this, I ran a ping to the server, and it kept
    returning the whole time.
  6. This doesn't happen to everyone, but it seems to happen to users
    who are in another city.
  7. I tried different SFTP clients (FileZilla and WinSCP). Both have
    the same issue.

I ran Wireshark on my PC (which is outside our network and outside the city). When SFTP/SSH times out, I see "Retransmission" and "Previous segment not captured" notes, which leads me to believe there may be packet loss somewhere between the hops.

Expert Info (Note/Sequence): Retransmission (suspected)
Previous segment not captured (common at capture start)

Is SFTP/SSH that sensitive to packet loss? Wouldn't SSH/SFTP retransmit/reacknowledge to recover from this packet loss? Is there something in the server settings I can tweak to make this work?

Best Answer

I believe the commenter hit the nail on the head: this is a classic example of an MTU issue, where path MTU discovery is failing to detect the smaller MTU required on the particular path that is failing. That fits the symptoms: pings and short directory listings travel in small packets and get through, while a longer listing fills full-size segments that are silently dropped, and retransmitting does not help because the retransmitted segment is just as large. You should check that the intermediary devices on the failing path allow path MTU discovery packets (ICMP "fragmentation needed" messages) through, and that there aren't any intermediary routers with unnecessarily small MTUs. You can probably narrow it down by sending large ICMP packets, up to the size of the MTU, to each hop along the way to discover where it is failing (although this doesn't always work).
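
The probing idea above could be scripted roughly as follows. This is only a sketch, not a definitive tool: it assumes IPv4 and a `ping` that accepts either the Linux iputils flags (`-M do -s`) or the Windows flags (`-f -l`), it treats the exit status of `ping` as a good-enough success signal, and it assumes the target hop answers ICMP echo at all. The function names (`ping_df`, `largest_passing_payload`, `estimate_path_mtu`) are made up for illustration.

```python
import platform
import subprocess

# An IPv4 echo request adds a 20-byte IP header and an 8-byte ICMP header
# on top of the payload size passed to ping.
IPV4_ICMP_OVERHEAD = 28


def ping_df(host, payload, timeout_s=2):
    """Send one echo request of `payload` bytes with Don't Fragment set.

    Returns True if ping exits with status 0. Assumption: Windows ping or
    Linux iputils ping; other platforms (e.g. macOS) use different flags.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-f", "-l", str(payload), "-n", "1",
               "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-M", "do", "-s", str(payload), "-c", "1",
               "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_passing_payload(probe, low=0, high=1472):
    """Binary-search the largest payload in [low, high] that `probe` accepts.

    Assumes every size up to some limit passes and every larger size fails.
    Returns None if even `low` fails (host unreachable or ICMP filtered).
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def estimate_path_mtu(host, probe=ping_df):
    """Estimate the IPv4 path MTU to `host`, or None if nothing gets through."""
    payload = largest_passing_payload(lambda size: probe(host, size))
    return None if payload is None else payload + IPV4_ICMP_OVERHEAD
```

Calling `estimate_path_mtu("sftp.example.com")` (a placeholder hostname) from an affected client, and then against each hop address reported by tracert, should show where the usable size drops. A result below the usual Ethernet value of 1500 on the failing path, together with hanging SSH sessions, would be consistent with broken path MTU discovery. As the answer notes, this does not always work, since some hops rate-limit or ignore echo requests.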