Packet loss over the Internet for “Linux-Linux” but not for “Windows-Linux” (tl;dr: it’s MTU)


I am currently getting extra grey hair fighting a packet loss problem between machines on the Internet.

Check the diagram below. Wherever I write "SSH" you could read "HTTPS"; the same phenomenon occurs with that protocol.

An SSH server running Fedora 22 is at "Site A" (wine red). I never had any connection problems until "recently".

SSH connections to "Site A" from Amazon EC2 machines running Fedora 22 or Fedora 23 work perfectly well (hosts shown in green inside the "Amazon EC2" box).

SSH connections to "Site A" from "Site B", which is in the same AS, do not work from any Fedora system I tested (orange boxes). However, they do work from a Windows 7 system using PuTTY, and the same (dual-boot) hardware is involved in both cases. "Site B" also has a firewall, but it does not seem to play any role: I tried setting up the connection directly from the FritzBox router, and it still failed for Fedora but worked for Windows.

How the problem manifests itself:

When you connect using SSH, there is an initial packet exchange (as shown by tcpdump). However, after 20 packets or so, the outgoing packets no longer seem to go anywhere; no acknowledgements come back from Site A. You never get to the password prompt. CTRL-C properly resets the connection, after which Linux still keeps retransmitting the unacknowledged packets for a while.

The setup

I suspect a problem at my ISP. In particular, I suspect the ISP performs some magic to implement the "fixed IP address" at Site B, which is the only thing that changed "recently".

However, I can't understand why an SSH connection would work from Windows but not from Linux under the same network conditions. What should I be looking for?

Best Answer

Your packet trace shows:

22:29:22.180852 IP (tos 0x0, ttl 64, id 52989, offset 0, flags [DF], proto TCP (6), length 1900)   
SITE_B_LAN_ADDR.54358 > SITE_A.SSH_PORT: Flags [P.], cksum 0x05c4 (incorrect -> 0xadce), seq 22:1870, ack 22, win 229, options [nop,nop,TS val 4294917498 ecr 71539420], length 1848

Note that this is a 1900-byte packet with the Don't Fragment (DF) flag set. Typical MTUs tend to be between 1400 and 1500 bytes.
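The numbers in that trace line add up as follows: the IP total length of 1900 minus the TCP payload of 1848 leaves 52 bytes of headers, i.e. a 20-byte IPv4 header plus a 32-byte TCP header (20 bytes base plus 12 bytes for the `nop,nop,TS` timestamp option). A minimal Python sketch of that arithmetic, assuming a standard 1500-byte Ethernet MTU on the path (an assumption for illustration, not something the trace shows):

```python
IPV4_HEADER = 20           # bytes, IPv4 header without options
TCP_BASE_HEADER = 20       # bytes, TCP header without options
TCP_TIMESTAMP_OPTION = 12  # nop, nop, TS val/ecr (1 + 1 + 10 bytes)

def header_overhead(ip_total_length: int, tcp_payload: int) -> int:
    """IP+TCP header bytes, from tcpdump's IP 'length' and TCP 'length' fields."""
    return ip_total_length - tcp_payload

def max_payload_for_mtu(mtu: int, with_timestamps: bool = True) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    tcp_header = TCP_BASE_HEADER + (TCP_TIMESTAMP_OPTION if with_timestamps else 0)
    return mtu - IPV4_HEADER - tcp_header

def fits(ip_total_length: int, mtu: int) -> bool:
    """With DF set, a packet larger than the link MTU cannot be forwarded."""
    return ip_total_length <= mtu

# Values taken from the trace line above.
print(header_overhead(1900, 1848))  # 52 = 20 (IPv4) + 20 (TCP) + 12 (timestamps)
print(max_payload_for_mtu(1500))    # 1448
print(fits(1900, 1500))             # False: a router has to drop it
```

With DF set, a router facing a 1500-byte link cannot fragment the 1900-byte packet, so it drops it and is expected to report the problem back to the sender via ICMP.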

You're probably getting ICMP "packet too big" messages back, but you're dropping all inbound ICMP traffic at the Site A firewall.
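For IPv4, the message in question is ICMP type 3 (Destination Unreachable), code 4 ("Fragmentation Needed and DF set"), and RFC 1191 puts the next-hop MTU in bytes 6-7 of the ICMP header. (IPv6 uses a separate ICMPv6 "Packet Too Big" message, type 2.) The sketch below only shows how the MTU value can be read from the raw ICMPv4 header; it does not validate the checksum, and the 1492-byte sample value is a hypothetical PPPoE link, not something observed in this thread:

```python
import struct
from typing import Optional

ICMP_DEST_UNREACH = 3  # Destination Unreachable
ICMP_FRAG_NEEDED = 4   # code: Fragmentation Needed and DF set

def next_hop_mtu(icmp: bytes) -> Optional[int]:
    """Return the next-hop MTU from an ICMPv4 'fragmentation needed' message,
    or None if it is some other message. `icmp` starts at the ICMP header."""
    if len(icmp) < 8:
        return None
    # type (1 byte), code (1), checksum (2), unused (2), next-hop MTU (2)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_DEST_UNREACH or code != ICMP_FRAG_NEEDED:
        return None
    return mtu  # pre-RFC 1191 routers may leave this as 0

# Hypothetical message from a router with a 1492-byte link (checksum left as 0).
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 1492)
print(next_hop_mtu(sample))  # 1492
```

If a firewall discards this message, the sender never learns the smaller MTU, keeps retransmitting the oversized segment, and the connection stalls exactly as described in the question.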

To test for this, run a packet trace on your firewall capturing both ICMP and TCP port 22.

Make sure you permit inbound ICMP "packet too big" messages at Site A.
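Whatever firewall syntax is in use (iptables, nftables, a router UI), the rule boils down to a decision on the ICMP type and code. The Python sketch below only models that decision; `inbound_icmp_verdict` and its "drop everything else" policy are hypothetical, chosen to mirror the setup described in this answer rather than recommended as a general ICMP policy:

```python
def is_pmtud_icmp(family: str, icmp_type: int, code: int) -> bool:
    """True for the ICMP messages that path MTU discovery relies on."""
    if family == "ipv4":
        # Destination Unreachable (3), Fragmentation Needed and DF set (4)
        return icmp_type == 3 and code == 4
    if family == "ipv6":
        # ICMPv6 Packet Too Big (2); RFC 4443 defines only code 0
        return icmp_type == 2
    return False

def inbound_icmp_verdict(family: str, icmp_type: int, code: int) -> str:
    """Hypothetical policy: drop ICMP in general, but never the PMTUD signals."""
    return "accept" if is_pmtud_icmp(family, icmp_type, code) else "drop"

print(inbound_icmp_verdict("ipv4", 3, 4))  # accept
print(inbound_icmp_verdict("ipv4", 8, 0))  # drop (echo request)
print(inbound_icmp_verdict("ipv6", 2, 0))  # accept
```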

Alternatively, you could try setting the MTU on your Linux boxes at Site A to a value below your network's MTU. I'm hazarding a guess that on Fedora you have jumbo frames enabled but on Windows you do not.
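A usable value can be found by sending probes with DF set and bisecting on the size, which is what `ping -M do -s SIZE` on Linux lets you do by hand (there SIZE is the ICMP payload, so the IP packet is SIZE + 28 bytes). The sketch below shows only the bisection; `probe` is a hypothetical callback standing in for "send a DF packet of this total size and see whether a reply comes back", and the 576/9000 bounds are assumptions:

```python
from typing import Callable

def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 9000) -> int:
    """Largest packet size in [low, high] for which probe(size) succeeds.

    Assumes the probe is monotonic: if a size gets through, every smaller
    size does too.
    """
    if not probe(low):
        raise ValueError("even the lower bound does not get through")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid       # mid fits: the answer is mid or larger
        else:
            high = mid - 1  # mid is dropped: the answer is below mid
    return low

# Simulated path: anything above 1492 bytes silently disappears, as it would
# if the ICMP errors were filtered (hypothetical PPPoE link).
def simulated_probe(size: int) -> bool:
    return size <= 1492

print(find_path_mtu(simulated_probe))  # 1492
```

Once a working size is known, the interface MTU can be lowered with something like `ip link set dev <iface> mtu <value>` on Linux.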
