Routing – Why don’t iperf, scamper and path MTU discovery packet captures agree on the path’s MTU?

ipv4 routing

Let's do some path MTU discovery between two Debian hosts separated by a Debian router that runs Shorewall-generated iptables rules. Each of the two hosts uses a single Ethernet link while the router uses tagged VLANs over two aggregated Ethernet links.

Using scamper:

root@kitandara:/home/jm# scamper -I "trace -M 10.64.0.2"
traceroute from 10.1.0.5 to 10.64.0.2
 1  10.1.0.1  0.180 ms [mtu: 6128]
 2  10.64.0.2  0.243 ms [mtu: 6128]

Good: 6128 bytes is the expected result (cheap Realtek Ethernet adapters can't handle jumbo frames of a decent size).

Now let iperf run a throughput test and report the MTU along the way:

root@kitandara:/home/jm# iperf -c 10.64.0.2 -N -m
------------------------------------------------------------
Client connecting to 10.64.0.2, TCP port 5001
TCP window size: 66.2 KByte (default)
------------------------------------------------------------
[  3] local 10.1.0.5 port 59828 connected with 10.64.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1011 MBytes   848 Mbits/sec
[  3] MSS size 6076 bytes (MTU 6116 bytes, unknown interface)

6116 bytes? Why?

And now for something completely different, let's see what this session's traffic actually contained:

root@kitandara:/home/jm# tshark -i eth0 -R "(ip.dst == 10.64.0.2) || (ip.src == 10.64.0.2)" | head
Capturing on eth0
  1.308557     10.1.0.5 -> 10.64.0.2    TCP 74 60310 > 5001 [SYN] Seq=0 Win=5340 Len=0 MSS=534 SACK_PERM=1 TSval=101928961 TSecr=0 WS=16
  1.308801    10.64.0.2 -> 10.1.0.5     TCP 74 5001 > 60310 [SYN, ACK] Seq=0 Ack=1 Win=18328 Len=0 MSS=6088 SACK_PERM=1 TSval=3764064056 TSecr=101928961 WS=64

A 6088-byte MSS, which means a 6128-byte MTU (6088 plus 40 bytes of IP and TCP headers)… Good. But then why does iperf announce a 6116-byte MTU?
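The MSS-to-MTU conversion used here can be sketched as follows. This is only an illustration, assuming IPv4 and TCP base headers of 20 bytes each with no IP options; the function name `mtu_from_mss` is made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
TCP_HEADER = 20   # bytes, base TCP header; the advertised MSS does not count TCP options


def mtu_from_mss(mss: int) -> int:
    """Return the MTU implied by an MSS value advertised in a SYN segment."""
    return mss + IPV4_HEADER + TCP_HEADER


# MSS advertised by 10.64.0.2 in the SYN, ACK of the capture
print(mtu_from_mss(6088))  # prints 6128
```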

At this point, thoroughness calls for a closer look at what happens during the scamper trace session:

root@kitandara:/home/jm# tshark -i eth0 -R "(ip.dst == 10.64.0.2) || (ip.src == 10.64.0.2)"
Capturing on eth0
  0.000000     10.1.0.5 -> 10.64.0.2    UDP 58 Source port: 43870  Destination port: 33435
  0.000175     10.1.0.1 -> 10.1.0.5     ICMP 86 Time-to-live exceeded (Time to live exceeded in transit)
  0.050358     10.1.0.5 -> 10.64.0.2    UDP 58 Source port: 43870  Destination port: 33436
  0.050592    10.64.0.2 -> 10.1.0.5     ICMP 86 Destination unreachable (Port unreachable)
  0.099790     10.1.0.5 -> 10.64.0.2    UDP 6142 Source port: 43870  Destination port: 33437
  0.100912    10.64.0.2 -> 10.1.0.5     ICMP 590 Destination unreachable (Port unreachable)

All those packets have a udp.length of 24, except the last two, which have a udp.length of 6108… But then how does scamper tell us that the path MTU is 6128?
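For what it is worth, the frame sizes shown by tshark can be broken down layer by layer. The sketch below assumes that the captured length includes a 14-byte untagged Ethernet header and no FCS, and that the IPv4 header carries no options; the helper name `decompose_udp_frame` is hypothetical.

```python
ETHERNET_HEADER = 14  # bytes, assumption: untagged Ethernet II, FCS not captured
IPV4_HEADER = 20      # bytes, assumption: no IP options
UDP_HEADER = 8        # bytes


def decompose_udp_frame(frame_len: int) -> dict:
    """Split a captured UDP frame length into per-layer sizes."""
    ip_total = frame_len - ETHERNET_HEADER  # size of the IP packet, what the MTU limits
    udp_length = ip_total - IPV4_HEADER     # UDP header plus payload (udp.length)
    payload = udp_length - UDP_HEADER
    return {"ip_total": ip_total, "udp_length": udp_length, "payload": payload}


# The small probes and the large probe from the scamper capture
print(decompose_udp_frame(58))
print(decompose_udp_frame(6142))
```

Under those assumptions, the 6142-byte frame carries a 6128-byte IP packet whose udp.length is 6108, so 6108 and 6128 describe the same probe at two different layers.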

6108, 6116, 6128… So many MTUs to choose from!

Best Answer

Very interesting.

MSS (maximum segment size) = MTU - IP header (20 bytes) - TCP header (20 bytes). iperf reports an MSS of 6076.

6076 + 40 = 6116.

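Could it be that Debian is using option fields in the headers? That might account for the extra 12 bytes (6128 - 6116). The TSval/TSecr fields in the capture show that TCP timestamps are in use, and that option takes 12 bytes per segment once padded, so it looks like a more likely candidate than IP options.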
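One way to account for the 12-byte gap is sketched below. It rests on two assumptions that the output above is consistent with but does not prove: that the 12 bytes are the TCP timestamp option (10 bytes plus 2 bytes of padding, visible as TSval/TSecr in the capture), and that iperf reports the per-segment payload left after options and then adds 40 bytes to estimate the MTU. All names are made up for illustration.

```python
BASE_HEADERS = 40      # bytes, IPv4 (20) + TCP (20) without any options
TIMESTAMP_OPTION = 12  # bytes, TCP timestamp option (10) plus padding (2)


def payload_per_segment(negotiated_mss: int, option_bytes: int) -> int:
    """Data bytes left in each segment once TCP options are carried."""
    return negotiated_mss - option_bytes


negotiated_mss = 6088  # from the SYN, ACK in the capture
reported_mss = payload_per_segment(negotiated_mss, TIMESTAMP_OPTION)

# Assumption: iperf estimates the MTU as reported MSS + 40
iperf_mtu = reported_mss + BASE_HEADERS
path_mtu = negotiated_mss + BASE_HEADERS

print(reported_mss, iperf_mtu, path_mtu)  # prints 6076 6116 6128
```

If those assumptions hold, the path MTU really is 6128 and the 6116 figure is only iperf's estimate being short by the size of the options.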