Let's do some path MTU discovery between two Debian hosts separated by a Debian router that runs Shorewall-generated iptables rules. Each of the two hosts uses a single Ethernet link while the router uses tagged VLANs over two aggregated Ethernet links.
Using scamper:
root@kitandara:/home/jm# scamper -I "trace -M 10.64.0.2"
traceroute from 10.1.0.5 to 10.64.0.2
1 10.1.0.1 0.180 ms [mtu: 6128]
2 10.64.0.2 0.243 ms [mtu: 6128]
Good: 6128 bytes is the expected result (cheap Realtek Ethernet adapters can't handle jumbo frames of a decent size).
Now let iperf run a throughput test and report the MTU along the way:
root@kitandara:/home/jm# iperf -c 10.64.0.2 -N -m
------------------------------------------------------------
Client connecting to 10.64.0.2, TCP port 5001
TCP window size: 66.2 KByte (default)
------------------------------------------------------------
[ 3] local 10.1.0.5 port 59828 connected with 10.64.0.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1011 MBytes 848 Mbits/sec
[ 3] MSS size 6076 bytes (MTU 6116 bytes, unknown interface)
6116 bytes? Why?
And now for something completely different, let's see what this session's traffic actually contained:
root@kitandara:/home/jm# tshark -i eth0 -R "(ip.dst == 10.64.0.2) || (ip.src == 10.64.0.2)" | head
Capturing on eth0
1.308557 10.1.0.5 -> 10.64.0.2 TCP 74 60310 > 5001 [SYN] Seq=0 Win=5340 Len=0 MSS=534 SACK_PERM=1 TSval=101928961 TSecr=0 WS=16
1.308801 10.64.0.2 -> 10.1.0.5 TCP 74 5001 > 60310 [SYN, ACK] Seq=0 Ack=1 Win=18328 Len=0 MSS=6088 SACK_PERM=1 TSval=3764064056 TSecr=101928961 WS=64
A 6088-byte MSS, which means a 6128-byte MTU… Good. But then why does iperf announce a 6116-byte MTU?
At that point, thoroughness calls for a closer look at what happens during the scamper trace session:
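The conversion between MSS and MTU used here is plain header arithmetic. A minimal sketch, assuming IPv4 and TCP headers of 20 bytes each with no options (the function names are illustrative and not taken from iperf or tshark):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mtu_from_mss(mss):
    """MTU implied by an MSS when both headers are at their minimum size."""
    return mss + IPV4_HEADER + TCP_HEADER


def mss_from_mtu(mtu):
    """MSS that fits in an MTU when both headers are at their minimum size."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(mtu_from_mss(6088))  # MSS seen in the SYN, ACK of the capture
print(mtu_from_mss(6076))  # MSS reported by iperf
```

The first value matches the MTU scamper found and the second matches the MTU iperf printed, so the 12-byte difference is already present in the MSS values themselves.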
root@kitandara:/home/jm# tshark -i eth0 -R "(ip.dst == 10.64.0.2) || (ip.src == 10.64.0.2)"
Capturing on eth0
0.000000 10.1.0.5 -> 10.64.0.2 UDP 58 Source port: 43870 Destination port: 33435
0.000175 10.1.0.1 -> 10.1.0.5 ICMP 86 Time-to-live exceeded (Time to live exceeded in transit)
0.050358 10.1.0.5 -> 10.64.0.2 UDP 58 Source port: 43870 Destination port: 33436
0.050592 10.64.0.2 -> 10.1.0.5 ICMP 86 Destination unreachable (Port unreachable)
0.099790 10.1.0.5 -> 10.64.0.2 UDP 6142 Source port: 43870 Destination port: 33437
0.100912 10.64.0.2 -> 10.1.0.5 ICMP 590 Destination unreachable (Port unreachable)
All those packets have a udp.length of 24 except the last two, which have a udp.length of 6108… But then how does scamper tell us that the path MTU is 6128?
6108, 6116, 6128… So many MTUs to choose from!
Best Answer
Very interesting.
MSS (maximum segment size) = MTU - IP header (20 bytes) - TCP header (20 bytes) = 6076.
6076 + 40 = 6116.
Could it be that Debian is using the options field of the IP header? That might account for the extra 12 bytes...
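For what it's worth, all three figures can be reconciled with header arithmetic alone. The sketch below assumes minimum-size IPv4 and TCP headers and a 14-byte Ethernet header included in the frame lengths tshark prints. It also assumes that the 12 missing bytes are the TCP timestamps option (10 bytes, commonly padded to 12), which the TSval/TSecr fields in the captured SYN show was in use. That attribution fits the numbers but is not confirmed anywhere in this thread, and the constant names are purely illustrative.

```python
ETHERNET_HEADER = 14       # bytes, assumed to be counted in tshark's frame length
IPV4_HEADER = 20           # bytes, assuming no IP options
TCP_HEADER = 20            # bytes, without options
TCP_TIMESTAMP_OPTION = 12  # assumption: 10-byte option padded to 12 bytes

path_mtu = 6128            # reported by scamper

# scamper's largest probe: 6142 bytes on the wire in the capture
frame_len = 6142
ip_packet = frame_len - ETHERNET_HEADER   # size of the IP packet
udp_length = ip_packet - IPV4_HEADER      # UDP header + payload (udp.length)

# TCP side: MSS advertised in the SYN, ACK versus payload left per segment
advertised_mss = path_mtu - IPV4_HEADER - TCP_HEADER
payload_per_segment = advertised_mss - TCP_TIMESTAMP_OPTION
iperf_style_mtu = payload_per_segment + IPV4_HEADER + TCP_HEADER

print(ip_packet, udp_length)                                 # 6128 6108
print(advertised_mss, payload_per_segment, iperf_style_mtu)  # 6088 6076 6116
```

Under these assumptions 6128 is the size of the IP packet, 6108 is the UDP length inside it, and 6116 is what results from adding 40 bytes to an MSS that has already been reduced by 12 bytes of TCP options.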