Linux MTU and UDP

Can someone please explain this behavior to me? I have a few VMs (CentOS) running on a cloud provider. The interface is set to the standard MTU of 1500.

Pinging with large ICMP packets works fine:

# ping -s 1600 10.132.6.3
PING 10.132.6.3 (10.132.6.3) 1600(1628) bytes of data.
1608 bytes from 10.132.6.3: icmp_seq=1 ttl=64 time=1.16 ms
1608 bytes from 10.132.6.3: icmp_seq=2 ttl=64 time=1.09 ms
1608 bytes from 10.132.6.3: icmp_seq=3 ttl=64 time=1.04 ms
^C
--- 10.132.6.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2207ms
rtt min/avg/max/mdev = 1.044/1.101/1.168/0.063 ms

So it seems something is fragmenting the ICMP traffic.

But large UDP traffic does not make it:

# nping --udp -p 111 -data-length 1600 10.132.6.3
WARNING: Payload exceeds maximum recommended payload (1400)

Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2015-08-10 18:06 EDT
sendto in send_ip_packet_sd: sendto(3, packet, 1628, 0, 10.132.40.29, 16) => Message too long
Offending packet: UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628
SENT (0.0082s) UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628
sendto in send_ip_packet_sd: sendto(3, packet, 1628, 0, 10.132.40.29, 16) => Message too long
Offending packet: UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628
SENT (1.0086s) UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628
sendto in send_ip_packet_sd: sendto(3, packet, 1628, 0, 10.132.40.29, 16) => Message too long
Offending packet: UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628
SENT (2.0097s) UDP 10.132.6.3:53 > 10.132.40.29:111 ttl=64 id=17499 iplen=1628

Max rtt: N/A | Min rtt: N/A | Avg rtt: N/A
Raw packets sent: 3 (4.884KB) | Rcvd: 0 (0B) | Lost: 3 (100.00%)
Tx time: 2.34513s | Tx bytes/s: 2082.61 | Tx pkts/s: 1.28
Rx time: 2.34513s | Rx bytes/s: 0.00 | Rx pkts/s: 0.00
Nping done: 1 IP address pinged in 2.35 seconds

Any thoughts as to why the UDP traffic is not being fragmented?

Best Answer

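The sendto error is printed by nping, but it originates locally, not somewhere on the network: the OS rejects the sendto() call with "Message too long" (EMSGSIZE) before anything goes out on the wire. nping is trying to send a single 1628-byte IP packet (1600 bytes of UDP payload plus headers), which is larger than the 1500-byte interface MTU, so the OS refuses to send it.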
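
The arithmetic behind that local failure can be sketched as follows. This is a minimal illustration, assuming a 20-byte IPv4 header without options and the 1500-byte MTU from the question; the function names are made up for the example.

```python
IP_HEADER = 20    # IPv4 header without options (assumption)
UDP_HEADER = 8    # fixed UDP header size
MTU = 1500        # interface MTU stated in the question


def ip_length(udp_payload: int) -> int:
    """Total IPv4 packet length for a UDP datagram with this payload size."""
    return IP_HEADER + UDP_HEADER + udp_payload


def fits_in_one_packet(udp_payload: int, mtu: int = MTU) -> bool:
    """True if the datagram can leave the interface without fragmentation."""
    return ip_length(udp_payload) <= mtu


print(ip_length(1600))            # 1628, the iplen shown in the nping output
print(fits_in_one_packet(1600))   # False
print(MTU - IP_HEADER - UDP_HEADER)  # 1472, the largest payload that fits unfragmented
```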

OTOH, if you use the --mtu option to nping, it will fragment the packets. It's apparently not counting the IP header in its MTU, because the largest I can set the MTU is 1480.

nping --udp -p 111 -data-length 1600 --mtu 1480 some-host
WARNING: Payload exceeds maximum recommended payload (1400)

Starting Nping 0.5.51 ( http://nmap.org/nping ) at 2015-08-11 10:29 EDT
SENT (0.0056s) UDP 192.168.1.40:53 > 192.168.1.14:111 ttl=64 id=58221 iplen=1628
RCVD (0.0068s) ICMP 192.168.1.14 > 192.168.1.40 Destination host 192.168.1.14 administratively prohibited (type=3/code=10) ttl=64 id=33478 iplen=576
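
To see what fragmenting at 1480 amounts to, here is a rough sketch of the split. It assumes, as inferred above, that the --mtu value bounds the data carried by each fragment (excluding the 20-byte IP header), and that the IP payload is the 8-byte UDP header plus 1600 bytes of data. The `fragment` helper is hypothetical and is not part of nping.

```python
def fragment(ip_payload_len: int, frag_data: int):
    """Split an IP payload into fragments of at most frag_data bytes each.

    Returns a list of (offset_in_8_byte_units, data_len, more_fragments).
    """
    # Fragment offsets are expressed in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 bytes.
    if frag_data <= 0 or frag_data % 8:
        raise ValueError("fragment data size must be a positive multiple of 8")
    frags = []
    offset = 0
    while offset < ip_payload_len:
        data_len = min(frag_data, ip_payload_len - offset)
        more = offset + data_len < ip_payload_len
        frags.append((offset // 8, data_len, more))
        offset += data_len
    return frags


# 8-byte UDP header + 1600 bytes of data = 1608 bytes of IP payload
for offset_units, data_len, more in fragment(1608, 1480):
    print(f"offset={offset_units * 8} data={data_len} "
          f"iplen={data_len + 20} MF={int(more)}")
```

Under these assumptions the first fragment is exactly 1500 bytes on the wire (1480 + 20), which is why 1480 is the largest value that works on a 1500-byte MTU.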

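ping behaves differently because it does not build the IP packet itself. It hands only the ICMP message to the OS, which adds the IP header and fragments the result to fit the MTU. nping builds the complete 1628-byte IP packet on its own (note the length passed to sendto), and the OS will not fragment that for it.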

A good technique for investigating this sort of thing is to use tcpdump to sniff what is actually happening on the network.

tcpdump -s0 -w /tmp/tcpdump.out host 192.168.1.1

You can then download tcpdump.out and inspect its contents with Wireshark.

If you omit -s0, older versions of tcpdump capture only the first 68 bytes of each packet (newer versions default to a much larger snapshot length). For this case the headers alone would be plenty.
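
For a quick look at fragmentation without Wireshark, something along these lines could list the IP ID, fragment offset and more-fragments flag of each packet in the capture. It is only a sketch: it assumes the file is in the classic pcap format (not pcapng), was captured on an Ethernet interface, and contains untagged IPv4 frames. `ip_fragments` and `print_fragments` are names invented for the example.

```python
import struct


def ip_fragments(pcap_bytes: bytes):
    """Return (ip_id, offset_bytes, more_fragments, total_len) per IPv4 packet."""
    if len(pcap_bytes) < 24:
        raise ValueError("too short to be a pcap file")
    # The magic number tells us the byte order used by the capture file.
    magic = struct.unpack("<I", pcap_bytes[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != 1:  # 1 = Ethernet
        raise ValueError("only Ethernet captures are handled")

    results = []
    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length
        _sec, _usec, caplen, _origlen = struct.unpack(
            endian + "IIII", pcap_bytes[pos:pos + 16])
        frame = pcap_bytes[pos + 16:pos + 16 + caplen]
        pos += 16 + caplen
        # Need Ethernet (14) + IPv4 header (20) bytes, with EtherType 0x0800
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        # IPv4 header bytes 2..7: total length, identification, flags/offset
        total_len, ip_id, flags_off = struct.unpack("!HHH", frame[16:22])
        offset_bytes = (flags_off & 0x1FFF) * 8
        more_fragments = bool(flags_off & 0x2000)
        results.append((ip_id, offset_bytes, more_fragments, total_len))
    return results


def print_fragments(path: str) -> None:
    """Print one line per IPv4 packet found in the capture at path."""
    with open(path, "rb") as f:
        for ip_id, offset, more, total_len in ip_fragments(f.read()):
            print(f"id={ip_id} offset={offset} MF={int(more)} iplen={total_len}")
```

Calling print_fragments("/tmp/tcpdump.out") on the capture from above would show fragments of one datagram as lines sharing the same id, with MF=1 on all but the last. Since only the first 34 bytes of each frame are read, a truncated capture is enough for this.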