Exactly when is PMTUD performed? (Path MTU discovery)

ipv4 mtu

In discussions that have spun off from other questions on this site, I've realised that I don't have a solid understanding of when Path MTU Discovery (PMTUD) is performed.

I know what it does: it discovers the lowest MTU on a path from client to server.
I know how it does it: send progressively larger packets with their "Don't Fragment" bit set, and see how big a packet you can get through without getting an ICMP "Fragmentation Needed" error.

My question is specifically then, when will a host perform PMTUD?

I'm looking for specific cases. Not just something generic like "when a host wants to discover the path MTU". Bonus points if you can provide a packet capture of a host doing it, or provide instructions for generating such a packet capture.

Also, I am specifically referring to IPv4. I know that in IPv6 intermediate routers aren't responsible for fragmentation, and I can imagine that PMTUD happens much more commonly there. But for now, I'm looking for specific examples of PMTUD in IPv4. (Although if the only packet capture you can put together of PMTUD is in IPv6, I would still love to see it.)

Best Answer

The answer is simple: whenever the host pleases. Really. It's that simple.

The explanation below assumes an IPv4-only environment, since IPv6 does away with fragmentation in the routers (forcing the host to always deal with fragmentation and MTU discovery).

There is no strict rule that governs when (or even if) a host does Path MTU Discovery. The reason that PMTUD surfaced is that fragmentation is considered harmful for various reasons. To avoid packet fragmentation, the concept of PMTUD was brought to life as a workaround. Of course, a nice operating system should use PMTUD to minimize fragmentation.

So, naturally, the exact semantics of when PMTUD is used depend on the sender's operating system - in particular, the socket implementation. I can only speak for the specific case of Linux, but other UNIX variants are probably not very different.

In Linux, PMTUD is controlled by the IP_MTU_DISCOVER socket option. You can retrieve its current setting with getsockopt(2) by specifying the level IPPROTO_IP and the option IP_MTU_DISCOVER, and change it with setsockopt(2). When the option is enabled on a SOCK_STREAM socket (a two-way, connection-oriented, reliable socket; in practice a TCP socket, although other protocols are possible), Linux performs PMTUD as defined in RFC 1191. On other socket types the option only controls whether the DF bit is set, as discussed under connectionless sockets below.
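As a minimal sketch of the getsockopt(2) call described above, the following Python snippet reads the option from a freshly created TCP socket. It assumes a Linux host; the numeric fallbacks are the values from <linux/in.h>, used only because not every Python build exposes these names in its socket module. The helper name pmtud_mode is made up for this example.

```python
import socket

# Linux values from <linux/in.h>; fallback in case the socket module
# of this Python build does not export the constant.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)

# Human-readable names for the values Linux can return.
PMTUDISC_NAMES = {
    0: "IP_PMTUDISC_DONT",   # never set DF
    1: "IP_PMTUDISC_WANT",   # use per-route hints (usual default for TCP)
    2: "IP_PMTUDISC_DO",     # always set DF
    3: "IP_PMTUDISC_PROBE",  # set DF but ignore the cached path MTU
}


def pmtud_mode(sock):
    """Return the IP_MTU_DISCOVER setting of an IPv4 socket as an int."""
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        mode = pmtud_mode(tcp)
        print(mode, PMTUDISC_NAMES.get(mode, "unknown"))
```

The value you see depends on the system-wide default, so no particular output is promised here.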

Note that in practice, PMTUD is a continuous process. Packets are sent with the DF bit set, including the 3-way handshake packets, so you can think of it as a connection property (although an implementation may be willing to accept a certain degree of fragmentation at some point and stop sending packets with the DF bit set). Thus, PMTUD is just a consequence of the fact that everything on that connection is being sent with DF: whenever a packet is too large for some hop, the router there drops it and returns an ICMP "Fragmentation Needed" error, and the sender lowers its path MTU estimate.
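If you want to verify this in a capture, the DF bit is bit 0x4000 of the 16-bit flags/fragment-offset field at byte offset 6 of the IPv4 header. The sketch below checks that bit on raw header bytes; the function names and the sample header (documentation addresses, checksum left at zero) are invented purely for illustration.

```python
import struct

DF_FLAG = 0x4000  # "Don't Fragment" bit in the flags/fragment-offset field


def has_df(ipv4_header: bytes) -> bool:
    """Return True if the DF bit is set in a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Bytes 6-7 hold 3 flag bits followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_and_offset & DF_FLAG)


def sample_header(df: bool) -> bytes:
    """Build a hypothetical 20-byte IPv4 header for demonstration only."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                      # version 4, header length 5 words
        0,                         # DSCP/ECN
        40,                        # total length
        0x1234,                    # identification
        DF_FLAG if df else 0,      # flags + fragment offset
        64,                        # TTL
        6,                         # protocol: TCP
        0,                         # checksum (not computed here)
        bytes([192, 0, 2, 1]),     # source address (documentation range)
        bytes([198, 51, 100, 1]),  # destination address (documentation range)
    )


if __name__ == "__main__":
    print(has_df(sample_header(df=True)))
    print(has_df(sample_header(df=False)))
```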

What if you don't set IP_MTU_DISCOVER?

There's a default value. By default, IP_MTU_DISCOVER is enabled on SOCK_STREAM sockets. The system-wide default can be read or changed through /proc/sys/net/ipv4/ip_no_pmtu_disc. A value of zero means that IP_MTU_DISCOVER is enabled by default on new sockets; a non-zero value means the opposite.
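A small sketch of reading that default from Python, assuming a Linux host where the /proc file exists. The function names are made up for this example, and the interpretation follows the rule stated above (zero means enabled).

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/net/ipv4/ip_no_pmtu_disc"


def parse_no_pmtu_disc(text: str) -> bool:
    """Return True if the sysctl content means PMTUD is on by default."""
    # The file contains a single integer; zero means "PMTUD not disabled".
    return int(text.strip()) == 0


def pmtud_default_enabled(path: str = SYSCTL_PATH) -> bool:
    """Read the sysctl file and report the default for new sockets."""
    return parse_no_pmtu_disc(Path(path).read_text())


if __name__ == "__main__":
    if Path(SYSCTL_PATH).exists():
        print("PMTUD enabled by default:", pmtud_default_enabled())
    else:
        print("sysctl file not found; probably not a Linux host")
```

Changing the value means writing to the same file, which normally requires root.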

What about connectionless sockets?

This is tricky because connectionless, unreliable sockets do not retransmit lost segments. It becomes the user's responsibility to packetize the data in MTU-sized chunks. Also, the user is expected to make the necessary retransmits in case of a "Message too long" (EMSGSIZE) error. So, essentially, user code must reimplement PMTUD. Nevertheless, if you're up for the challenge, you can force the DF bit by setting the IP_MTU_DISCOVER option to IP_PMTUDISC_DO with setsockopt(2).
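To illustrate both halves of that responsibility, here is a hedged sketch that forces DF on a UDP socket and splits a buffer into datagrams that fit a given path MTU. It assumes Linux, an IPv4 header without options (20 bytes) and the 8-byte UDP header. The function names are hypothetical, and the numeric fallbacks are the values from <linux/in.h>.

```python
import socket

# Linux values from <linux/in.h>, used if the socket module lacks them.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def udp_socket_with_df():
    """Create a UDP socket whose outgoing datagrams always carry DF."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


def chunk_for_mtu(data: bytes, path_mtu: int):
    """Split data into payloads that fit one unfragmented UDP datagram."""
    payload = path_mtu - IPV4_HEADER - UDP_HEADER
    if payload <= 0:
        raise ValueError("path MTU too small to carry any UDP payload")
    return [data[i:i + payload] for i in range(0, len(data), payload)]


if __name__ == "__main__":
    # With a 1500-byte path MTU each datagram can carry 1472 bytes.
    print([len(c) for c in chunk_for_mtu(b"x" * 3000, 1500)])
```

In a real sender you would catch OSError with errno EMSGSIZE around the send call, lower your path MTU estimate, re-chunk and resend. On a connected socket, Linux also reports its current estimate through the IP_MTU socket option.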

The bottom line

The host decides when (and if) to use PMTUD.
When it uses PMTUD, it behaves like a connection attribute: it happens continuously (but at any point the implementation is free to stop doing so).
Different operating systems use different approaches, but usually reliable, connection-oriented sockets perform PMTUD by default, whereas unreliable, connectionless sockets don't.
