If ComputerA's network interface has an MTU of 1500 and ComputerB's network interface has an MTU of 1200, is there any way for ComputerA to learn that ComputerB's MTU is 1200? Or, if it sends large packets, will it get a request from ComputerB to fragment them?
How do both sides know the MTU
Related Solutions
Responding to individual concerns in the post...
Regarding Path MTU Discovery
Ideally I would be relying on Path MTU Discovery. But since the ethernet packets being generated are too large for any other machine to receive, there is no opportunity for IP "Packet Too Big" fragmentation messages to be returned
Based on your diagram, I agree that PMTUD cannot function between two different PCs in the same LAN segment. A PC that receives an oversized frame simply drops it; PCs do not generate the ICMP error messages required by PMTUD.
Jumbo frames
Some vendors (such as Cisco) have switch models which support ethernet payloads larger than 1500 bytes. Officially IEEE does not endorse this configuration, but the industry has valid needs to judiciously deviate from the original 1500 byte MTU. I have storage LAN / backup networks which leverage jumbo frames for good reason; however, I made sure that all MTUs matched inside the same vlan when I deployed jumbo frames.
Mismatched MTUs within a broadcast domain
The bottom line is that you should never have mismatched ethernet MTUs inside the same ethernet broadcast domain; if you do, it's a bug or configuration error. Regardless of bug or error, you have to solve these problems, sometimes manually.
All that discussion leads to the next question...
Why is there a spec that intentionally creates invalid ethernet frames?
I'm not sure that I agree... I don't see how the IEEE 802.3 series, or RFC 894 create invalid frames. Host implementations or host misconfigurations create invalid frames. To understand whether your implementation is following the spec, we need a lot more evidence...
This diagram is at least prima facie evidence that your MTUs are mismatched inside a broadcast domain...
+------------------+        +----------------+        +------------------+
| Realtek PCIe GBe |        | NetGear 10/100 |        | Realtek 10/100   |
| (on-board)       |        | Switch         |        | (on-board)       |
|                  |        +----------------+        |                  |
| Windows 7        |           ^          ^           |                  |
|                  |           |          |           |                  |
| 192.168.1.98/24  |-----------+          +-----------| 192.168.1.10/24  |
| MTU = 1504 bytes |                                  | MTU = 1500 bytes |
+------------------+                                  +------------------+
How should an 802.3-compliant implementation respond to MTU mismatches?
What was it they [the writers of 'the spec'] expected people to do with devices that generate these too large packets?
MTU 1504 and MTU 1500 within the same broadcast domain is simply a misconfiguration; it should never be expected to work any more than mismatched IP netmasks or mismatched IP subnets can be expected to work. Your company will have to knuckle down and fix the root cause of the MTU mismatches... at this time it's hard to say whether the root cause is user error, an implementation bug, or some combination of the two.
If the affected Windows machines are successfully logging in to an Active Directory Domain, one could write Windows login scripts that automatically fix MTU issues based on some well-constructed tests inside those scripts (assuming the Domain Controller isn't part of the MTU issues).
If the machines are not logging into a domain, manual labor is another option.
Other possibilities to contain the damage
Use a layer 3 switch (see Note 1) to build a custom vlan for anything that has broken MTUs and set the layer 3 switch's ethernet MTU to match the broken machines; this relies on PMTUD to resolve MTU issues at the IP layer. Layer 3 switches generate the ICMP errors required by PMTUD.
This option works best if you can re-address the broken machines with DHCP and can identify the broken machines by MAC address.
... why did they bump it up to 1504 bytes, and create invalid packets, in the first place?
Hard to say at this point
802.1ad vs 802.1q
How is IEEE 802.1ad (aka VLAN Tagging, QinQ) valid, when the packets are too large?
I haven't seen evidence so far that you're using QinQ; from the limited evidence I have seen so far, you're using simple 802.1q encapsulation, which should work correctly in Windows, assuming the NIC driver supports 802.1q encap.
End Notes:
Note 1: Any layer 3 switch should do... Cisco, Juniper, and Brocade switches could all perform this kind of function.
The answer is simple: whenever the host pleases. Really. It's that simple.
The explanation below assumes an IPv4-only environment, since IPv6 does away with fragmentation in the routers (forcing the host to always deal with fragmentation and MTU discovery).
There is no strict rule that governs when (or even if) a host does Path MTU Discovery. The reason that PMTUD surfaced is that fragmentation is considered harmful for various reasons. To avoid packet fragmentation, the concept of PMTUD was brought to life as a workaround. Of course, a nice operating system should use PMTUD to minimize fragmentation.
So, naturally, the exact semantics of when PMTUD is used depend on the sender's operating system - in particular, the socket implementation. I can only speak for the specific case of Linux, but other UNIX variants are probably not very different.
In Linux, PMTUD is controlled by the IP_MTU_DISCOVER socket option. You can retrieve its current status with getsockopt(2) by specifying the level IPPROTO_IP and the option IP_MTU_DISCOVER. When the option is enabled on a SOCK_STREAM socket (a two-way, connection-oriented, reliable socket; in practice it's a TCP socket, although other protocols are possible), Linux will perform PMTUD exactly as defined in RFC 1191.
Note that in practice, PMTUD is a continuous process; packets are sent with the DF bit set - including the 3-way handshake packets - you can think of it as a connection property (although an implementation may be willing to accept a certain degree of fragmentation at some point and stop sending packets with the DF bit set). Thus, PMTUD is just a consequence of the fact that everything on that connection is being sent with DF.
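As a minimal sketch of the getsockopt(2) query described above, the following Python snippet reads the setting from a fresh TCP socket. It assumes a Linux host; the numeric fallbacks are the values from <linux/in.h>, used only because some Python builds do not export these names, and the helper name pmtud_mode is made up for illustration.

```python
import socket

# Linux values from <linux/in.h>. Fallback numbers are an assumption that
# only holds on Linux; some Python builds do not export these constants.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
PMTUDISC_NAMES = {
    0: "IP_PMTUDISC_DONT",       # never set DF
    1: "IP_PMTUDISC_WANT",       # use per-route hints (PMTUD on)
    2: "IP_PMTUDISC_DO",         # always set DF
    3: "IP_PMTUDISC_PROBE",      # set DF but ignore the cached path MTU
    4: "IP_PMTUDISC_INTERFACE",
    5: "IP_PMTUDISC_OMIT",
}


def pmtud_mode(sock):
    """Return the socket's current IP_MTU_DISCOVER value as an int."""
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)


if __name__ == "__main__":
    # No connection is needed just to inspect the option.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
        mode = pmtud_mode(tcp)
        print(mode, PMTUDISC_NAMES.get(mode, "unknown"))
```

The value printed depends on the system-wide default, so no particular output should be expected.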
What if you don't set IP_MTU_DISCOVER?
There's a default value. By default, IP_MTU_DISCOVER is enabled on SOCK_STREAM sockets. The default can be read or changed through /proc/sys/net/ipv4/ip_no_pmtu_disc. A zero value means that IP_MTU_DISCOVER is enabled by default in new sockets; a non-zero value means the opposite.
What about connectionless sockets?
This is tricky because connectionless, unreliable sockets do not retransmit lost segments. It becomes the user's responsibility to packetize the data in MTU-sized chunks. The user is also expected to make the necessary retransmits when a send fails with a "Message too long" error (EMSGSIZE). So, essentially, user code must reimplement PMTUD. Nevertheless, if you're up for the challenge, you can force the DF bit by setting the IP_MTU_DISCOVER option to IP_PMTUDISC_DO with setsockopt(2).
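To make the connectionless case concrete, here is a rough sketch of what "reimplementing PMTUD in user code" might look like on Linux. It assumes IPv4 without IP options and a connected UDP socket (IP_MTU can only be read on connected sockets). The helper names and the retry policy are illustrative, not a recommended design, and the numeric fallbacks are the values from <linux/in.h>.

```python
import errno
import socket

# Linux values from <linux/in.h>; fallbacks are an assumption valid on Linux only.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def make_df_udp_socket():
    """Create a UDP socket whose datagrams always carry the DF bit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock


def split_for_mtu(data, path_mtu):
    """Cut data into chunks that fit in one unfragmented datagram each."""
    max_payload = path_mtu - IPV4_HEADER - UDP_HEADER
    if max_payload <= 0:
        raise ValueError("path MTU too small for any UDP payload")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


def send_with_pmtud(sock, data, initial_mtu=1500, max_attempts=5):
    """Send data over a connected DF socket, shrinking chunks on EMSGSIZE.

    Simplification: after a failure everything is resent from the start,
    so a real protocol would need sequence numbers to discard duplicates.
    """
    mtu = initial_mtu
    for _ in range(max_attempts):
        try:
            for chunk in split_for_mtu(data, mtu):
                sock.send(chunk)
            return mtu
        except OSError as exc:
            if exc.errno != errno.EMSGSIZE:
                raise
            # The kernel rejected the datagram; ask it for the known path MTU.
            mtu = sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
    raise RuntimeError("path MTU did not converge")
```

A caller would create the socket with make_df_udp_socket(), connect() it to the peer, and then call send_with_pmtud(); the receiver still has to cope with lost and duplicated datagrams on its own.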
The bottom line
- The host decides when (and if) to use PMTUD
- When it uses PMTUD, it behaves like a connection attribute: it happens continuously (but at any point the implementation is free to stop doing so)
- Different operating systems use different approaches, but usually, reliable, connection-oriented sockets perform PMTUD by default, whereas unreliable, connectionless sockets don't
Best Answer
The hosts can use Path MTU Discovery, which is optional for IPv4 and effectively required for IPv6 (a host that skips it must limit itself to the 1280-byte IPv6 minimum MTU).
With IPv4, a packet is fragmented wherever it is larger than the egress interface's MTU, provided the DF bit is not set. That can happen on the source host or on any intermediate router. In your example, the router between A and B would fragment the packet.
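To illustrate with the numbers from the question, here is a small sketch of the arithmetic a router would do when a 1500-byte IPv4 packet has to leave through an interface with an MTU of 1200. It assumes a 20-byte header without IP options and that DF is not set; the function name is made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options


def fragment_ipv4(total_length, mtu, header=IPV4_HEADER):
    """Return (offset_in_8_byte_units, total_length, more_fragments) tuples."""
    if total_length <= mtu:
        return [(0, total_length, False)]  # fits as-is, no fragmentation
    payload = total_length - header
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_payload = (mtu - header) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    sent = 0
    while sent < payload:
        size = min(max_payload, payload - sent)
        more = sent + size < payload
        fragments.append((sent // 8, header + size, more))
        sent += size
    return fragments


for frag in fragment_ipv4(1500, 1200):
    print(frag)
# (0, 1196, True)
# (147, 324, False)
```

Note that the first fragment is 1196 bytes rather than 1200, because its payload is rounded down to a multiple of 8.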
Fragmentation can put considerable load on routers and was removed for IPv6 routers, so a v6 host needs to discover the path MTU for each destination or fragment oversized packets by itself. If an IPv6 router encounters an oversized packet, it drops the packet and returns an ICMPv6 "Packet Too Big" error.
The MTU is a property of each interface. It is either the (hardware) default, configured statically, or set by DHCP (option 26). All nodes in a shared L2 segment need to use the same maximum frame size, resulting in the same MTU. If a node sends larger frames than the others can accept, those frames are dropped as oversized.