Vlan – How is IEEE 802.1ad (aka VLAN Tagging, QinQ) valid, when the packets are too large

Tags: ethernet, mtu, qinq, vlan

Recently I've been dealing with MTU issues, and they all seem to stem from the fact that the Ethernet adapter on newer computers defaults to an MTU of 1504 bytes:

>netsh interface ipv4 show subinterfaces

   MTU  MediaSenseState   Bytes In  Bytes Out  Interface
------  ---------------  ---------  ---------  -------------
  1504                1  3954161316  804790885  Local Area Connection

Now, according to a random person on NetworkEngineering.stackexchange.com, any frame that is too large will be dropped by the receiving Network Interface Card (NIC):

…any frame with an MTU greater than the 802.3 spec of 1500

A frame larger than the set max will be dropped by the NIC; it's an error, and the OS will never know about it. (An oversized frame counter will click up, but that's all.)

This causes problems when the computer tries to send packets to the gateway machine. Ideally I would be relying on Path MTU Discovery. But since the Ethernet frames being generated are too large for any other machine to receive, there is no opportunity for an ICMP "packet too big" (fragmentation needed) message to be returned:

There will be no "fragmentation" at all. Layer 2 (Ethernet) has no means of indicating "fragmentation needed". This is figured out at Layer 3 (IP) by routers sending an ICMP message when they have to drop the packet because it won't fit on the next-hop interface.
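The Layer 3 behaviour described in that quote can be sketched as a toy model in Python. This is not any real router's code, and the class and function names are illustrative. It follows the IPv4 rules: a packet that fits the next hop is forwarded; one that doesn't fit is fragmented unless its Don't Fragment (DF) bit is set, in which case the router drops it and returns ICMP Destination Unreachable (type 3), code 4 "Fragmentation Needed and DF set", carrying the next-hop MTU as RFC 1191 describes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ICMP_DEST_UNREACHABLE = 3   # ICMP type: Destination Unreachable
CODE_FRAG_NEEDED = 4        # code 4: Fragmentation Needed and DF set

@dataclass
class Packet:
    length: int          # total IPv4 packet length in bytes
    dont_fragment: bool  # the DF bit in the IPv4 header

@dataclass
class Decision:
    action: str                                  # "forward", "fragment" or "drop"
    icmp: Optional[Tuple[int, int, int]] = None  # (type, code, next-hop MTU) sent back, if any

def route(packet: Packet, next_hop_mtu: int) -> Decision:
    """Toy model of an IPv4 router deciding what to do with one packet."""
    if packet.length <= next_hop_mtu:
        return Decision("forward")
    if not packet.dont_fragment:
        return Decision("fragment")
    # Too big and DF set: drop it and tell the sender which MTU would fit (RFC 1191).
    return Decision("drop", (ICMP_DEST_UNREACHABLE, CODE_FRAG_NEEDED, next_hop_mtu))

print(route(Packet(1504, dont_fragment=True), next_hop_mtu=1500))
# Decision(action='drop', icmp=(3, 4, 1500))
```

The key point of the model is that the ICMP error only exists because a router looked at the IP header; an Ethernet NIC that discards an oversized frame never gets that far.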

Which brings me to my second question first:

  • Why is there a spec that intentionally creates invalid Ethernet frames? What is the intended behaviour here? Given that other network interface cards cannot receive frames of this new default size, what were they expecting to happen?

This then brings me to my first question second. And this is something that has been asked before – a lot.

  • Is the 4-byte QinQ tag part of the Ethernet frame header, or part of the Ethernet payload? If it's part of the header, why did the payload body increase by 4 bytes? If it's part of the Ethernet payload, why did the payload MTU increase by 4 bytes (when we know that increasing it by 4 bytes makes it an invalid packet)?

The larger question is…

If we step back for a moment, we have the larger question:

What are we supposed to do?

There must have been people who designed this standard. What was it they expected people to do with devices that generate these too-large packets?

I'm really asking. I assume we weren't meant to go to every hardware device, undo the MTU increase to 1504, and revert it to 1500:

netsh interface ipv4>set subinterface "Local Area Connection" mtu=1500 store=persistent
Ok.

That would be (and is being) a configuration nightmare.

Is the idea perhaps to turn off VLAN tagging? Aside from the configuration nightmare, it simply doesn't work:

  • Step 1: disable VLAN tagging


  • Step 2: observe that it doesn't work:

    netsh interface ipv4 show subinterfaces

       MTU  MediaSenseState   Bytes In  Bytes Out  Interface
    ------  ---------------  ---------  ---------  -------------
      1504                1     238125     245855  Local Area Connection

If the solution to this is to manually force all network cards back to an MTU of 1500, then why did they bump it up to 1504 bytes, and create invalid packets, in the first place?

There is a piece of the puzzle I am missing.

Bonus Chatter

 Without 802.1Q tagging        With 802.1Q tagging
+------------------------+    +------------------------+
|Destination MAC: 6 bytes|    |Destination MAC: 6 bytes|
|Source MAC: 6 bytes     |    |Source MAC: 6 bytes     |
|Ethertype: 2 bytes      |    |802.1Q tag: 4 bytes     |
+------------------------+    |Ethertype: 2 bytes      |
|                        |    +------------------------+
|                        |    |                        |
/ Payload: 1500 bytes    /    / Payload: 1500 bytes    /
|                        |    |                        |
|                        |    |                        |
+------------------------+    |                        |
| Frame Check Sequence:  |    +------------------------+
|                 4 bytes|    | Frame Check Sequence:  |
+------------------------+    |                 4 bytes|
                              +------------------------+
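The arithmetic behind the two diagrams can be sketched in Python. What it illustrates is that a VLAN tag is added to the header, so the frame grows while the payload (the MTU the IP stack sees) stays at 1500 bytes. The constant and function names are illustrative. For reference, IEEE 802.3 caps an untagged frame at 1518 bytes and, since the 802.3ac amendment, allows 1522 bytes for a single-tagged frame; the QinQ figure is simply the same arithmetic with a second tag, and the preamble/SFD are left out throughout.

```python
# Field sizes in bytes, as drawn in the two frame diagrams above.
DEST_MAC = 6
SRC_MAC = 6
ETHERTYPE = 2
VLAN_TAG = 4   # one 802.1Q tag: 2-byte TPID + 2-byte TCI
FCS = 4

def max_frame_size(payload_mtu: int = 1500, vlan_tags: int = 0) -> int:
    """On-the-wire frame size (preamble/SFD excluded) for a full-size payload.

    vlan_tags: 0 = untagged, 1 = 802.1Q, 2 = QinQ (outer S-tag + inner C-tag).
    """
    if vlan_tags < 0:
        raise ValueError("vlan_tags must be >= 0")
    header = DEST_MAC + SRC_MAC + vlan_tags * VLAN_TAG + ETHERTYPE
    return header + payload_mtu + FCS

for tags, label in [(0, "untagged"), (1, "802.1Q"), (2, "QinQ")]:
    print(f"{label:<8} payload 1500 -> frame {max_frame_size(1500, tags)} bytes")
# untagged payload 1500 -> frame 1518 bytes
# 802.1Q   payload 1500 -> frame 1522 bytes
# QinQ     payload 1500 -> frame 1526 bytes
```

Note that `max_frame_size(1504)` is also 1522: an untagged frame with a 1504-byte payload is the same length as a tagged frame with a 1500-byte payload, but 802.3 only makes room for those extra 4 bytes when they carry a tag in the header.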

Network Diagram

+------------------+      +----------------+     +------------------+
| Realtek PCIe GBe |      | NetGear 10/100 |     | Realtek 10/100   |
|       (on-board) |      |     Switch     |     |     (on-board)   |
|                  |      +----------------+     |                  |
| Windows 7        |           ^    ^            |                  |
|                  |           |    |            |                  |
| 192.168.1.98/24  |-----------+    +------------| 192.168.1.10/24  |
| MTU = 1504 bytes |                             | MTU = 1500 bytes |
+------------------+                             +------------------+

You could also substitute any configuration you like, generating packets larger than the maximum allowed 1500 bytes:

+------------------+      +----------------+     +------------------+
| Realtek PCIe GBe |      | NetGear 10/100 |     | Realtek 10/100   |
|       (on-board) |      |     Switch     |     |     (on-board)   |
|                  |      +----------------+     |                  |
| Windows 7        |           ^    ^            | MTU = 1500 bytes |
| MTU = 16384bytes |           |    |            |                  |
|                  |-----------+    +------------|                  |
+------------------+                             +------------------+

I'm trying to find a site that might be able to address my technical, conceptual, logical, fundamental, theoretical problem of how Ethernet can work when some devices intentionally generate invalid packets.

The concern comes when I try to send an invalid Ethernet packet to another Ethernet device:

  • computer generates Ethernet packet

    Source MAC:      xx-xx-xx-xx-xx-xx
    Destination MAC: yy-yy-yy-yy-yy-yy
    Ethertype:       0x0800
    Payload:         ...1504 bytes...  (or could be ...16384 bytes, anything larger than 1500...)
    CRC:             4 bytes                
    

This packet is invalid because it is too large to be received by the target 802.3u device. Because the target's host operating system never sees the packet, and because Ethernet has no functionality to report invalid packets back to the sender, the "large" packet is lost.
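The receive side of that scenario can be modelled in a few lines of Python. This is a deliberately simplified, hypothetical model (the class and function names are made up, and real NICs and drivers differ; some may accept slightly oversized frames). It only shows why the sender gets no feedback: the size check happens before the OS is involved, and the only trace left behind is a counter.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 1500  # standard 802.3 payload limit

@dataclass
class NicCounters:
    delivered: int = 0
    oversize: int = 0   # stands in for the "oversized frame" counter quoted earlier

def receive(payload_len: int, counters: NicCounters, max_payload: int = MAX_PAYLOAD) -> bool:
    """Hypothetical receive path: return True if the frame is passed up to the OS."""
    if payload_len > max_payload:
        counters.oversize += 1   # dropped before the OS sees it; nothing is sent back
        return False
    counters.delivered += 1
    return True

c = NicCounters()
for size in (1500, 1504, 16384):
    receive(size, c)
print(c)  # NicCounters(delivered=1, oversize=2)
```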

Bonus Chatter

From Cisco's Inter-Switch Link and IEEE 802.1Q Frame Format:

Frame Size

The default maximum transmission unit (MTU) of an interface is 1500 bytes. With an outer VLAN tag attached to an Ethernet frame, the packet size increases by 4 bytes. Therefore, it is advisable that you appropriately increase the MTU of each interface on the provider network. The recommended minimum MTU is 1504 bytes.

Meanwhile:

The IEEE 802.3 Ethernet standard only mandates support for 1500-byte MTU frames.

Best Answer

Responding to individual concerns in the post...

Regarding Path MTU Discovery

Ideally I would be relying on Path MTU discovery. But since the Ethernet packets being generated are too large for any other machine to receive, there is no opportunity for IP Packet too big fragmentation messages to be returned

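Based on your diagram, I agree that PMTUD cannot function between two different PCs in the same LAN segment: PMTUD relies on a router returning an ICMP error when a packet won't fit the next hop, and PCs on the same segment do not generate the ICMP errors required by PMTUD (the receiving NIC simply drops the oversized frame before the IP stack sees it).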
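To make that concrete, here is a toy PMTUD loop in Python. It is a simplified, hypothetical model (real stacks cache the path MTU per destination and use timers), and `Hop` and `send_with_pmtud` are illustrative names. A router that can't forward the packet reports the MTU that would fit, so the sender shrinks the packet and retries; when the only thing between two PCs is a switch, the oversized frame is dropped at the receiving NIC and the sender learns nothing.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hop:
    mtu: int      # largest packet this hop can pass on (or accept, for an end host)
    router: bool  # True: sends ICMP "fragmentation needed"; False: drops silently

def send_with_pmtud(size: int, path: List[Hop], max_tries: int = 5) -> Optional[int]:
    """Toy PMTUD: every packet has DF set. Returns the size that finally got
    through, or None if the packet disappeared without any feedback."""
    for _ in range(max_tries):
        for hop in path:
            if size > hop.mtu:
                if hop.router:
                    size = hop.mtu   # ICMP told us the MTU; shrink and retransmit
                    break
                return None          # silent drop (e.g. receiving NIC): black hole
        else:
            return size              # no hop objected: delivered
    return None

# Via a router with a 1500-byte next hop: PMTUD converges on 1500.
print(send_with_pmtud(1504, [Hop(1500, router=True)]))    # 1500
# Two PCs on one switched segment: the receiving NIC drops it, nothing comes back.
print(send_with_pmtud(1504, [Hop(1500, router=False)]))   # None
```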

Jumbo frames

Some vendors (such as Cisco) have switch models that support Ethernet payloads larger than 1500 bytes. Officially, IEEE does not endorse this configuration, but the industry has valid reasons to judiciously deviate from the original 1500-byte MTU. I have storage LAN / backup networks that leverage jumbo frames for good reason; however, I made sure that all MTUs matched inside the same VLAN when I deployed jumbo frames.

Mismatched MTUs within a broadcast domain

The bottom line is that you should never have mismatched Ethernet MTUs inside the same Ethernet broadcast domain; if you do, it's a bug or a configuration error. Whichever it is, you have to solve these problems, sometimes manually.
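That rule is easy to audit once you have an inventory of configured MTUs per VLAN. A minimal Python sketch, assuming you have already collected each host's MTU by some means (the inventory dictionary and the `mtu_mismatches` name are hypothetical; the addresses are borrowed from the diagram):

```python
from collections import defaultdict
from typing import Dict, List

def mtu_mismatches(host_mtus: Dict[str, int]) -> Dict[int, List[str]]:
    """Group the hosts of one broadcast domain by MTU.

    Returns {} when every host agrees; otherwise returns every MTU group,
    since more than one group means the domain is misconfigured.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for host, mtu in host_mtus.items():
        groups[mtu].append(host)
    return dict(groups) if len(groups) > 1 else {}

vlan1 = {"192.168.1.98": 1504, "192.168.1.10": 1500}
print(mtu_mismatches(vlan1))  # {1504: ['192.168.1.98'], 1500: ['192.168.1.10']}
```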

All that discussion leads to the next question...

Why is there a spec that intentionally creates invalid ethernet frames?

I'm not sure that I agree... I don't see how the IEEE 802.3 series or RFC 894 creates invalid frames. Host implementations or host misconfigurations create invalid frames. To determine whether your implementation is following the spec, we need a lot more evidence...

This diagram is at least prima facie evidence that your MTUs are mismatched inside a broadcast domain...

+------------------+      +----------------+     +------------------+
| Realtek PCIe GBe |      | NetGear 10/100 |     | Realtek 10/100   |
|       (on-board) |      |     Switch     |     |     (on-board)   |
|                  |      +----------------+     |                  |
| Windows 7        |           ^    ^            |                  |
|                  |           |    |            |                  |
| 192.168.1.98/24  |-----------+    +------------| 192.168.1.10/24  |
| MTU = 1504 bytes |                             | MTU = 1500 bytes |
+------------------+                             +------------------+

How should an 802.3-compliant implementation respond to MTU mismatches?

What was it they [the writers of 'the spec'] expected people to do with devices that generate these too large packets?

An MTU of 1504 and an MTU of 1500 within the same broadcast domain is simply a misconfiguration; it should never be expected to work, any more than mismatched IP netmasks or mismatched IP subnets can be expected to work. Your company will have to knuckle down and fix the root cause of the MTU mismatches... at this time it's hard to say whether the root cause is user error, an implementation bug, or some combination of the two.

If the affected Windows machines are successfully logging in to an Active Directory domain, one could write Windows login scripts that run some well-constructed tests and automatically fix MTU issues (assuming the domain controller isn't part of the MTU issues).
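The test half of such a login script could look roughly like this. It is shown in Python only to keep this page's examples in one language; a real login script would more likely be a batch file or PowerShell. The function names and the 1500-byte target are assumptions, and the parser only relies on the column layout of the `netsh interface ipv4 show subinterfaces` output quoted in the question. It skips interfaces whose name starts with "Loopback", since the loopback pseudo-interface typically reports a very large MTU.

```python
import re
from typing import List, Tuple

EXPECTED_MTU = 1500

# A data row looks like: "  1504   1  3954161316  804790885  Local Area Connection"
ROW = re.compile(r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+(\S.*?)\s*$")

def wrong_mtus(netsh_output: str, expected: int = EXPECTED_MTU) -> List[Tuple[str, int]]:
    """Parse `netsh interface ipv4 show subinterfaces` text and return
    (interface, mtu) pairs whose MTU differs from `expected`."""
    bad = []
    for line in netsh_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue                      # header, separator or blank line
        mtu, name = int(m.group(1)), m.group(2)
        if name.startswith("Loopback"):
            continue                      # loopback pseudo-interface: ignore its MTU
        if mtu != expected:
            bad.append((name, mtu))
    return bad

def fix_commands(bad: List[Tuple[str, int]], expected: int = EXPECTED_MTU) -> List[str]:
    """Build the netsh command from the question for each offending interface."""
    return [f'netsh interface ipv4 set subinterface "{name}" mtu={expected} store=persistent'
            for name, _ in bad]
```

The functions take the command's text rather than running it, so the check can be tried against saved output first; how the script captures the output and whether it applies the generated commands automatically is left to the deployment.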

If the machines are not logging into a domain, manual labor is another option.

Other possibilities to contain the damage

Use a layer 3 switch (Note 1) to build a custom VLAN for anything that has broken MTUs, and set the layer 3 switch's Ethernet MTU on that VLAN to match the broken machines. This relies on PMTUD to resolve MTU issues at the IP layer: layer 3 switches generate the ICMP errors required by PMTUD.

This option works best if you can re-address the broken machines with DHCP and can identify them by MAC address.

... why did they bump it up to 1504 bytes, and create invalid packets, in the first place?

Hard to say at this point.

802.1ad vs 802.1Q

How is IEEE 802.1ad (aka VLAN Tagging, QinQ) valid, when the packets are too large?

I haven't seen evidence so far that you're using QinQ; from the limited evidence I have seen, you're using simple 802.1Q encapsulation, which should work correctly in Windows, assuming the NIC driver supports 802.1Q encapsulation.


End Notes:

Note 1: Any layer 3 switch should do... Cisco, Juniper, and Brocade switches could all perform this kind of function.