IP – Why the maximum lengths of IP, TCP and UDP packets don't match what I expected

Tags: ethernet, ip, mtu, tcp, udp

From many tutorials, I learned the following (maybe I've misunderstood):

The maximum size of an Ethernet packet is about 1500 bytes.
The maximum size of an IP packet is 65535 bytes.
The maximum size of a UDP packet is 65515 bytes.

But when I ran a test and watched it in Wireshark, I got different numbers.

I tried sending a large amount of data over TCP:


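    import java.io.OutputStream;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;

    public class TcpSend {
      public static void main(String[] args) throws Exception {
        // Expects a server listening on localhost:8088
        Socket con = new Socket("localhost", 8088);
        OutputStream os = con.getOutputStream();

        StringBuilder s = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
          s.append("Hello world");
        }
        // 10000 * 11 bytes = 110,000 bytes
        byte[] data = s.toString().getBytes(StandardCharsets.US_ASCII);

        // A single write(); the kernel decides how to split it into segments
        os.write(data);
        os.close();
        con.close();
      }
    }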

This is my Java code (you don't need to understand it in detail): it sends 110k bytes of data over a TCP connection, and I captured the traffic in Wireshark.

My 110k-byte message was split into 7 packets, and I think this shows that the maximum length of a TCP packet is 16388 bytes.

Then I tried sending a UDP packet:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class UdpMaxSize {
      public static void main(String[] args) throws Exception {
        DatagramSocket client = new DatagramSocket(50555);

        StringBuilder s = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
          s.append("Hello world");
        }
        // 110,000 bytes
        byte[] data = s.toString().getBytes(StandardCharsets.US_ASCII);
        InetSocketAddress target = new InetSocketAddress("localhost", 8088);

        for (int messageLength = data.length; messageLength > 0; messageLength--) {
          try {
            // Creating the packet always succeeds (messageLength <= data.length);
            // it is send() that throws an IOException if the datagram is too long.
            DatagramPacket packet = new DatagramPacket(data, messageLength, target);
            client.send(packet);

            // No exception: this messageLength is the limit for a UDP payload.
            System.out.println("message length is " + messageLength);
            break;
          } catch (Exception e) {
            // Too long: try again with one byte less
          }
        }
        client.close();
      }
    }

The output is "message length is 65507".

I was really confused:

IP is built on top of Ethernet (or something similar), so why can an IP packet be 65535 bytes when Ethernet can only carry 1500 bytes?
Why is a TCP packet only 16388 bytes?

I have read many posts on Stack Overflow and other websites but haven't found an answer, so I don't think this is a duplicate.

Best Answer

IP is built on top of Ethernet (or something similar), so why can an IP packet be 65535 bytes when Ethernet can only carry 1500 bytes?

Ethernet is one of several physical layers that can be used to transport IP (and other protocols besides IP). The maximum packet size is specific to each physical layer; other physical layers have other limits. Note that in your specific case of communicating over localhost (127.0.0.1), no physical layer (and thus no Ethernet) is involved at all, since localhost is a logical, not a physical, network interface.

The 65535-byte limit in IP exists because the Total Length field in the IPv4 header is only 16 bits wide (2^16 - 1 = 65535), and that length includes the IP header itself.
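To connect the 16-bit limit to the number the UDP test found: subtracting a minimal 20-byte IPv4 header (assuming no IP options) from 65535 gives the 65515-byte maximum UDP datagram, and subtracting the 8-byte UDP header from that leaves 65507 bytes of payload. A small sketch of that arithmetic:

```java
public class UdpPayloadLimit {
  // IPv4 Total Length is a 16-bit field that counts header + data.
  static final int IPV4_MAX_TOTAL_LENGTH = (1 << 16) - 1; // 65535
  static final int IPV4_MIN_HEADER = 20; // assumed: IPv4 header without options
  static final int UDP_HEADER = 8;

  /** Largest UDP datagram (UDP header + payload) that fits in one IPv4 packet. */
  static int maxUdpDatagram(int ipHeaderLen) {
    return IPV4_MAX_TOTAL_LENGTH - ipHeaderLen;
  }

  /** Largest UDP payload that fits in one IPv4 packet. */
  static int maxUdpPayload(int ipHeaderLen) {
    return maxUdpDatagram(ipHeaderLen) - UDP_HEADER;
  }

  public static void main(String[] args) {
    System.out.println("max IPv4 packet:  " + IPV4_MAX_TOTAL_LENGTH);
    System.out.println("max UDP datagram: " + maxUdpDatagram(IPV4_MIN_HEADER));
    System.out.println("max UDP payload:  " + maxUdpPayload(IPV4_MIN_HEADER));
  }
}
```

With IP options present the header grows (up to 60 bytes), and the payload limit shrinks accordingly.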

Why is a TCP packet only 16388 bytes?

The application sends data to the OS kernel, which then packetizes it. How much data can be handed to the kernel at once depends on the size of the socket buffer, which results in the packet size you see. Additionally, the kernel may split the data further to fit the maximum packet size (MTU) of the underlying interface, such as Ethernet. With TCP, the kernel may also combine small writes so that they are transmitted together.
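The "split to fit the interface" part can be checked against the numbers in the question with a little arithmetic. The sketch below rests on assumptions that the question does not state: that the capture was taken on a loopback interface with a 16384-byte MTU (the usual `lo0` MTU on macOS), that Wireshark shows a 4-byte loopback pseudo-header in front of each captured packet, and that the IP and TCP headers are 20 bytes each with no options. Under those assumptions a full segment carries 16344 bytes of data, each captured frame is 16388 bytes, and 110,000 bytes need 7 segments.

```java
public class TcpSegmentation {
  // ASSUMPTIONS (not from the question): headers without options, and a
  // 4-byte loopback pseudo-header in front of each captured packet.
  static final int IP_HEADER = 20;
  static final int TCP_HEADER = 20;
  static final int LOOPBACK_CAPTURE_HEADER = 4;

  /** Maximum segment size: TCP payload bytes that fit in one IP packet of size mtu. */
  static int mss(int mtu) {
    return mtu - IP_HEADER - TCP_HEADER;
  }

  /** Number of segments needed to carry totalBytes (ceiling division). */
  static int segmentsNeeded(int totalBytes, int mtu) {
    int mss = mss(mtu);
    return (totalBytes + mss - 1) / mss;
  }

  /** Frame length a capture tool would report for a full-sized segment. */
  static int capturedFrameLength(int mtu) {
    return mtu + LOOPBACK_CAPTURE_HEADER;
  }

  public static void main(String[] args) {
    int mtu = 16384; // assumed loopback MTU
    int total = 10000 * "Hello world".length(); // 110,000 bytes
    System.out.println("MSS:          " + mss(mtu));
    System.out.println("segments:     " + segmentsNeeded(total, mtu));
    System.out.println("frame length: " + capturedFrameLength(mtu));
  }
}
```

On another OS the loopback MTU differs (Linux typically uses 65536 for `lo`), so the same program could be split differently there; the segment size is chosen by the kernel, not fixed by TCP.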

Note that for TCP the packet length on the wire is actually irrelevant to the application, since TCP is a continuous byte stream. With UDP this is different: every send produces a single datagram, which the recipient receives as one message (even if IP had to fragment it in transit).
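The UDP side of this can be observed directly. This minimal sketch (the class and method names are made up for illustration) sends each string as its own datagram over the loopback interface and records the length returned by each `receive()` call; each call yields exactly one whole datagram, never a merged or partial chunk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UdpBoundaries {
  /** Sends each message as its own datagram over loopback; returns the received lengths. */
  static List<Integer> roundTrip(String... messages) {
    InetAddress lo = InetAddress.getLoopbackAddress();
    // Port 0 lets the OS pick any free port for the receiver.
    try (DatagramSocket receiver = new DatagramSocket(0, lo);
         DatagramSocket sender = new DatagramSocket()) {
      receiver.setSoTimeout(2000); // fail instead of blocking forever
      for (String msg : messages) {
        byte[] b = msg.getBytes(StandardCharsets.US_ASCII);
        sender.send(new DatagramPacket(b, b.length, lo, receiver.getLocalPort()));
      }
      List<Integer> lengths = new ArrayList<>();
      byte[] buf = new byte[65507]; // room for the largest IPv4 UDP payload
      for (int i = 0; i < messages.length; i++) {
        DatagramPacket p = new DatagramPacket(buf, buf.length);
        receiver.receive(p); // one call returns one whole datagram
        lengths.add(p.getLength());
      }
      return lengths;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Two separate results (5 and 14 bytes), not one merged 19-byte read.
    System.out.println(roundTrip("first", "second message"));
  }
}
```

With a TCP socket, the same two writes could arrive in a single `read()` or be split across several, which is why TCP applications must frame their own messages.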

Regarding Path MTU Discovery

Ideally I would rely on Path MTU Discovery. But since the Ethernet packets being generated are too large for any other machine to receive, there is no opportunity for ICMP "packet too big" (fragmentation needed) messages to be returned.

Based on your diagram, I agree that PMTUD cannot work between two PCs in the same LAN segment: PCs do not generate the ICMP error messages PMTUD requires, and a frame too large for the receiving NIC is simply dropped at the Ethernet layer.
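To make the dependence on ICMP concrete, here is a toy simulation of Path MTU Discovery in the spirit of RFC 1191 (all names are hypothetical; this is not a real network stack, and it assumes the sender sets the DF bit as PMTUD requires). The sender starts at its own MTU. A hop that cannot accept the packet either reports its MTU back, as a router does with ICMP "fragmentation needed", or drops it silently, as a PC does with an oversized frame. Only in the first case does the sender ever learn a smaller size.

```java
import java.util.Arrays;
import java.util.List;

public class PmtudSimulation {
  /** One device on the path: the largest IP packet it accepts, and whether it reports drops. */
  static final class Hop {
    final int mtu;
    final boolean sendsIcmpTooBig; // routers do; an end host dropping a giant frame does not
    Hop(int mtu, boolean sendsIcmpTooBig) {
      this.mtu = mtu;
      this.sendsIcmpTooBig = sendsIcmpTooBig;
    }
  }

  /** Returns the discovered path MTU, or -1 if packets vanish with no ICMP feedback. */
  static int discoverPathMtu(int senderMtu, List<Hop> path) {
    int size = senderMtu;
    // Each ICMP report strictly lowers size, so path.size() + 1 attempts suffice.
    for (int attempt = 0; attempt <= path.size(); attempt++) {
      Hop blocker = null;
      for (Hop hop : path) {
        if (size > hop.mtu) {
          blocker = hop;
          break;
        }
      }
      if (blocker == null) {
        return size; // delivered end to end
      }
      if (!blocker.sendsIcmpTooBig) {
        return -1; // black hole: dropped silently, sender never learns a smaller size
      }
      size = blocker.mtu; // the ICMP message carries the next-hop MTU; retry with it
    }
    return -1;
  }

  public static void main(String[] args) {
    // Diagram case: MTU-1504 host talking directly to an MTU-1500 PC on the same LAN.
    System.out.println(discoverPathMtu(1504, Arrays.asList(new Hop(1500, false))));
    // Same traffic routed through a layer 3 device whose egress MTU is 1500.
    System.out.println(discoverPathMtu(1504,
        Arrays.asList(new Hop(1500, true), new Hop(1500, false))));
  }
}
```

The first call returns -1 (a PMTUD black hole) and the second returns 1500, which is the idea behind putting a routing device between the mismatched machines.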

Jumbo frames

Some vendors (such as Cisco) have switch models that support Ethernet payloads larger than 1500 bytes. Officially, the IEEE does not endorse this configuration, but the industry has valid reasons to judiciously deviate from the original 1500-byte MTU. I have storage LAN / backup networks that use jumbo frames for good reason; however, I made sure that all MTUs matched inside the same VLAN when I deployed jumbo frames.

Mismatched MTUs within a broadcast domain

The bottom line is that you should never have mismatched Ethernet MTUs inside the same Ethernet broadcast domain; if you do, it's a bug or a configuration error. Whichever it is, you have to solve these problems, sometimes manually.

All that discussion leads to the next question...

Why is there a spec that intentionally creates invalid ethernet frames?

I'm not sure I agree. I don't see how the IEEE 802.3 series or RFC 894 creates invalid frames; host implementations or host misconfigurations create invalid frames. To tell whether your implementation is following the spec, we need a lot more evidence.

This diagram is at least prima facie evidence that your MTUs are mismatched inside a broadcast domain...

+------------------+      +----------------+     +------------------+
| Realtek PCIe GBe |      | NetGear 10/100 |     | Realtek 10/100   |
|       (on-board) |      |     Switch     |     |     (on-board)   |
|                  |      +----------------+     |                  |
| Windows 7        |           ^    ^            |                  |
|                  |           |    |            |                  |
| 192.168.1.98/24  |-----------+    +------------| 192.168.1.10/24  |
| MTU = 1504 bytes |                             | MTU = 1500 bytes |
+------------------+                             +------------------+
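The frame arithmetic behind this diagram, as a small sketch assuming untagged Ethernet II frames (14-byte header, 4-byte FCS) and a receiver that strictly accepts only frames up to its own MTU plus those 18 bytes (real NICs vary; some tolerate a few extra bytes to allow for a VLAN tag):

```java
public class MtuMismatch {
  static final int ETH_HEADER = 14; // destination MAC + source MAC + EtherType
  static final int FCS = 4;         // frame check sequence

  /** Largest untagged frame a host with this MTU sends or (strictly) accepts. */
  static int maxFrame(int mtu) {
    return ETH_HEADER + mtu + FCS;
  }

  /** True if the sender's largest frame exceeds what the receiver accepts. */
  static boolean oversized(int senderMtu, int receiverMtu) {
    return maxFrame(senderMtu) > maxFrame(receiverMtu);
  }

  public static void main(String[] args) {
    int win7 = 1504, peer = 1500; // MTUs from the diagram
    System.out.println("Windows 7 max frame: " + maxFrame(win7));
    System.out.println("peer max frame:      " + maxFrame(peer));
    System.out.println("1504 -> 1500 oversized? " + oversized(win7, peer));
    System.out.println("1500 -> 1504 oversized? " + oversized(peer, win7));
  }
}
```

The asymmetry is the practical symptom: small packets and traffic toward the MTU-1504 host work, while full-size packets from it (1522-byte frames) are dropped by the MTU-1500 side.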

How should an 802.3-compliant implementation respond to MTU mismatches?

What was it they [the writers of 'the spec'] expected people to do with devices that generate these too large packets?

MTU 1504 and MTU 1500 within the same broadcast domain is simply a misconfiguration; it should never be expected to work, any more than mismatched IP netmasks or mismatched IP subnets can be expected to work. Your company will have to knuckle down and fix the root cause of the MTU mismatches. At this point it's hard to say whether the root cause is user error, an implementation bug, or some combination of the two.

If the affected Windows machines are successfully logging in to an Active Directory domain, one could write Windows login scripts that automatically fix MTU issues based on some well-constructed tests inside the domain login scripts (assuming the domain controller isn't part of the MTU issues).

If the machines are not logging into a domain, manual labor is another option.

Other possibilities to contain the damage

Use a layer 3 switch^{Note 1} to build a custom VLAN for anything that has broken MTUs, and set the layer 3 switch's Ethernet MTU on that VLAN to match the broken machines; this relies on PMTUD to resolve MTU issues at the IP layer. Layer 3 switches, unlike PCs, generate the ICMP errors required by PMTUD.

This option works best if you can re-address the broken machines with DHCP and identify them by MAC address.

... why did they bump it up to 1504 bytes, and create invalid packets, in the first place?

Hard to say at this point.

802.1ad vs. 802.1Q

How is IEEE 802.1ad (aka VLAN tagging, QinQ) valid when the packets are too large?

I haven't seen evidence so far that you're using QinQ; from the limited evidence I have seen, you're using simple 802.1Q encapsulation, which should work correctly in Windows, assuming the NIC driver supports 802.1Q encapsulation.
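For reference, the overhead that separates the two: an 802.1Q tag adds 4 bytes to the frame, and 802.1ad (QinQ) stacks two tags, adding 8. A short sketch of the resulting frame sizes for a 1500-byte payload, assuming Ethernet II framing with the FCS counted and the preamble excluded:

```java
public class VlanFrameSizes {
  static final int ETH_HEADER = 14; // destination MAC, source MAC, EtherType
  static final int FCS = 4;         // frame check sequence
  static final int VLAN_TAG = 4;    // TPID (2 bytes) + TCI (2 bytes)

  /** Frame size on the wire, excluding preamble/SFD, for a payload and a number of tags. */
  static int frameSize(int payload, int tags) {
    return ETH_HEADER + tags * VLAN_TAG + payload + FCS;
  }

  public static void main(String[] args) {
    System.out.println("untagged:       " + frameSize(1500, 0));
    System.out.println("802.1Q (1 tag): " + frameSize(1500, 1));
    System.out.println("802.1ad (QinQ): " + frameSize(1500, 2));
  }
}
```

The tags enlarge the frame without changing the 1500-byte payload, so tagged frames are not "too large" in the MTU sense as long as both ends expect the tags.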

End Notes:

Note 1: Any layer 3 switch should do; Cisco, Juniper, and Brocade switches could all perform this kind of function.

1522-Byte Frames – Dropped by Gateway from Access Point

An Ethernet frame with a payload of 1500 bytes:

Without an 802.1Q tag is 1518 bytes
With an 802.1Q tag is 1522 bytes

The WAP shows you 1514 bytes because the frame has no 802.1Q tag and the interface or driver is not passing the FCS to Wireshark. From the Wireshark Wiki:

Most Ethernet interfaces also either don't supply the FCS to Wireshark or other applications, or aren't configured by their driver to do so; therefore, Wireshark will typically only be given the green fields, although on some platforms, with some interfaces, the FCS will be supplied on incoming packets.

When the WAP shows a 1522-byte frame, it is apparently including the FCS (four bytes), so in that case the interface or driver is passing the FCS to Wireshark. The other four extra bytes come from an 802.1Q tag.
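The reasoning in the last two paragraphs amounts to peeling known overheads off the captured length. A minimal sketch, with a hypothetical helper name, that recovers the payload size from the length Wireshark reports, given whether the capture includes the FCS and how many 802.1Q tags the frame carries:

```java
public class CapturedLength {
  static final int ETH_HEADER = 14; // destination MAC, source MAC, EtherType
  static final int FCS = 4;         // only present in the capture if the driver passes it up
  static final int VLAN_TAG = 4;    // one 802.1Q tag

  /** Payload bytes implied by a captured frame length. */
  static int payloadFromCapture(int capturedLen, boolean fcsIncluded, int vlanTags) {
    return capturedLen - ETH_HEADER - vlanTags * VLAN_TAG - (fcsIncluded ? FCS : 0);
  }

  public static void main(String[] args) {
    // 1514 bytes, no FCS in the capture, untagged
    System.out.println(payloadFromCapture(1514, false, 0));
    // 1522 bytes, FCS in the capture, one 802.1Q tag
    System.out.println(payloadFromCapture(1522, true, 1));
  }
}
```

Both captures work out to the same 1500-byte payload; only the tag and the FCS visibility differ, which is why the tagged frames are a configuration issue rather than oversized packets.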

Your router is not expecting 802.1Q-tagged frames, so it is reporting errors. You need to disable 802.1Q on the WAP. WAPs can use 802.1Q on the wired side to separate traffic among multiple SSIDs.