Responding to individual concerns in the post...
Regarding Path MTU Discovery
Ideally i would be relying on Path MTU discovery. But since the ethernet packets being generated are too large for any other machine to receive, there is no opportunity for IP Packet too big fragmentation messages to be returned
Based on your diagram, I agree that PMTUD cannot function between two different PCs in the same LAN segment; PCs do not generate ICMP Error messages required by PMTUD.
Jumbo frames
Some vendors (such as Cisco) have switch models which support Ethernet payloads larger than 1500 bytes. Officially IEEE does not endorse this configuration, but the industry has valid needs to judiciously deviate from the original 1500-byte MTU. I have storage LAN / backup networks which leverage jumbo frames for good reason; however, I made sure that all MTUs matched inside the same VLAN when I deployed jumbo frames.
Mismatched MTUs within a broadcast domain
The bottom line is that you should never have mismatched ethernet MTUs inside the same ethernet broadcast domain; if you do, it's a bug or configuration error. Regardless of bug or error, you have to solve these problems, sometimes manually.
All that discussion leads to the next question...
Why is there a spec that intentionally creates invalid ethernet frames?
I'm not sure that I agree... I don't see how the IEEE 802.3 series, or RFC 894 create invalid frames. Host implementations or host misconfigurations create invalid frames. To understand whether your implementation is following the spec, we need a lot more evidence...
This diagram is at least prima facie evidence that your MTUs are mismatched inside a broadcast domain...
+------------------+        +----------------+        +------------------+
| Realtek PCIe GBe |        | NetGear 10/100 |        | Realtek 10/100   |
| (on-board)       |        | Switch         |        | (on-board)       |
|                  |        +----------------+        |                  |
| Windows 7        |            ^        ^            |                  |
|                  |            |        |            |                  |
| 192.168.1.98/24  |------------+        +------------| 192.168.1.10/24  |
| MTU = 1504 bytes |                                  | MTU = 1500 bytes |
+------------------+                                  +------------------+
How should an 802.3-compliant implementation respond to MTU mismatches?
What was it they [the writers of 'the spec'] expected people to do with devices that generate these too large packets?
MTU 1504 and MTU 1500 within the same broadcast domain is simply a misconfiguration; it should never be expected to work, any more than mismatched IP netmasks or mismatched IP subnets can be expected to work. Your company will have to knuckle down and fix the root cause of the MTU mismatches... at this time it's hard to say whether the root cause is user error, an implementation bug, or some combination of the two.
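As a rough illustration of the kind of check involved, here is a minimal Python sketch that flags hosts whose MTU differs from the value the segment is supposed to use. The function name, the `expected_mtu` default of 1500, and the hard-coded host table (taken from the diagram above) are my own assumptions for illustration; in practice you would have to collect the MTU values from the machines yourself.

```python
def find_mtu_mismatches(host_mtus, expected_mtu=1500):
    """Return a dict of hosts whose MTU differs from expected_mtu.

    host_mtus maps a host identifier (here an IP address string) to the
    MTU configured on that host's interface.
    """
    return {
        host: mtu
        for host, mtu in sorted(host_mtus.items())
        if mtu != expected_mtu
    }


# Values copied from the diagram; 1500 is assumed to be the intended MTU.
segment = {"192.168.1.98": 1504, "192.168.1.10": 1500}

mismatched = find_mtu_mismatches(segment)
for host, mtu in mismatched.items():
    print(f"{host}: MTU {mtu}, expected 1500")
```

Anything this reports is a host to fix, not a condition the rest of the segment should be expected to tolerate.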
If the affected Windows machines are successfully logging into an Active Directory Domain, one could write Windows login scripts to automatically fix MTU issues based on some well-constructed tests inside the domain login scripts (assuming the Domain Controller isn't part of the MTU issues).
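A login script along those lines would essentially compare the interface MTU against the expected value and reset it when they differ. The Python sketch below only builds the `netsh` command line rather than running it; the function names and the interface name `Local Area Connection` are hypothetical, and you should verify the command against your Windows version before deploying anything.

```python
def needs_mtu_fix(current_mtu, expected_mtu=1500):
    """Return True when the interface MTU differs from the expected value."""
    return current_mtu != expected_mtu


def build_mtu_fix_command(interface_name, mtu=1500):
    """Build (but do not run) a netsh command that sets the IPv4 MTU.

    interface_name is the Windows connection name; it is quoted because
    such names usually contain spaces.
    """
    if mtu < 68:
        # 68 bytes is the smallest MTU that IPv4 permits (RFC 791).
        raise ValueError("MTU too small for IPv4")
    return (
        f'netsh interface ipv4 set subinterface "{interface_name}" '
        f"mtu={mtu} store=persistent"
    )


# Hypothetical interface name and the 1504-byte MTU from the diagram.
if needs_mtu_fix(1504):
    print(build_mtu_fix_command("Local Area Connection"))
```

Keeping the test (`needs_mtu_fix`) separate from the fix makes it easier to log which machines were actually changed.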
If the machines are not logging into a domain, manual labor is another option.
Other possibilities to contain the damage
Use a layer 3 switch (see Note 1) to build a custom VLAN for anything that has broken MTUs, and set the layer 3 switch's Ethernet MTU to match the broken machines; this relies on PMTUD to resolve MTU issues at the IP layer. Layer 3 switches generate the ICMP errors required by PMTUD.
This option works best if you can re-address the broken machines with DHCP and can identify them by MAC address.
... why did they bump it up to 1504 bytes, and create invalid packets, in the first place?
Hard to say at this point
802.1ad vs 802.1q
How is IEEE 802.1ad (aka VLAN Tagging, QinQ) valid, when the packets are too large?
I haven't seen evidence so far that you're using QinQ; from the limited evidence I have seen so far, you're using simple 802.1q encapsulation, which should work correctly in Windows, assuming the NIC driver supports 802.1q encap.
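For what it's worth, a VLAN tag does not eat into the 1500-byte payload; each tag adds 4 bytes to the frame itself, which is why IEEE 802.3ac raised the maximum tagged frame size to 1522 bytes. The arithmetic below is a small Python sketch of that (the constant and function names are mine, and it ignores the preamble and inter-frame gap):

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_FCS = 4      # frame check sequence (CRC32)
VLAN_TAG = 4     # TPID (2) + tag control information (2)


def max_frame_size(mtu=1500, vlan_tags=0):
    """Largest frame, header through FCS, for a given MTU and tag count."""
    return ETH_HEADER + vlan_tags * VLAN_TAG + mtu + ETH_FCS


for label, tags in (("untagged", 0), ("802.1Q", 1), ("802.1ad QinQ", 2)):
    print(f"{label}: {max_frame_size(vlan_tags=tags)} bytes")
# untagged: 1518 bytes
# 802.1Q: 1522 bytes
# 802.1ad QinQ: 1526 bytes
```

In every case the IP MTU stays at 1500; only the frame on the wire grows, so the devices in the path have to accept the larger frame.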
End Notes:
Note 1: Any layer 3 switch should do... Cisco, Juniper, and Brocade switches could all perform this kind of function.
I'm not overly familiar with IP phones, other than knowing that they can pass traffic to a workstation behind them, so while I'm pretty sure this answer will work, I offer no guarantees.
How to secure against Double VLAN tagging and CDP attacks on that
port.
The easiest way to protect against double VLAN tagging is to configure your switch properly.
- Don't use VLAN1 for any of your ports.
- Change the native VLAN on all your trunk ports to an unused VLAN ID. (I personally use VLAN999)
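Both rules matter because a double-tagging attack depends on the attacker's access VLAN being the same as a trunk's native VLAN, so that the switch strips the outer tag and forwards the inner one. A quick way to audit for that is sketched below in Python; the port table format, the interface names, and the function name are all made up for illustration, and you would need to populate the table from your own switch configuration.

```python
def audit_vlan_config(ports):
    """Return a list of findings for ports that break the two rules above.

    Each port is a dict with 'name' and 'mode' ('access' or 'trunk'),
    plus 'vlan' for access ports or 'native_vlan' for trunk ports.
    """
    access_vlans = {p["vlan"] for p in ports if p["mode"] == "access"}
    findings = []
    for p in ports:
        if p["mode"] == "access" and p["vlan"] == 1:
            findings.append(f"{p['name']}: access port uses VLAN 1")
        elif p["mode"] == "trunk":
            native = p["native_vlan"]
            if native == 1:
                findings.append(f"{p['name']}: native VLAN is still 1")
            elif native in access_vlans:
                findings.append(
                    f"{p['name']}: native VLAN {native} is also an access VLAN"
                )
    return findings


# Hypothetical port table for illustration only.
ports = [
    {"name": "Gi0/1", "mode": "access", "vlan": 1},
    {"name": "Gi0/2", "mode": "access", "vlan": 10},
    {"name": "Gi0/23", "mode": "trunk", "native_vlan": 999},
    {"name": "Gi0/24", "mode": "trunk", "native_vlan": 10},
]

for finding in audit_vlan_config(ports):
    print(finding)
```

With this sample table, Gi0/1 and Gi0/24 are reported, while the trunk using the unused VLAN 999 passes.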
For CDP attacks, the easiest way (to me) is to disable CDP on the interface.
switch(config-if)# no cdp enable
As Mike mentioned in the comments to this answer, Cisco IP phones require CDP to operate. After looking into it a little, the general consensus seems to be that, while leaving CDP running technically leaves you open to attack, the threat is heavily mitigated by port security best practices. So make sure your ports are protected from rogue devices, and in theory you should never have an issue.
Can you set a minimum amount of Active MAC addresses and then limit
the Aging period on MAC addresses on a specific switchport...
For each switch port going to your phones (and workstation)
switch(config-if)# switchport port-security
switch(config-if)# switchport port-security maximum #
switch(config-if)# switchport port-security mac-address sticky
If you know how many devices you are going to have, you can set the maximum to that number. If you know the actual MAC addresses of the devices, you can set those manually as well.
switch(config-if)# switchport port-security mac-address sticky AAAA.BBBB.CCCC
...such that if someone disconnects the phone and sets up a Cisco switch
or another Rogue device, then the port should become Shutdown within
the aging period.
To protect against a rogue switch, I use the following:
switch(config-if)# spanning-tree portfast
switch(config-if)# spanning-tree bpduguard enable
This puts the port into err-disabled as soon as it detects a BPDU, which should only be coming from a switch.
Of course, best practice is to have strong physical control over your equipment, but I'm sure we all know that is nigh impossible sometimes. As I said, I'm not familiar with IP phone setups, so this whole answer may be wrong. If I find something refuting this answer or otherwise, I'll update it as necessary.
Good luck!
Best Answer
A VoIP phone that chains to another device is a switch, and it negotiates a trunk between the phone and the switch. This happens with CDP or LLDP.
For example, a Cisco switch interface configured as an access interface connecting to a Cisco phone will use CDP to negotiate a trunk from the access interface:
If you connect a PC to the switch interface, you will have an access interface using VLAN 10, but if you connect a Cisco phone, you will get a trunk with both the access VLAN 10 and the VoIP VLAN 20.