Multicast – Why Use Hash Instead of Simple Compare for Filtering Frames?


The frame is received by the datalink on the right based on what we
call imperfect filtering, which is done by the interface using the
Ethernet destination address. We say this is imperfect because it is
normally the case that when the interface is told to receive frames
destined to one specific Ethernet multicast address, it can receive
frames destined to other Ethernet multicast addresses, too.

The book goes on to explain why this filtering is imperfect:

…many current Ethernet interface cards apply a hash function to the
address, calculating a value between 0 and 511…

My question is: an Ethernet address is 6 bytes, of which the top 3 bytes are constant for any multicast Ethernet address. All that remains is 3 bytes. Why not compare them byte by byte instead of using all the hash logic? The filtering would be perfect (at least at the Ethernet level; at the IP layer we may still find that the frame does not belong to our multicast group) and the logic would be much simpler.

What performance benefits does hashing have when contrasted with a simple compare?

EDIT: I think there is some confusion here. It's not about 32 IP addresses mapping onto a single Ethernet address. If that were the issue, perfect filtering at the Ethernet layer would be impossible. But the book goes on to give examples of cards that are capable of perfect filtering:

Another interface card does perfect filtering for 80 multicast
addresses, but then has to enter multicast promiscuous mode. Even if
the interface performs perfect filtering, perfect software filtering
at the IP layer is still required because the mapping from the IP
multicast address to the hardware address is not one-to-one

The last sentence of that quote (bolded in the original) clearly states that the 32-to-1 mapping problem exists at the IP layer and not at the Ethernet layer.
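
For readers unfamiliar with the 32-to-1 mapping mentioned here, a minimal Python sketch follows. It assumes the standard IPv4 mapping (the low-order 23 bits of the group address are copied into `01:00:5e:00:00:00`); the function name `multicast_mac` and the sample groups are made up for illustration.

```python
import ipaddress

def multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF       # only the low-order 23 bits survive
    mac = 0x01005E000000 | low23       # fixed prefix 01:00:5e plus a 0 bit
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))

# Hypothetical sample groups: they differ only in the 5 bits that are dropped.
groups = ["224.1.1.1", "225.1.1.1", "224.129.1.1", "239.129.1.1"]
macs = {g: multicast_mac(g) for g in groups}
for group, mac in macs.items():
    print(f"{group:>12} -> {mac}")
```

All four groups land on the same Ethernet address, so even a NIC that compares all 48 bits exactly cannot tell them apart; that last step has to happen at the IP layer.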

Best Answer

The main benefit of hashing rather than a simple compare is that you can do one lookup for any number of enabled multicast addresses. The system that has an exact match for 80 addresses will either have to build logic to fetch 80 addresses into a comparator in turn, or have 80 comparators in parallel. That's a lot of gates, even if it is simple in concept. And that's all extra cost and power.

In contrast, a hash lookup can easily be implemented with a shift register and a few XOR gates. The computation and shifts can be done as the bits arrive on the wire. Even better, the NIC is already doing this in order to calculate the frame check sequence (the CRC checksum).
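
To make the trade-off concrete, here is a rough Python sketch of a hash filter. It is not any particular NIC's algorithm: it assumes the bucket is the low 9 bits of a CRC-32 over the destination address (real chips pick their own bits), and the names `HashFilter`, `bucket`, and `first_false_positive` are invented for this example.

```python
import zlib

TABLE_BITS = 512  # matches the 0..511 range quoted from the book

def bucket(mac: bytes) -> int:
    # Assumption: low 9 bits of CRC-32 over the 6-byte destination address.
    return zlib.crc32(mac) % TABLE_BITS

class HashFilter:
    def __init__(self) -> None:
        self.table = 0  # the 512-bit table, held in one Python int

    def enable(self, mac: bytes) -> None:
        self.table |= 1 << bucket(mac)

    def accepts(self, mac: bytes) -> bool:
        # One table lookup, no matter how many addresses were enabled.
        return bool((self.table >> bucket(mac)) & 1)

def first_false_positive(filt: HashFilter, wanted: set):
    """Scan 01:00:5e:00:xx:xx for an address we never enabled but accept."""
    for low in range(1 << 16):
        candidate = bytes.fromhex("01005e00") + low.to_bytes(2, "big")
        if candidate not in wanted and filt.accepts(candidate):
            return candidate
    return None

wanted = {bytes.fromhex("01005e000101")}  # 224.0.1.1, chosen arbitrarily
nic = HashFilter()
for mac in wanted:
    nic.enable(mac)

leak = first_false_positive(nic, wanted)
print("unwanted address that passes the filter:", leak.hex(":"))
```

The lookup cost stays constant as more groups are enabled, and the price is exactly the imperfect filtering the question asks about: some addresses that were never enabled share a bucket with one that was.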

Note also that Stevens was writing 20-25 years ago; whilst it is still more or less the definitive work on TCP/IP, hardware has come a long way since then. The cost of adding a few thousand more gates won't make much difference now. Flicking through a few NIC datasheets, most opt for a hybrid approach: e.g. 16 exact addresses and a 4096-bit hash table.

Regarding Path MTU Discovery

Ideally I would be relying on Path MTU discovery. But since the ethernet packets being generated are too large for any other machine to receive, there is no opportunity for IP Packet too big fragmentation messages to be returned.

Based on your diagram, I agree that PMTUD cannot function between two different PCs in the same LAN segment; PCs do not generate the ICMP error messages required by PMTUD.

Jumbo frames

Some vendors (such as Cisco) have switch models which support Ethernet payloads larger than 1500 bytes. Officially the IEEE does not endorse this configuration, but the industry has valid needs to judiciously deviate from the original 1500-byte MTU. I have storage LAN / backup networks which use jumbo frames for good reason; however, I made sure that all MTUs matched inside the same VLAN when I deployed jumbo frames.

Mismatched MTUs within a broadcast domain

The bottom line is that you should never have mismatched ethernet MTUs inside the same ethernet broadcast domain; if you do, it's a bug or configuration error. Regardless of bug or error, you have to solve these problems, sometimes manually.

All that discussion leads to the next question...

Why is there a spec that intentionally creates invalid ethernet frames?

I'm not sure that I agree... I don't see how the IEEE 802.3 series, or RFC 894 create invalid frames. Host implementations or host misconfigurations create invalid frames. To understand whether your implementation is following the spec, we need a lot more evidence...

This diagram is at least prima facie evidence that your MTUs are mismatched inside a broadcast domain...

+------------------+      +----------------+     +------------------+
| Realtek PCIe GBe |      | NetGear 10/100 |     | Realtek 10/100   |
|       (on-board) |      |     Switch     |     |     (on-board)   |
|                  |      +----------------+     |                  |
| Windows 7        |           ^    ^            |                  |
|                  |           |    |            |                  |
| 192.168.1.98/24  |-----------+    +------------| 192.168.1.10/24  |
| MTU = 1504 bytes |                             | MTU = 1500 bytes |
+------------------+                             +------------------+

How should an 802.3-compliant implementation respond to MTU mismatches?

What was it they [the writers of 'the spec'] expected people to do with devices that generate these too large packets?

MTU 1504 and MTU 1500 within the same broadcast domain is simply a misconfiguration; it should not be expected to work any more than mismatched IP netmasks or mismatched IP subnets can be expected to work. Your company will have to knuckle down and fix the root cause of the MTU mismatches... at this time it's hard to say whether the root cause is user error, an implementation bug, or some combination of the two.

If the affected Windows machines are successfully logging in to an Active Directory Domain, one could write Windows login scripts that automatically fix MTU issues based on some well-constructed tests inside the domain login scripts (assuming the Domain Controller isn't part of the MTU issues).

If the machines are not logging into a domain, manual labor is another option.

Other possibilities to contain the damage

Use a layer3 switch (see Note 1) to build a custom vlan for anything that has broken MTUs and set the layer3 switch's ethernet MTU to match the broken machines; this relies on PMTUD to resolve MTU issues at the IP layer. Layer3 switches generate the ICMP errors required by PMTUD.

This option works best if you can re-address the broken machines with DHCP and can identify the broken machines by MAC address.

... why did they bump it up to 1504 bytes, and create invalid packets, in the first place?

Hard to say at this point.

802.1ad vs 802.1q

How is IEEE 802.1ad (aka VLAN Tagging, QinQ) valid, when the packets are too large?

I haven't seen evidence so far that you're using QinQ; from the limited evidence I have seen so far, you're using simple 802.1q encapsulation, which should work correctly in Windows, assuming the NIC driver supports 802.1q encap.

End Notes:

Note 1: Any layer 3 switch should do... Cisco, Juniper, and Brocade switches could all perform this kind of function.

Ethernet Frame Length and Babble – Understanding the Differences

The 1530 figure includes the layer-1 overhead (preamble and start-of-frame delimiter), which you will never see without dedicated diagnostic gear, as the NIC won't present it to you. The 1518 figure includes the FCS (CRC), which is technically part of the layer-2 information, but I've never seen a NIC pass that up the chain (read: Wireshark can't show it).
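
The arithmetic behind these numbers can be sketched in a few lines of Python. The field sizes are the standard Ethernet ones; the helper `frame_len` is made up for this example, and reading 1530 as "802.1Q-tagged frame plus preamble" is an assumption, since the figure is not broken down above.

```python
HEADER = 14        # destination MAC (6) + source MAC (6) + EtherType (2)
FCS = 4            # frame check sequence (CRC-32)
VLAN_TAG = 4       # optional 802.1Q tag
PREAMBLE_SFD = 8   # preamble (7) + start-of-frame delimiter (1), layer 1 only

def frame_len(payload: int, tagged: bool = False, layer1: bool = False) -> int:
    """Bytes for one frame carrying `payload` bytes of layer-3 data."""
    total = HEADER + payload + FCS
    if tagged:
        total += VLAN_TAG
    if layer1:
        total += PREAMBLE_SFD
    return total

print(frame_len(1500))                            # 1518: untagged frame with FCS
print(frame_len(1500, tagged=True))               # 1522: with an 802.1Q tag
print(frame_len(1500, tagged=True, layer1=True))  # 1530: assumed reading of that figure
```

A capture tool that never sees the FCS or the preamble would report at most 1514 bytes for an untagged frame with a 1500-byte payload.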

Interesting that you point to a Cisco 2900XL document. I know first hand that the 2900XL crashes if you send it an "oversized" frame -- an 802.1q tagged frame on a non-tagged port. (Or was that the 3500XL?)

If you ignore everything Cisco says, a babble is a transmitter not obeying the inter-packet gap -- sending frame after frame with little or no delay. That's a big problem for half-duplex networks.