What can cause network packets consisting of PUU?

We have a system which is suffering from comms outages on a Gigabit Ethernet network. The traffic load would only slightly stress a 100 Mb/s network, but there are gigabit switches, NICs and cables throughout, or so I am told by the customer who built the network we are plugging into.

We plugged in a laptop running Wireshark via a 100BASE-T hub and found that it reported lots of "Ethernet II" packets where the raw data, when displayed as ASCII, basically looks like this:

PUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU

Naturally I immediately named this issue "Network PUU" and many giggles ensued. We're all in our forties or so, but I guess some of us never grow up (guilty!)

Anyway, more seriously, other perfectly valid packets were being corrupted by this data. IPv4 headers were getting bytes replaced with U bytes, and there was also data corruption that would cause the software to reject the data even when the IP checksums still matched. We are pretty sure that this data spewing onto the network is causing the comms outages. What we don't know is where it might be coming from.

Has anyone ever seen this happen before? Did you solve it? Did you figure out where it came from?

====EDITED====

Added mention of the hub to the original description since, judging from the comments below, it is the most likely source of the corruption! The tool we used to try and find the network issue appears to have added a new and worse network issue.

Best Answer

Anyway, more seriously, other perfectly valid packets were being corrupted by this data. IPv4 headers were getting bytes replaced with U bytes, and there was also data corruption that would cause the software to reject the data even when the IP checksums still matched.

It's surprising that just alternating bits (U is ASCII 0x55, or 01010101b) actually make up valid Ethernet frames, let alone valid IP packets. If this corruption also creeps into otherwise intact frames/packets, the most likely cause is a faulty switch (bad buffer memory) or a faulty host (NIC or RAM).
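As a rough illustration of the bit pattern involved, here is a small Python sketch that confirms the 0x55 / 01010101b correspondence and measures how much of a payload consists of U bytes. The helper name `u_byte_ratio` and the sample payload are made up for this example; they are not taken from the capture described in the question.

```python
def u_byte_ratio(payload: bytes) -> float:
    """Return the fraction of bytes equal to 0x55 (ASCII 'U')."""
    if not payload:
        return 0.0
    # bytes.count() accepts an integer byte value in Python 3.
    return payload.count(0x55) / len(payload)


# 'U' really is the alternating bit pattern mentioned above.
print(hex(ord("U")))            # 0x55
print(format(ord("U"), "08b"))  # 01010101

# Hypothetical payload shaped like the one in the question: a 'P' then U bytes.
sample = b"P" + b"U" * 7
print(u_byte_ratio(sample))     # 0.875
```

A ratio close to 1.0 on frames that should carry real application data would be a hint that the frame was filled with this pattern rather than merely hit by a few bit errors.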

If frame data is corrupted in transit on the cable, the FCS will almost certainly fail to verify, and the very next switch will drop that frame. However, if such a frame travels through the network with a valid FCS, it must have been corrupted before that FCS was calculated, which points to a defective switch or host.
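The FCS argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it relies on `zlib.crc32` using the same CRC-32 as the Ethernet FCS, assumes the FCS is appended least-significant byte first, and uses an invented 60-byte frame body rather than real captured data. The function names are hypothetical.

```python
import zlib


def ethernet_fcs(frame_without_fcs: bytes) -> bytes:
    """Compute a 4-byte FCS (CRC-32, assumed little-endian byte order)."""
    return zlib.crc32(frame_without_fcs).to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Check whether the trailing 4 bytes match the CRC of the rest."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return ethernet_fcs(body) == fcs


# Invented frame body, for illustration only.
body = bytes(range(60))
good = body + ethernet_fcs(body)

# Case 1: corruption on the cable, after the FCS was calculated.
damaged_in_transit = good[:20] + b"\x55" * 4 + good[24:]

# Case 2: corruption inside a faulty device, before the FCS was calculated.
corrupt_body = body[:20] + b"\x55" * 4 + body[24:]
damaged_before_fcs = corrupt_body + ethernet_fcs(corrupt_body)

print(fcs_ok(good))                # True
print(fcs_ok(damaged_in_transit))  # False: a switch would drop this frame
print(fcs_ok(damaged_before_fcs))  # True: the corruption passes the check
```

The third case is the one that matters here: a frame full of U bytes that still carries a valid FCS was sealed by whatever device produced or forwarded it in that state.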

You'll need to trace that traffic back to its source. If the source MAC address isn't valid, or can't be looked up on intermediate (unmanaged) switches, you'll have to work your way back along the cables.
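One way to start the trace is to tally which source MAC addresses the suspicious frames claim to come from. The following Python sketch assumes the frames are already available as raw bytes (for example exported from a capture) in Ethernet II layout without the FCS; the function names, the 0.5 threshold and the sample frames are all invented for illustration. Keep in mind that the source MAC field may itself be corrupted, so the result is a hint, not proof.

```python
from collections import Counter


def source_mac(frame: bytes) -> str:
    """Extract the source MAC (bytes 6-11 of an Ethernet II header)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet II header")
    return ":".join(f"{b:02x}" for b in frame[6:12])


def suspect_sources(frames, threshold=0.5):
    """Count frames per source MAC whose payload is mostly 0x55 bytes."""
    counts = Counter()
    for frame in frames:
        payload = frame[14:]  # skip dst MAC, src MAC and EtherType
        if payload and payload.count(0x55) / len(payload) >= threshold:
            counts[source_mac(frame)] += 1
    return counts


# Invented sample frames: two from one host filled with 'U', one clean.
dst = bytes(6)
ethertype = b"\x08\x00"
bad = dst + bytes.fromhex("020000000001") + ethertype + b"U" * 46
clean = dst + bytes.fromhex("020000000002") + ethertype + bytes(46)

for mac, n in suspect_sources([bad, bad, clean]).most_common():
    print(mac, n)
```

If one address dominates the tally, the switch port that learned it is a reasonable place to start looking; if the addresses look random, tracing along the cables is probably unavoidable.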
