Debugging dropped UDP messages on Linux

Tags: ethernet, linux, linux-networking, switch, udp

Here's my setup: I have one host with a 1 Gbit Ethernet connection and two hosts with 100 Mbit connections (connected to the 1 Gbit host through different switches).

In a test, I send 1000 1 kB messages from the 1 Gbit host to each of the 100 Mbit hosts (with no delay between sendto() calls). For one of the 100 Mbit hosts, no packets get dropped. The other, though, has no drops until around the 100th packet and then drops the majority of the rest. This is very reproducible. When I introduce a 1 ms delay between sends, there are no drops on either host.

I'd like to know why the two hosts behave differently.

What are some methods/tools I should use to track this down? I am using Linux 6.8, and rmem_max is set to 10 MB on both hosts.

Best Answer

This is expected behavior when the sending and receiving links run at mismatched speeds. If you are able to saturate the 1 Gbit link, the 100 Mbit link will have carried only about 100 packets by the time you have sent all 1000. It is unlikely that the switch in between will buffer the remaining 900 packets.
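The 100-versus-900 estimate can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes 1 kB means 1000 bytes, ignores Ethernet/IP/UDP header overhead, and assumes the sender really does fill the 1 Gbit link. The variable names are illustrative.

```python
# Rough numbers for the burst described in the question.
# Assumptions: 1 kB means 1000 bytes, and header overhead is ignored.
PACKETS = 1000
PACKET_BITS = 1000 * 8
FAST_BPS = 1_000_000_000  # 1 Gbit/s link on the sender
SLOW_BPS = 100_000_000    # 100 Mbit/s link on the receiver

# Time the sender needs to put the whole burst on the wire, in milliseconds.
burst_ms = PACKETS * PACKET_BITS * 1000 / FAST_BPS

# Packets the slow link can carry in that same time.
drained = PACKETS * SLOW_BPS // FAST_BPS

# Packets that must wait in a buffer somewhere along the path, or be dropped.
backlog = PACKETS - drained
backlog_bytes = backlog * PACKET_BITS // 8

# Upper bound on the offered rate when sends are spaced 1 ms apart.
paced_bps = PACKET_BITS * 1000

print(f"burst duration:        {burst_ms:.1f} ms")
print(f"forwarded meanwhile:   {drained} packets")
print(f"needs buffering:       {backlog} packets ({backlog_bytes} bytes)")
print(f"rate with 1 ms pacing: {paced_bps / 1e6:.0f} Mbit/s")
```

With 1 ms between sends the offered rate stays well below 100 Mbit/s, which fits the observation that the paced test loses nothing on either host.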

UDP is an unreliable protocol. Unlike TCP, it has no built-in reliable delivery, so packets dropped along the path are not retransmitted.
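Because UDP gives no feedback about loss, the test itself has to detect it. One possible approach is to put a sequence number in each datagram and look for gaps on the receiving side. The following is a minimal sketch, not the original test program: the function names, the 4-byte sequence header, and the default sizes are all assumptions made for illustration.

```python
import socket
import struct
import time


def send_burst(addr, count=1000, size=1000, delay=0.0):
    """Send `count` datagrams of `size` bytes, each prefixed with a 4-byte sequence number."""
    padding = b"\0" * (size - 4)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(struct.pack("!I", seq) + padding, addr)
            if delay:
                time.sleep(delay)  # e.g. 0.001 for the 1 ms pacing from the question


def receive(port, count, timeout=2.0):
    """Collect sequence numbers until `count` datagrams arrive or the socket times out."""
    seqs = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        sock.settimeout(timeout)
        try:
            while len(seqs) < count:
                data, _ = sock.recvfrom(65535)
                seqs.append(struct.unpack("!I", data[:4])[0])
        except socket.timeout:
            pass  # no more datagrams arrived within the timeout
    return seqs


def summarize_loss(received, count):
    """Return (number of lost datagrams, first missing sequence number or None)."""
    seen = set(received)
    missing = [seq for seq in range(count) if seq not in seen]
    return len(missing), (missing[0] if missing else None)
```

Running `receive()` on each 100 Mbit host and `send_burst()` on the 1 Gbit host, then passing the result to `summarize_loss()`, shows where in the burst the drops begin on each host.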

It may help to run a similar test with TCP connections. Running it in both directions may help to determine if the issue is unidirectional.

Running time on the processes may give an idea of whether one of them is running slower than the other. Running netstat -i before and after the test will let you calculate how much data arrived and see whether any errors were generated.
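The before/after comparison can also be scripted. The sketch below reads the per-interface receive counters from /proc/net/dev and subtracts two snapshots. The function names are hypothetical, and the interface name `eth0` in the usage note is an assumption.

```python
# Receive-side columns of /proc/net/dev, in file order.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")


def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: {rx_field: value}}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        values = rest.split()
        if not sep or len(values) < len(RX_FIELDS):
            continue
        counters[name.strip()] = dict(zip(RX_FIELDS, map(int, values[:len(RX_FIELDS)])))
    return counters


def read_counters(path="/proc/net/dev"):
    """Take a snapshot of the current interface counters."""
    with open(path) as f:
        return parse_net_dev(f.read())


def rx_delta(before, after, iface):
    """Return how much each receive counter changed between two snapshots."""
    return {field: after[iface][field] - before[iface][field] for field in RX_FIELDS}
```

A typical use would be `before = read_counters()`, run the test, `after = read_counters()`, then `rx_delta(before, after, "eth0")` on each receiving host. If the packet count rises by far fewer than 1000 while errs and drop stay at zero, the packets most likely never reached that interface.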

ethtool may tell you if one of the hosts is in half-duplex mode. Half-duplex connections are prone to issues such as you are seeing. If there are cabling or other issues, the connection may fall back to 10 Mbit half-duplex in one or both directions.
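The negotiated speed and duplex that ethtool reports are also exposed under /sys/class/net/<interface>/ in the `speed` and `duplex` files, so the check can be scripted across hosts. This is a sketch with hypothetical helper names; the interface name and the expected speed of 100 Mbit are assumptions taken from the question's setup.

```python
from pathlib import Path


def _read(path):
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None  # e.g. interface missing or link down


def link_settings(iface, root="/sys/class/net"):
    """Read the negotiated speed (Mbit/s) and duplex mode for one interface."""
    base = Path(root) / iface
    speed = _read(base / "speed")
    return {
        "speed_mbit": int(speed) if speed and speed.lstrip("-").isdigit() else None,
        "duplex": _read(base / "duplex"),
    }


def looks_degraded(settings, expected_mbit=100):
    """Flag a link that is not full duplex or not at the expected speed."""
    return settings["duplex"] != "full" or settings["speed_mbit"] != expected_mbit
```

For example, `looks_degraded(link_settings("eth0"))` returning True on the host that drops packets, but not on the other, would point to a negotiation or cabling problem rather than to the switch.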

If the switch is managed, you should check the configuration and error counters on the relevant ports.

If the two systems have different Ethernet hardware, that may be the issue. Some hardware just can't handle a saturated link.