In our instance, our problem was solved by sysctl parameters, one different from Maciej.
Please note that I do not speak for the OP (buecking), I came on this post due to the problem being related by the basic detail (no multicast traffic in userland).
We have an application that reads data sent to four multicast addresses, and a unique port per multicast address, from an appliance that is (usually) connected directly to an interface on the receiving server.
We were attempting to deploy this software on a customer site when it mysteriously failed with no known reason. Attempts at debugging this software resulted in inspecting every system call, ultimately they all told us the same thing:
Our software asks for data, and the OS never provides any.
The multicast packet counter incremented, tcpdump showed the traffic reaching the box/specific interface, yet we couldn't do anything with it. SELinux was disabled, iptables was running but had no rules in any of the tables.
Stumped, we were.
In randomly poking around, we started thinking about the kernel parameters that sysctl handles, but none of the documented features was either particularly relevant, or if they had to do with multicast traffic, they were enabled. Oh, and ifconfig did list "MULTICAST" in the feature line (up, broadcast, running, multicast). Out of curiosity we looked at /etc/sysctl.conf
. 'lo and behold, this customer's base image had a couple of extra lines added to it at the bottom.
In our case, the customer had set net.ipv4.all.rp_filter = 1
. rp_filter is the Route Path filter, which (as I understand it) rejects all traffic that could not have possibly reached this box. Network subnet hopping, the thought being that the source IP is being spoofed.
Well, this server was on a 192.168.1/24 subnet and the appliance's source IP address for the multicast traffic was somewhere in the 10.* network. Thus, the filter was preventing the server from doing anything meaningful with the traffic.
A couple of tweaks approved by the customer; net.ipv4.eth0.rp_filter = 1
and net.ipv4.eth1.rp_filter = 0
and we were running happily.
There is no such thing as an "open" UDP port, at least not in the sense most people are used to think (which is answering something like "OK, I've accepted your connection"). UDP is session-less, so "a port" (read: the UDP protocol in the operating system IP stack) will never respond "success" on its own.
UDP ports only have two states: listening or not. That usually translates to "having a socket open on it by a process" or "not having any socket open". The latter case should be easy to detect since the system should respond with an ICMP Destination Unreachable packet with code=3 (Port unreachable). Unfortunately many firewalls could drop those packets so if you don't get anything back you don't know for sure if the port is in this state or not.
And let's not forget that ICMP is session-less too and doesn't do retransmissions: the Port Unreachable packet could very well be lost somewhere on the net.
A UDP port in the "listening" state may not respond at all (the process listening on it just receives the packet and doesn't transmit anything) or it could send something back (if the process does act upon reception and if it acts by responding via UDP to the original sender IP:port). So again, you never know for sure what's the state if you don't get anything back.
You say you can have control of the receiving host: that makes you able to construct your own protocol to check UDP port reachability: just put a process on the receiving host that'll listen on the given UDP port and respond back (or send you an email, or just freak out and unlink()
everything on the host file system... anything that'll trigger your attention will do).
Best Answer
This is expected behavior when you have unmatched speed and are running mismatched speeds. If you are able to saturate the 1GB link, the other end will have read only 100 packets by the time you have sent the 1000 packets. It is unlikely your router will buffer the remaining 900 packets.
UDP is an unreliable protocol. Unlike TCP it does not come with a built-in reliable delivery.
It may help to run a similar test with TCP connections. Running it in both directions may help to determine if the issue is unidirectional.
Running
time
on the processes may tell give an idea if one of the processes is running slower then the other.netstat -i
before and after the running the test will allow you to calculate how much data arrived, and see if any error were generated.ethtool
may tell you if one of the hosts is in half-duplex mode. Half-duplex connections are prone to issues such as you are seeing. If there are cabling or other issues, the connection may fall back to 10 Mbit half-duplex in one or both directions.If the switch is managed, then should check the configuration and error counters on the relevant ports.
If the two systems have different Ethernet hardware, that may be the issue. Some hardware just can't handle a saturated link.