Firewall – Docker Swarm: containers on the same overlay network but on different nodes can't reach each other via TCP

docker, docker-networking, docker-swarm, firewall

I have a Docker Swarm cluster with 12 nodes. Containers deployed on a single node can reach each other fine over the overlay network, but when they are deployed on different nodes there is a connectivity issue: hostnames resolve and I can ping one container from another, but when I try to reach the other container via TCP (for example with telnet), I get a long wait and then a connection timeout.
The firewall on each node is already set up for Docker Swarm, with ports 2377, 7946 and 4789 open.

Example:
On my master node I ran these commands to create services scheduled on different nodes:

docker network create -d overlay test_net
docker service create --constraint node.labels.first==true --name first --network test_net ubuntu/nginx:1.18-20.04_beta
docker service create --constraint node.labels.second==true --name second --network test_net ubuntu/nginx:1.18-20.04_beta

Then, from the container of the service first, I run:

root@37be801ebe8b:/# ping second
PING second (10.0.5.18): 56 data bytes
64 bytes from 10.0.5.18: icmp_seq=0 ttl=64 time=0.092 ms
64 bytes from 10.0.5.18: icmp_seq=1 ttl=64 time=0.067 ms
64 bytes from 10.0.5.18: icmp_seq=2 ttl=64 time=0.083 ms
64 bytes from 10.0.5.18: icmp_seq=3 ttl=64 time=0.073 ms
^C--- second ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.067/0.079/0.092/0.000 ms

But when I try to connect to the container on the other node with telnet (nginx in that container is listening on port 80):

root@37be801ebe8b:/# telnet second 80
Trying 10.0.5.18...
telnet: Unable to connect to remote host: Connection timed out

Can someone suggest a workaround for this problem?

Best Answer

I found the answer here: https://stackoverflow.com/questions/66251422/docker-swarm-overlay-network-icmp-works-but-not-anything-else

The problem was bad checksums on the outbound packets. The packets were being dropped by the network interface because of that.
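One way to look for this symptom is to capture the overlay (VXLAN) traffic on UDP port 4789, the data-path port mentioned in the question, and check whether tcpdump flags checksums as bad or incorrect. The following is only a sketch: the interface name eth0, the packet count, and the run helper are assumptions for illustration, and the script only prints the command unless DRY_RUN=0 is set. Captures taken on the sending node can show incorrect checksums whenever offloading is enabled, so the receiving node is probably the more informative place to look.

```shell
#!/bin/sh
# Sketch: inspect VXLAN traffic for checksum errors.
# IFACE is a hypothetical default; replace it with the node's real interface.
IFACE="${IFACE:-eth0}"
VXLAN_PORT="${VXLAN_PORT:-4789}"   # overlay data-path port from the question
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the command, 0 = execute it

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# -n: no name resolution, -vv: verbose output including checksum verification,
# -c 20: stop after 20 packets. Needs root when actually executed.
run tcpdump -n -vv -c 20 -i "$IFACE" udp port "$VXLAN_PORT"
```

Run it on the node hosting the second service while retrying telnet from the first container, and look for packets that tcpdump marks with a bad or incorrect checksum.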

The solution was to disable checksum offloading using ethtool:

# ethtool -K <interface> tx off
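To make the fix easier to verify, the offload settings can be listed before and after the change with ethtool -k (lowercase), which shows the current features of an interface. This is a minimal sketch rather than a definitive procedure: the interface name eth0 and the run helper are assumptions, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: check, disable, and re-check TX checksum offloading.
# IFACE is a hypothetical default; replace it with the node's real interface.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the current offload settings (look for the tx-checksumming line).
run ethtool -k "$IFACE"

# Disable TX checksum offloading, as in the answer above. Needs root.
run ethtool -K "$IFACE" tx off

# Show the settings again to confirm the change took effect.
run ethtool -k "$IFACE"
```

Settings changed with ethtool normally do not survive a reboot, so the command would have to be reapplied at boot through whatever mechanism the distribution provides, and it would need to be run on every node that shows the problem.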