Linux – ping: sendmsg: Operation not permitted (sometimes)

Tags: linux, networking, ping, ubuntu

On an Ubuntu 14.04 machine running HAProxy, HAProxy suddenly reports all servers behind it as down after a service haproxy reload.

After some digging around, I noticed that ping is not working properly: sometimes it pings successfully, then seconds later it fails with the error ping: sendmsg: Operation not permitted.

It's also not able to resolve subdomain.domain.com.

iptables -L does not show any rules in place. iptables --flush does not help.

Any ideas?

root@some-test:~# ping 107.1.1.1

PING 107.1.1.1 (107.1.1.1) 56(84) bytes of data.
64 bytes from 107.1.1.1: icmp_seq=1 ttl=63 time=0.425 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=6 ttl=63 time=0.390 ms
64 bytes from 107.1.1.1: icmp_seq=7 ttl=63 time=0.533 ms
64 bytes from 107.1.1.1: icmp_seq=8 ttl=63 time=0.357 ms
64 bytes from 107.1.1.1: icmp_seq=9 ttl=63 time=0.343 ms
64 bytes from 107.1.1.1: icmp_seq=10 ttl=63 time=0.380 ms
64 bytes from 107.1.1.1: icmp_seq=11 ttl=63 time=0.398 ms
64 bytes from 107.1.1.1: icmp_seq=12 ttl=63 time=0.423 ms
64 bytes from 107.1.1.1: icmp_seq=13 ttl=63 time=0.293 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=16 ttl=63 time=0.371 ms
64 bytes from 107.1.1.1: icmp_seq=17 ttl=63 time=0.374 ms
64 bytes from 107.1.1.1: icmp_seq=18 ttl=63 time=0.305 ms
64 bytes from 107.1.1.1: icmp_seq=19 ttl=63 time=0.259 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=24 ttl=63 time=0.370 ms
64 bytes from 107.1.1.1: icmp_seq=25 ttl=63 time=0.316 ms
64 bytes from 107.1.1.1: icmp_seq=26 ttl=63 time=0.412 ms
64 bytes from 107.1.1.1: icmp_seq=27 ttl=63 time=0.512 ms
64 bytes from 107.1.1.1: icmp_seq=28 ttl=63 time=0.375 ms
64 bytes from 107.1.1.1: icmp_seq=29 ttl=63 time=0.352 ms
64 bytes from 107.1.1.1: icmp_seq=30 ttl=63 time=0.331 ms
64 bytes from 107.1.1.1: icmp_seq=31 ttl=63 time=0.290 ms
64 bytes from 107.1.1.1: icmp_seq=32 ttl=63 time=0.353 ms
64 bytes from 107.1.1.1: icmp_seq=33 ttl=63 time=0.378 ms
64 bytes from 107.1.1.1: icmp_seq=34 ttl=63 time=0.523 ms
64 bytes from 107.1.1.1: icmp_seq=35 ttl=63 time=0.351 ms
64 bytes from 107.1.1.1: icmp_seq=36 ttl=63 time=0.302 ms
64 bytes from 107.1.1.1: icmp_seq=37 ttl=63 time=0.496 ms
64 bytes from 107.1.1.1: icmp_seq=38 ttl=63 time=0.377 ms
64 bytes from 107.1.1.1: icmp_seq=39 ttl=63 time=0.357 ms
64 bytes from 107.1.1.1: icmp_seq=40 ttl=63 time=0.396 ms
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
64 bytes from 107.1.1.1: icmp_seq=52 ttl=63 time=0.372 ms
64 bytes from 107.1.1.1: icmp_seq=53 ttl=63 time=0.412 ms
64 bytes from 107.1.1.1: icmp_seq=54 ttl=63 time=0.321 ms
64 bytes from 107.1.1.1: icmp_seq=55 ttl=63 time=0.366 ms
64 bytes from 107.1.1.1: icmp_seq=56 ttl=63 time=0.379 ms
64 bytes from 107.1.1.1: icmp_seq=57 ttl=63 time=0.395 ms
64 bytes from 107.1.1.1: icmp_seq=58 ttl=63 time=0.488 ms
64 bytes from 107.1.1.1: icmp_seq=59 ttl=63 time=0.513 ms
64 bytes from 107.1.1.1: icmp_seq=60 ttl=63 time=0.435 ms
^C
--- 107.1.1.1 ping statistics ---
60 packets transmitted, 39 received, 35% packet loss, time 59008ms
rtt min/avg/max/mdev = 0.259/0.385/0.533/0.067 ms

Best Answer

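I think the problem is that the conntrack table is full: once the maximum number of tracked connections is reached, new connections can't be established until old entries expire. Packets that would need a new entry are dropped by netfilter, and for locally generated traffic such as ping that drop is reported as "Operation not permitted". You will probably see something like this in dmesg: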

[1824447.285257] nf_conntrack: table full, dropping packet.
[1824447.522502] nf_conntrack: table full, dropping packet.
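To check quickly whether these messages are present, something like the following sketch may help. The function name count_table_full is made up for this example; it only counts lines containing the message shown above.

```shell
#!/bin/sh
# Count kernel log lines that report a full conntrack table.
# Reads the log from stdin so it works with dmesg output or a saved log file.
count_table_full() {
    grep -c 'nf_conntrack: table full'
}

# Typical use (reading the kernel log may require root on some systems):
#   dmesg | count_table_full
```

A result of 0 suggests the conntrack table is not the cause, at least not since the kernel log buffer last wrapped.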

You can see the current conntrack maximum with:

undefine@uml:~$ sudo sysctl net.nf_conntrack_max
net.nf_conntrack_max = 65536

and the current number of tracked connections with:

undefine@uml:~$ sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 157
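The two values can be combined into a single "how full is the table" figure. This is a minimal sketch, assuming the nf_conntrack module is loaded so that the files under /proc/sys/net/netfilter exist; the function name conntrack_usage_pct is made up for this example.

```shell
#!/bin/sh
# Print conntrack table usage as an integer percentage (rounded down).
# Both arguments are optional; when omitted, the values are read from /proc/sys.
conntrack_usage_pct() {
    count=${1:-$(cat /proc/sys/net/netfilter/nf_conntrack_count)}
    max=${2:-$(cat /proc/sys/net/netfilter/nf_conntrack_max)}
    echo $(( count * 100 / max ))
}

# Typical use on the affected machine:
#   conntrack_usage_pct
# Or with explicit numbers, for example the ones shown above:
#   conntrack_usage_pct 157 65536
```

A value that stays close to 100 while the ping errors occur would support the full-table explanation.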

You can list the current connections with conntrack -L (a tool from the conntrack package). It's useful to look there and check what types of connections they are; some of them may not be necessary.
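With tens of thousands of entries the raw listing is hard to read, so a per-protocol summary can be a useful first look. The sketch below assumes the usual conntrack -L line format, where the first field is the protocol name (tcp, udp, icmp, ...); the function name summarize_by_proto is made up for this example.

```shell
#!/bin/sh
# Summarize conntrack entries by protocol (first field of each line).
# Reads the listing from stdin and prints "<count> <protocol>", largest first.
summarize_by_proto() {
    awk 'NF { n[$1]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Typical use (requires root; conntrack prints its own summary on stderr):
#   conntrack -L 2>/dev/null | summarize_by_proto
```

The same approach works for other fields, for example grouping TCP entries by state to spot a large number of TIME_WAIT connections.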

You have three possibilities:

  1. don't use conntrack at all (simply don't use the nat table and unload the nf_conntrack module)
  2. disable conntrack for outgoing connections (in the raw table, use -j NOTRACK for the problematic connections)
  3. increase the maximum number of tracked connections with:

    undefine@uml:~$ sudo sysctl net.nf_conntrack_max=512000
    net.nf_conntrack_max = 512000

    or put net.nf_conntrack_max=512000 into /etc/sysctl.conf and then invoke sysctl -p to reload it.
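For option 2, the raw-table rules could look roughly like the sketch below. It only prints the commands instead of running them, so they can be reviewed first. The function name notrack_rules and the example backend 10.0.0.5:8080 are made up for this illustration, and on newer iptables versions the same target may need to be written as -j CT --notrack. Keep in mind that untracked packets no longer match state-based rules (such as ESTABLISHED) and cannot be NATed.

```shell
#!/bin/sh
# Print (not run) raw-table rules that exempt TCP traffic between this host
# and one backend from connection tracking.
#   $1 = backend IP address, $2 = backend TCP port
notrack_rules() {
    ip=$1
    port=$2
    # Locally generated packets going to the backend.
    echo "iptables -t raw -A OUTPUT -p tcp -d $ip --dport $port -j NOTRACK"
    # Reply packets coming back from the backend.
    echo "iptables -t raw -A PREROUTING -p tcp -s $ip --sport $port -j NOTRACK"
}

# Typical use: review the output, then run the printed commands as root.
#   notrack_rules 10.0.0.5 8080
```

Both directions need a rule; otherwise the replies would still create conntrack entries.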