Networking – Troubleshooting Packet Drops Between Network Interface and iptables


I have a server configured with multiple interfaces and multiple VLANs. It works perfectly fine for all the local networks, but for some reason it drops packets forwarded through my router. And it's not even consistent: sometimes I can get it working for a couple of days before it starts dropping packets again. I would love to keep digging, but the only results I can get out of Google are from people who need help setting up iptables.


$ cat /etc/issue
Ubuntu 18.04.4 LTS \n \l
$ cat /etc/netplan/50-cloud-init.yaml
    version: 2
            dhcp4: true
            dhcp6: true
            dhcp4: false
            dhcp6: false
            id: 18
            link: enp6s0
            dhcp4: true
            optional: true
            id: 150
            link: enp6s0
            dhcp4: true
            optional: true
            id: 155
            link: enp6s0
            dhcp4: true
            optional: true

The interface in question is enp10s0. I had it on enp6s0 with the VLANs for a while but moved it to a separate NIC to isolate variables. That didn't change anything.

$ netstat -s enp10s0
    Forwarding: 2
    4207683 total packets received
    11 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    4197424 incoming packets delivered
    2183348 requests sent out
    21 outgoing packets dropped
    1634 active connection openings
    1615 passive connection openings
    150 failed connection attempts
    1100 connection resets received
    43 connections established
    4207863 segments received
    2190261 segments sent out
    596 segments retransmitted
    0 bad segments received
    222 resets sent



I add the following as the first rule in my iptables INPUT chain:

-p tcp -m tcp --dport 22 -j LOG --log-prefix "IPTABLES SEEN: "

I watch traffic using tcpdump:

tcpdump -n -e -vv -i enp10s0 port 22

Step 1: Prove it works locally

From my router, I telnet to port 22 on the server in question.

iptables log:

Jul 15 23:58:04 meji kernel: IPTABLES SEEN: IN=enp10s0 OUT= MAC=60:a4:4c:60:ce:ce:e0:63:da:21:c1:a5:08:00 SRC= DST= LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=44677 DF PROTO=TCP SPT=48770 DPT=22 WINDOW=14600 RES=0x00 SYN URGP=0

tcpdump log:

23:58:04.335447 e0:63:da:21:c1:a5 > 60:a4:4c:60:ce:ce, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 44677, offset 0, flags [DF], proto TCP (6), length 60) > Flags [S], cksum 0xbb2d (correct), seq 978415077, win 14600, options [mss 1460,sackOK,TS val 25150304 ecr 0,nop,wscale 7], length 0

SYN ACK follows like you would expect and everything is fine.

Step 2: Compare to the router-forwarded connection

I use nc -vz from my remote server to connect to the same IP and port.

tcpdump log:

00:54:44.427670 e0:63:da:21:c1:a5 > 60:a4:4c:60:ce:ce, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 56, id 18829, offset 0, flags [DF], proto TCP (6), length 60) > Flags [S], cksum 0x8c20 (correct), seq 1566819019, win 65320, options [mss 1420,sackOK,TS val 1249821436 ecr 0,nop,wscale 6], length 0

iptables log:

nothing. nothing is logged.

No SYN ACK and a retransmit attempt comes through a moment later. The NIC is not reporting errors, iptables sees nothing, and I am left scratching my head. Where can I even look from here? Start digging in the kernel? The network drivers?

Additional Requested Information

$ ip route show
default via dev vlan18 proto dhcp src metric 100 
default via dev vlan150 proto dhcp src metric 100
dev vlan18 proto kernel scope link src
dev vlan18 proto dhcp scope link src metric 100
dev enp10s0 proto kernel scope link src
dev vlan150 proto kernel scope link src
dev vlan150 proto dhcp scope link src metric 100
dev vlan155 proto kernel scope link src

More iptables stuff. But when I clear out all rules and change all policies to ACCEPT I still have the same problem. I'm confident I've eliminated iptables rules as the culprit.

# iptables -vL
Chain INPUT (policy DROP 442K packets, 81M bytes)
 pkts bytes target     prot opt in     out     source               destination         
 348M  494G ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
1692K  301M ACCEPT     all  --  lo     any     anywhere             anywhere             /* Loopback Interface */
 327K   24M ACCEPT     all  --  vlan18 any     anywhere             anywhere
 174K   14M ACCEPT     all  --  vlan155 any     anywhere             anywhere
    0     0 ACCEPT     tcp  --  enp10s0 any     anywhere             anywhere             tcp dpt:ssh state NEW,ESTABLISHED /* Ssh Passthrough */

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 190M packets, 17G bytes)
 pkts bytes target     prot opt in     out     source               destination  
# iptables --list --table raw
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
# iptables --list --table mangle
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

target     prot opt source               destination         
# iptables --list --table nat
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

target     prot opt source    

Best Answer

There are multiple reasons for a system to drop packets silently, but this case is really simple: what do you see as the result of ip r g for the remote server's address? Since you don't have any route via enp10s0 except for the directly connected network, you could disable rp_filter ... or add the right default route:

ip r a default via dev enp10s0 metric 50

After all, this is the router, not one of the local networks on vlan18 or vlan150, isn't it? Why would you want to reach the outside world via vlan18 or vlan150?

Since enp10s0 is also DHCP-configured, the problem is that your router doesn't set up the default route on your server. This explains the lack of consistency: if the route appears, you have connectivity; if it disappears, you don't.
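To see at a glance whether a default route via enp10s0 is currently present, a small check along these lines could be run by hand or from cron. This is only a sketch: the function name has_default_via is made up for illustration, and it does nothing more than parse the text that ip route show prints.

```shell
#!/bin/sh
# has_default_via DEV: read `ip route show` output on stdin and succeed
# (exit status 0) only if at least one default route uses device DEV.
# The function name is hypothetical; it only parses text, it changes nothing.
has_default_via() {
    awk -v dev="$1" '
        $1 == "default" {
            # Look for the token pair "dev <DEV>" anywhere on the line.
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) == dev) found = 1
        }
        END { exit !found }
    '
}

# Example use on the server:
#   ip route show | has_default_via enp10s0 && echo present || echo missing
```

Logging the result with a timestamp every few minutes should show whether the outages line up with the route disappearing.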

For the record: it is a really bad idea to use DHCP on servers like this. Even with the default route configured statically with a lower metric, as shown above, one rogue DHCP server can easily insert a more specific route; consider (or actually anything up to /25) provided on vlan18/150 - such a route would surely cut you off from the proper (?) default route via enp10s0. If the router is yours and you want to use it as the default route while keeping the DHCP configuration on the other (local) interfaces, consider using a /31 (or at least a "classic" /30) connection address.

Anyway, is your server supposed to be reachable from the outside world via vlan18 or vlan150 through some other forwarding rules on their respective routers? If so, you will have to deal with policy-based routing, as the response packets must be sent out via the interfaces on which the inbound ones were delivered. After all, you don't have a public IP here, so it won't work with asymmetric routing (which in general is not forbidden).
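For reference, the usual shape of policy-based routing is one extra routing table per uplink plus a rule that selects it by source address. The sketch below only prints the iproute2 commands instead of running them. The function name pbr_commands, the table number 110 and the addresses (taken from the documentation ranges) are placeholders, not values from your setup.

```shell
#!/bin/sh
# pbr_commands DEV ADDR GATEWAY TABLE: print (do not execute) the iproute2
# commands for source-based routing on one uplink. All names and values
# passed in are assumptions chosen by the caller.
pbr_commands() {
    dev=$1 addr=$2 gw=$3 table=$4
    # A per-uplink table holding only a default route via that uplink.
    echo "ip route add default via $gw dev $dev table $table"
    # Packets sourced from the uplink's own address consult that table.
    echo "ip rule add from $addr lookup $table"
}

# Placeholder values from the documentation ranges, not real addresses:
pbr_commands enp10s0 192.0.2.10 192.0.2.1 110
```

Review the printed commands before running them as root; with DHCP-assigned addresses they would also have to be re-applied whenever the lease changes.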

Actually, since you can't do asymmetric routing (without a public IP) and don't have policy-based routing, you shouldn't disable rp_filter - this would only hide the real problem underneath. It wouldn't work anyway: packets forwarded by your router (probably DNAT-ed) would be delivered to your server, but the responses would travel out via either vlan18 or vlan150, not via the router, which holds the appropriate conntrack entry and owns the address the connection was initiated to.
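To confirm that reverse path filtering is really active on the interface, keep in mind that the kernel applies the larger of the conf/all and conf/<interface> values. A minimal sketch of that check follows; the function name effective_rp_filter and its optional second argument (an alternative directory, handy for trying it out without touching /proc) are my own additions.

```shell
#!/bin/sh
# effective_rp_filter DEV [CONF_DIR]: print the rp_filter mode the kernel
# applies to DEV, i.e. the larger of conf/all and conf/DEV
# (0 = off, 1 = strict, 2 = loose).
# CONF_DIR defaults to the live sysctl tree; the function is hypothetical.
effective_rp_filter() {
    conf=${2:-/proc/sys/net/ipv4/conf}
    all=$(cat "$conf/all/rp_filter") || return 1
    dev=$(cat "$conf/$1/rp_filter") || return 1
    if [ "$all" -gt "$dev" ]; then echo "$all"; else echo "$dev"; fi
}

# Example use on the server:
#   effective_rp_filter enp10s0
```

A result of 1 means strict mode, which is the mode that drops packets whose source is not routed back out of the interface they arrived on.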

As for the question itself: you could leave out all the iptables-related output. Every time you see a "silent" drop, i.e. one that is not visible in the iptables counters, don't even bother investigating the netfilter layer further. It might be a NIC drop (buffer issues, CPU starvation), which is easily verified with tcpdump; TTL expiration (while forwarding, for packets with a non-local destination); or some internal filtering like the one here (the reverse path filter).
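Reverse path drops do leave a trace outside netfilter: the kernel counts them as IPReversePathFilter in the TcpExt section of /proc/net/netstat. The sketch below reads that counter; the function name rp_filter_drops and the optional file argument are assumptions of mine, added so the parsing can be tried on a saved copy of the file.

```shell
#!/bin/sh
# rp_filter_drops [FILE]: print the IPReversePathFilter counter from the
# TcpExt section of /proc/net/netstat (or from FILE, if given).
# The file holds pairs of lines: a header line of names, then a line of values.
rp_filter_drops() {
    awk '
        # First TcpExt line: find the column holding the counter name.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++)
                if ($i == "IPReversePathFilter") col = i
            seen = 1
            next
        }
        # Second TcpExt line: print the value in that column.
        $1 == "TcpExt:" && col { print $col; exit }
    ' "${1:-/proc/net/netstat}"
}

# Example use: run it, repeat the failing connection attempt, run it again.
#   rp_filter_drops
```

If the number goes up with each failed SYN, the reverse path filter is what is discarding the packets.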