Linux – Why are these UDP packets being dropped

iptableslinux

On trying to resolve dnsviz.net from a host using an Unbound resolver that is configured to use DNSSEC validation, the result is "no servers could be reached":

$ dig -t soa dnsviz.net
; <<>> DiG 9.6-ESV-R4 <<>> -t soa dnsviz.net
;; global options: +cmd
;; connection timed out; no servers could be reached

Nothing is logged by Unbound to suggest why this is the case.

Here is the /etc/unbound/unbound.conf:

server:
    verbosity: 1
    interface: 192.168.0.8
    interface: 127.0.0.1
    interface: ::0
    access-control: 0.0.0.0/0      refuse
    access-control: ::0/0          refuse
    access-control: 127.0.0.0/8    allow_snoop
    access-control: 192.168.0.0/16 allow_snoop
    chroot: ""
    auto-trust-anchor-file: "/etc/unbound/root.key"
    val-log-level: 2
python:
remote-control:
    control-enable: yes

If I add:

module-config: "iterator"

(thus disabling DNSSEC validation) then I am able to resolve this host normally.

The domain and its DNSSEC check out fine according to
http://dnscheck.iis.se/ so there must be something wrong with my
resolver configuration.

What is it and how do I go about debugging that?

Update:

Someone suggested that I use unbound-host in debug mode to get more info. Here we go:

$ /usr/local/sbin/unbound-host -d -4 -v -C /etc/unbound/unbound.conf -t a dnsviz.net
[1341735286] libunbound[27690:0] notice: init module 0: validator
[1341735286] libunbound[27690:0] notice: init module 1: iterator
[1341735286] libunbound[27690:0] info: resolving dnsviz.net. A IN
[1341735286] libunbound[27690:0] info: priming . IN NS
[1341735288] libunbound[27690:0] info: response for . NS IN
[1341735288] libunbound[27690:0] info: reply from <.> 192.5.5.241#53
[1341735288] libunbound[27690:0] info: query response was ANSWER
[1341735288] libunbound[27690:0] info: priming successful for . NS IN
[1341735288] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735288] libunbound[27690:0] info: reply from <.> 128.8.10.90#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735288] libunbound[27690:0] info: reply from <net.> 192.42.93.30#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: resolving ns8.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: resolving ns9.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: resolving ns2.ca.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <.> 199.7.83.42#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <.> 192.58.128.30#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for ns2.ca.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <.> 192.112.36.4#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <gov.> 209.112.123.30#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <gov.> 209.112.123.30#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735288] libunbound[27690:0] info: response for ns2.ca.sandia.gov. A IN
[1341735288] libunbound[27690:0] info: reply from <gov.> 209.112.123.30#53
[1341735288] libunbound[27690:0] info: query response was REFERRAL
[1341735300] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.29 port 53
[1341735300] libunbound[27690:0] info: response for ns2.ca.sandia.gov. A IN
[1341735300] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.29#53
[1341735300] libunbound[27690:0] info: query response was ANSWER
[1341735300] libunbound[27690:0] info: resolving ns1.ca.sandia.gov. A IN
[1341735301] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.66 port 53
[1341735301] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735301] libunbound[27690:0] info: reply from <dnsviz.net.> 198.206.219.66#53
[1341735301] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735310] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.65 port 53
[1341735310] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735310] libunbound[27690:0] info: reply from <sandia.gov.> 198.206.219.65#53
[1341735310] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735310] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.65 port 53
[1341735310] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735310] libunbound[27690:0] info: reply from <sandia.gov.> 198.206.219.65#53
[1341735310] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735310] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.28 port 53
[1341735310] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735310] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.28#53
[1341735310] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735310] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.28 port 53
[1341735310] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735310] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.28#53
[1341735310] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735310] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.29 port 53
[1341735310] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735310] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.29#53
[1341735310] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735311] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.66 port 53
[1341735311] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735311] libunbound[27690:0] info: reply from <sandia.gov.> 198.206.219.66#53
[1341735311] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735315] libunbound[27690:0] info: resolving ns2.ca.sandia.gov. A IN
[1341735315] libunbound[27690:0] info: response for ns2.ca.sandia.gov. A IN
[1341735315] libunbound[27690:0] info: reply from <gov.> 69.36.157.30#53
[1341735315] libunbound[27690:0] info: query response was REFERRAL
[1341735328] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.28 port 53
[1341735328] libunbound[27690:0] info: response for ns1.ca.sandia.gov. A IN
[1341735328] libunbound[27690:0] info: reply from <ca.sandia.gov.> 198.102.153.28#53
[1341735328] libunbound[27690:0] info: query response was ANSWER
[1341735328] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.65 port 53
[1341735328] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735328] libunbound[27690:0] info: reply from <dnsviz.net.> 198.206.219.65#53
[1341735328] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735332] libunbound[27690:0] info: response for ns2.ca.sandia.gov. A IN
[1341735332] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.28#53
[1341735332] libunbound[27690:0] info: query response was ANSWER
[1341735332] libunbound[27690:0] info: resolving ns1.ca.sandia.gov. A IN
[1341735332] libunbound[27690:0] info: response for ns1.ca.sandia.gov. A IN
[1341735332] libunbound[27690:0] info: reply from <gov.> 69.36.157.30#53
[1341735332] libunbound[27690:0] info: query response was REFERRAL
[1341735332] libunbound[27690:0] info: response for ns1.ca.sandia.gov. A IN
[1341735332] libunbound[27690:0] info: reply from <sandia.gov.> 198.102.153.28#53
[1341735332] libunbound[27690:0] info: query response was ANSWER
[1341735333] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735333] libunbound[27690:0] info: reply from <dnsviz.net.> 198.102.153.28#53
[1341735333] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735333] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.102.153.29 port 53
[1341735333] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735333] libunbound[27690:0] info: reply from <dnsviz.net.> 198.102.153.29#53
[1341735333] libunbound[27690:0] info: query response was DNSSEC LAME
[1341735333] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735333] libunbound[27690:0] info: reply from <dnsviz.net.> 198.102.153.28#53
[1341735333] libunbound[27690:0] info: query response was ANSWER
[1341735333] libunbound[27690:0] info: prime trust anchor
[1341735333] libunbound[27690:0] info: resolving . DNSKEY IN
[1341735333] libunbound[27690:0] info: response for . DNSKEY IN
[1341735333] libunbound[27690:0] info: reply from <.> 192.5.5.241#53
[1341735333] libunbound[27690:0] info: query response was ANSWER
[1341735333] libunbound[27690:0] error: Could not open autotrust file for writing, /etc/unbound/root.key: Permission denied
[1341735333] libunbound[27690:0] info: validate keys with anchor(DS): sec_status_secure
[1341735333] libunbound[27690:0] info: Successfully primed trust anchor . DNSKEY IN
[1341735333] libunbound[27690:0] info: validated DS net. DS IN
[1341735333] libunbound[27690:0] info: resolving net. DNSKEY IN
[1341735333] libunbound[27690:0] info: response for net. DNSKEY IN
[1341735333] libunbound[27690:0] info: reply from <net.> 192.48.79.30#53
[1341735333] libunbound[27690:0] info: query response was ANSWER
[1341735333] libunbound[27690:0] info: validated DNSKEY net. DNSKEY IN
[1341735333] libunbound[27690:0] info: validated DS dnsviz.net. DS IN
[1341735333] libunbound[27690:0] info: resolving dnsviz.net. DNSKEY IN
[1341735333] libunbound[27690:0] info: response for dnsviz.net. DNSKEY IN
[1341735333] libunbound[27690:0] info: reply from <dnsviz.net.> 198.102.153.29#53
[1341735333] libunbound[27690:0] info: query response was ANSWER
[1341735333] libunbound[27690:0] info: validated DNSKEY dnsviz.net. DNSKEY IN
[1341735333] libunbound[27690:0] info: Could not establish validation of INSECURE status of unsigned response.
[1341735333] libunbound[27690:0] info: resolving dnsviz.net. A IN
[1341735358] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.66 port 53
[1341735358] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735358] libunbound[27690:0] info: reply from <dnsviz.net.> 198.206.219.66#53
[1341735358] libunbound[27690:0] info: query response was ANSWER
[1341735358] libunbound[27690:0] info: Could not establish validation of INSECURE status of unsigned response.
[1341735358] libunbound[27690:0] info: resolving dnsviz.net. A IN
[1341735358] libunbound[27690:0] info: timeouts, concluded that connection to host drops EDNS packets 198.206.219.65 port 53
[1341735358] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735358] libunbound[27690:0] info: reply from <dnsviz.net.> 198.206.219.65#53
[1341735358] libunbound[27690:0] info: query response was ANSWER
[1341735358] libunbound[27690:0] info: Could not establish validation of INSECURE status of unsigned response.
[1341735358] libunbound[27690:0] info: resolving dnsviz.net. A IN
[1341735374] libunbound[27690:0] info: resolving dnsviz.net. A IN
[1341735375] libunbound[27690:0] info: response for dnsviz.net. A IN
[1341735375] libunbound[27690:0] info: reply from <net.> 192.54.112.30#53
[1341735375] libunbound[27690:0] info: query response was REFERRAL
[1341735375] libunbound[27690:0] info: resolving ns9.sandia.gov. A IN
[1341735375] libunbound[27690:0] info: response for ns9.sandia.gov. A IN
[1341735375] libunbound[27690:0] info: reply from <gov.> 69.36.157.30#53
[1341735375] libunbound[27690:0] info: query response was REFERRAL
[1341735375] libunbound[27690:0] info: resolving ns8.sandia.gov. A IN
[1341735375] libunbound[27690:0] info: response for ns8.sandia.gov. A IN
[1341735375] libunbound[27690:0] info: reply from <gov.> 69.36.157.30#53
[1341735375] libunbound[27690:0] info: query response was REFERRAL
Host dnsviz.net not found: 2(SERVFAIL). (insecure)

I haven't had chance to pick through this properly yet, but the
concluded that connection to host drops EDNS packets bit jumps out
at me.

Update:

This has nothing to do with Unbound – my firewall host is not forwarding some UDP packets.

eth0 is the Internet side of the firewall, eth1 is LAN side. tcpdump of both interfaces while issuing dig +norec +dnssec @198.102.153.29 sandia.gov on a machine on the LAN (the DNS server of this question):

# tcpdump -vpni eth0 'host 198.102.153.29'
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
09:37:57.234085 IP (tos 0x0, ttl  63, id 32258, offset 0, flags [none], length: 67) 82.69.129.108.37722 > 198.102.153.29.53: [udp sum ok]  24755 [1au] A? sandia.gov. (39)
09:37:57.387165 IP (tos 0x4, ttl  47, id 48355, offset 0, flags [+], length: 1196) 198.102.153.29.53 > 82.69.129.108.37722:  24755*- 2/5/13 sandia.gov. A 132.175.81.4, sandia.gov. (1168)
09:37:57.387502 IP (tos 0x4, ttl  47, id 48355, offset 1176, flags [none], length: 1498) 198.102.153.29 > 82.69.129.108: udp
09:38:02.234014 IP (tos 0x0, ttl  63, id 32259, offset 0, flags [none], length: 67) 82.69.129.108.37722 > 198.102.153.29.53: [udp sum ok]  24755 [1au] A? sandia.gov. (39)
09:38:02.386762 IP (tos 0x4, ttl  47, id 48356, offset 0, flags [+], length: 1196) 198.102.153.29.53 > 82.69.129.108.37722:  24755*- 2/5/13 sandia.gov. A 132.175.81.4, sandia.gov. (1168)
09:38:02.387101 IP (tos 0x4, ttl  47, id 48356, offset 1176, flags [none], length: 1498) 198.102.153.29 > 82.69.129.108: udp
09:38:07.260492 IP (tos 0x0, ttl  63, id 32260, offset 0, flags [none], length: 67) 82.69.129.108.37722 > 198.102.153.29.53: [udp sum ok]  24755 [1au] A? sandia.gov. (39)
09:38:07.433906 IP (tos 0x4, ttl  47, id 48357, offset 0, flags [+], length: 1196) 198.102.153.29.53 > 82.69.129.108.37722:  24755*- 2/5/13 sandia.gov. A 132.175.81.4, sandia.gov. (1168)
09:38:07.434244 IP (tos 0x4, ttl  47, id 48357, offset 1176, flags [none], length: 1498) 198.102.153.29 > 82.69.129.108: udp

9 packets captured
9 packets received by filter
0 packets dropped by kernel
# tcpdump -vpni eth1 'host 198.102.153.29'                                                                                                          
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
09:38:20.646202 IP (tos 0x0, ttl  64, id 32261, offset 0, flags [none], length: 67) 192.168.0.8.54056 > 198.102.153.29.53: [udp sum ok]  31422 [1au] A? sandia.gov. (39)
09:38:25.645589 IP (tos 0x0, ttl  64, id 32262, offset 0, flags [none], length: 67) 192.168.0.8.54056 > 198.102.153.29.53: [udp sum ok]  31422 [1au] A? sandia.gov. (39)
09:38:30.645640 IP (tos 0x0, ttl  64, id 32263, offset 0, flags [none], length: 67) 192.168.0.8.54056 > 198.102.153.29.53: [udp sum ok]  31422 [1au] A? sandia.gov. (39)

Note that eth0 gets a bunch of UDP packets that aren't being forwarded.

The firewall rules are quite simple, being basically "NAT everything
to/from 192.168.0.8 to 82.69.129.108, NAT everything else to
82.69.129.105, block all traffic after allowing a few sensible
ports/protocols".

Here's a rules list:

# iptables -vnL
Chain INPUT (policy DROP 87 packets, 5073 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 1010  216K ACCEPT     all  --  eth1   *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     all  --  eth0   *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
   58  4408 ACCEPT     udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           udp dpt:123 
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:123 
    0     0 ACCEPT     icmp --  eth0   *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 
   87  5073 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 1/sec burst 5 LOG flags 0 level 4 prefix `INPUT: ' 

Chain FORWARD (policy DROP 6 packets, 300 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    2  1383 LOG        tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:!0x16/0x02 state NEW LOG flags 0 level 4 prefix `New but not syn: ' 
    2  1383 DROP       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp flags:!0x16/0x02 state NEW 
78595   75M ACCEPT     all  --  eth1   *       0.0.0.0/0            0.0.0.0/0           
58873   13M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
    9   576 ACCEPT     icmp --  eth0   *       0.0.0.0/0            0.0.0.0/0           
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            192.168.0.8         tcp dpt:22 
    4   240 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            192.168.0.8         tcp dpt:80 
    0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            192.168.0.8         tcp dpt:443 
    2   120 ACCEPT     tcp  --  eth0   *       0.0.0.0/0            192.168.0.8         tcp dpt:25 
    0     0 ACCEPT     udp  --  eth0   *       192.168.2.1          192.168.0.8         udp dpt:514 
    2   152 ACCEPT     udp  --  eth0   *       192.168.2.1          192.168.0.8         udp dpt:123 
    0     0 ACCEPT     all  --  eth0   *       192.168.1.1          0.0.0.0/0           
    6   300 LOG        all  --  *      *       0.0.0.0/0            0.0.0.0/0           limit: avg 1/sec burst 5 LOG flags 0 level 4 prefix `FORWARD: ' 

Chain OUTPUT (policy ACCEPT 460 packets, 67812 bytes)
 pkts bytes target     prot opt in     out     source               destination

# iptables -t nat -vnL
Chain PREROUTING (policy ACCEPT 2696K packets, 192M bytes)
 pkts bytes target     prot opt in     out     source               destination         
   21  1236 DNAT       all  --  eth0   *       0.0.0.0/0            82.69.129.108       to:192.168.0.8 

Chain POSTROUTING (policy ACCEPT 108K packets, 10M bytes)
 pkts bytes target     prot opt in     out     source               destination         
 1549  115K SNAT       all  --  *      eth0    192.168.0.8          0.0.0.0/0           to:82.69.129.108 
  709 42396 SNAT       all  --  *      eth0    0.0.0.0/0            0.0.0.0/0           to:82.69.129.105 

Chain OUTPUT (policy ACCEPT 19719 packets, 3998K bytes)
 pkts bytes target     prot opt in     out     source               destination

Nothing useful is being logged by those LOG rules.

The firewall is a Linux install but it's running on a Soekris device read-only
from a CF card; as such I treat it more like an appliance and haven't upgraded
it since it was installed. It's therefore a really old Debian etch install with
a 2.6.12 kernel. Could this be a kernel bug related to UDP fragmentation or connection
tracking?

Anyway I'm going to remove the DNSSEC and Unbound tags from this and add iptables etc.

Best Answer

I had the exact problem and I found that the information from http://comments.gmane.org/gmane.network.dns.unbound.user/1891 solved the problem for me:

Your trace shows that unbound thinks the connection drops MTU 1500+ packets. Faa.gov uses large keys and has a lot of answers above 1480 - i.e. DNSKEY, NXDOMAIN answers. Thus your trouble likely stems from fragmentation issues. Your server cannot receive UDP DNS responses that are larger than 1480 or so.

A simple dig @..faaserver faa.gov DNSKEY +dnssec from the server shows the timeout it produces, likely.

The best solution is to fix the path that is dropping UDP fragments. Fix your firewall, upgrade it, change cisco router rules on old equipment. It must be close to your end, because I can get the fragments just fine. This is the best fix, because it allows your server to run better with large responses, and generally cleans up your network.

The workaround is edns-buffer-size: 1280 in unbound.conf.

A code fix, is in svn trunk development version of unbound. That version should fallback to smaller edns size automatically for you.

And there are useful MTU size test sites out there too.

Related Topic