BIND 9 does not validate DNSSEC correctly for clients on the network

Tags: bind, cache, dnssec, domain-name-system

I have a problem with my DNS server setup. My BIND server is mainly a caching server, but it also serves some internal domains. It listens only on my private network and answers only requests from there.

Today I wanted to enable DNSSEC validation in BIND, but it does not work correctly. If I resolve the host name on the BIND Linux machine itself, the invalid DNSSEC signature is correctly reported as such. But if I run the same dig command on another machine in the network, the DNSSEC check does not fail and the domain resolves just fine. What I want is for BIND to send the correct SERVFAIL to the other DNS clients in the network as well.

Here is all the information you might need (BIND version, configs, etc.). The dig output is appended at the end.

OS Version

root@thor:/etc/bind# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 8.5 (jessie)
Release:        8.5
Codename:       jessie

root@thor:/etc/bind# uname -a
Linux thor.home.intranet 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

bind version

BIND 9.9.5-9+deb8u6-Debian (Extended Support Version)

named.conf

include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";

named.conf.options

options {
        directory "/var/cache/bind";

        forwarders {
                208.67.222.222; # resolver1.opendns.com
                208.67.220.220; # resolver2.opendns.com
#               8.8.8.8; # google-public-dns-a.google.com
#               8.8.4.4; # google-public-dns-b.google.com
        };

        dnssec-enable yes;
        dnssec-validation auto;
        auth-nxdomain no;    # conform to RFC1035

        listen-on {
                127.0.0.1;
                192.168.10.36;
        };

        recursion yes;
        allow-recursion { 127.0.0.0/8; 192.168.10.0/24; };

        max-ncache-ttl 0;
};

named.conf.local

zone "intranet" {
        type master;
        file "/etc/bind/master/db.intranet";
};

zone "10.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/master/db.10.168.192";
};

zone "box" {
        type master;
        file "/etc/bind/master/db.box";
};

named.conf.default-zones

// prime the server with knowledge of the root servers
zone "." {
        type hint;
        file "/etc/bind/db.root";
};

// be authoritative for the localhost forward and reverse zones, and for
// broadcast zones as per RFC 1912

zone "localhost" {
        type master;
        file "/etc/bind/db.local";
};

zone "127.in-addr.arpa" {
        type master;
        file "/etc/bind/db.127";
};

zone "0.in-addr.arpa" {
        type master;
        file "/etc/bind/db.0";
};

zone "255.in-addr.arpa" {
        type master;
        file "/etc/bind/db.255";
};

DNS results
If I request the invalid domain on my server (thor), I get the following:

user@thor:/etc/bind$ dig @192.168.10.36 sigfail.verteiltesysteme.net

; <<>> DiG 9.9.5-9+deb8u6-Debian <<>> @192.168.10.36 sigfail.verteiltesysteme.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 11750
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;sigfail.verteiltesysteme.net.  IN      A

;; Query time: 256 msec
;; SERVER: 192.168.10.36#53(192.168.10.36)
;; WHEN: Fri Jul 08 21:27:37 CEST 2016
;; MSG SIZE  rcvd: 57

If I run the exact same query on my Windows 10 client using Cygwin, I get this:

user@COMPUTER:~$ dig @192.168.10.36 sigfail.verteiltesysteme.net

; <<>> DiG 9.10.3-P4 <<>> @192.168.10.36 sigfail.verteiltesysteme.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52681
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;sigfail.verteiltesysteme.net.  IN      A

;; ANSWER SECTION:
sigfail.verteiltesysteme.net. 60 IN     A       134.91.78.139

;; AUTHORITY SECTION:
verteiltesysteme.net.   3600    IN      NS      ns1.verteiltesysteme.net.
verteiltesysteme.net.   3600    IN      NS      ns2.verteiltesysteme.net.

;; ADDITIONAL SECTION:
ns1.verteiltesysteme.net. 2910  IN      A       134.91.78.139
ns1.verteiltesysteme.net. 2910  IN      AAAA    2001:638:501:8efc::139
ns2.verteiltesysteme.net. 2910  IN      A       134.91.78.141
ns2.verteiltesysteme.net. 2910  IN      AAAA    2001:638:501:8efc::141

;; Query time: 52 msec
;; SERVER: 192.168.10.36#53(192.168.10.36)
;; WHEN: Fr Jul 08 21:27:46 CEST 2016
;; MSG SIZE  rcvd: 197

I hope you can help me.

Thank you in advance


— EDIT —
Thanks to @HåkanLindqvist I noticed that the configuration was quite messed up. To clean it up and get rid of all those errors, I removed all forwarding; the server now resolves everything on its own. That should not be a big deal since the server caches the results anyway.
My named.conf.options now looks like this:

options {
        directory "/var/cache/bind";

        dnssec-enable yes;
        dnssec-validation auto;

        auth-nxdomain no;    # conform to RFC1035
        listen-on {
                127.0.0.1;
                192.168.10.36;
        };

        recursion yes;
        allow-recursion { 127.0.0.0/8; 192.168.10.0/24; };

        max-ncache-ttl 0;
};

The log no longer shows the odd errors, and invalid signatures are now correctly logged:

Jul  9 00:33:05 thor named[2940]: validating @0x7fd2d0391140: sigfail.verteiltesysteme.net A: no valid signature found
Jul  9 00:33:05 thor named[2940]: error (no valid RRSIG) resolving 'sigfail.verteiltesysteme.net/A/IN': 134.91.78.141#53

But the inconsistent results remain. Both clients are talking to the same BIND server:

computer:

user@COMPUTER:~$ dig +short @192.168.10.36 hostname.bind CH TXT
"thor.home.intranet"
user@COMPUTER:~$ dig +short @192.168.10.36 version.bind CH TXT
"9.9.5-9+deb8u6-Debian"

server:

user@thor:/etc/bind# dig @192.168.10.36 +short hostname.bind CH TXT
"thor.home.intranet"
user@thor:/etc/bind# dig @192.168.10.36 +short version.bind CH TXT
"9.9.5-9+deb8u6-Debian"

But the results are still different.
computer:

user@COMPUTER:~$ nslookup sigfail.verteiltesysteme.net
Server:         192.168.10.36
Address:        192.168.10.36#53

Non-authoritative answer:
Name:   sigfail.verteiltesysteme.net
Address: 134.91.78.139

server:

root@thor:/etc/bind# nslookup sigfail.verteiltesysteme.net
Server:         192.168.10.36
Address:        192.168.10.36#53

** server can't find sigfail.verteiltesysteme.net: SERVFAIL

An important thing to note (I think): even when I send the request from my computer, the server logs that there is no valid signature. So it definitely recognizes that DNSSEC validation fails, yet it still sends NOERROR to my computer.


— EDIT2 —
Even with the DNSSEC OK (DO) flag explicitly set via +dnssec, I still get a result.

user@COMPUTER:~$ dig @192.168.10.36 +dnssec sigfail.verteiltesysteme.net

; <<>> DiG 9.10.3-P4 <<>> @192.168.10.36 +dnssec sigfail.verteiltesysteme.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48091
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 3, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;sigfail.verteiltesysteme.net.  IN      A

;; ANSWER SECTION:
sigfail.verteiltesysteme.net. 60 IN     A       134.91.78.139
sigfail.verteiltesysteme.net. 60 IN     RRSIG   A 5 3 60 20200610081125 20150611081125 30665 verteiltesysteme.net. //This+RRSIG+is+deliberately+broken///For+more+informati on+please+go+to/http+//dnssec+vs+uni/hyphen/+due+de////r eplace+/hyphen/+with+character////////////////////////// //8=

;; AUTHORITY SECTION:
verteiltesysteme.net.   3600    IN      NS      ns2.verteiltesysteme.net.
verteiltesysteme.net.   3600    IN      NS      ns1.verteiltesysteme.net.
verteiltesysteme.net.   3600    IN      RRSIG   NS 5 2 3600 20200610081125 20150611081125 30665 verteiltesysteme.net. s4iS0q402GTqtpy1WWspX1KHY3hb0/SOq79qWzRL5PFacAAKK+2ltxWW PTuwsYOWP3l+uq7xu80G0UQNtWPmISa2SYnktvXoZWbdy8F7q8GOH5xw 2t+JokxheEz5Xe4Xy7TmONIxVGq7M9FX4hDBva62PztcGq7UMZMWgyNs P/o=

;; ADDITIONAL SECTION:
ns1.verteiltesysteme.net. 69    IN      A       134.91.78.139
ns1.verteiltesysteme.net. 69    IN      AAAA    2001:638:501:8efc::139
ns2.verteiltesysteme.net. 69    IN      A       134.91.78.141
ns2.verteiltesysteme.net. 69    IN      AAAA    2001:638:501:8efc::141
ns1.verteiltesysteme.net. 69    IN      RRSIG   A 5 3 3600 20200610081125 20150611081125 30665 verteiltesysteme.net. kIcbu+YRC6xby461JYrNE3WSOQmTM6UstxKYo8uO1mEysvfDUs23Yuv6 nG+yMo3enmdIg89pPuLWIsz16uYxswl4DlplCYYPP9nT4d+9bjbMHu5S 7hi/uTlYEFwUCDlyQn38sEwnDHwbBnuW0uvYwV/TPTTjtcfYEw0R8zGI QQU=
ns1.verteiltesysteme.net. 69    IN      RRSIG   AAAA 5 3 3600 20200610081125 20150611081125 30665 verteiltesysteme.net. PzZiFVbjYHb1+xpIfZGbbtogY94uNvpqHBBibk0Sp7n5BLz4PJZ+dJYc rlikoNK1KyhnHugqCzh6Cr/t23lpioXUPjMWHFYcHsV4kcldTzt7Pl9Q 8h/IvlvtC33TYXnopmmGoV9vbjgpmgpAt//dY8UdNlXD/Dh6CDver+XT 34A=
ns2.verteiltesysteme.net. 69    IN      RRSIG   A 5 3 3600 20200610081125 20150611081125 30665 verteiltesysteme.net. PVIDSVFi0GLHavnTFj2JnHn+1A/wOAKS8fMzavMhkFycWjudxDuC19uW Ak9vCV5dR/3ZW4UGQUjZFgVI45fQP2yCJ5H98Z7vfn4FF9gxKwGy+TDt dLeOzcdorOF70aYHEWyYWK5tcq1SqXLXJQMp3G/MY362vqCzbFiIUk32 3q4=
ns2.verteiltesysteme.net. 69    IN      RRSIG   AAAA 5 3 3600 20200610081125 20150611081125 30665 verteiltesysteme.net. Fhg3JLyBsuXG4UCvG3y48gL8lz2Tu5Hx+ClxoXf4NjWs2MK/XScHEzwb UdOhz4aHnZbfWORoXHSD3DR92vBooix+522Z2GhCg1eiXBP66VDyypqT Ar7kUTXJHmsa70k/ubYHC6P6Imy68CbIi5xPr+OFZHrL/CTv9fcLVg3A ikU=

;; Query time: 53 msec
;; SERVER: 192.168.10.36#53(192.168.10.36)
;; WHEN: Sa Jul 09 01:07:08 CEST 2016
;; MSG SIZE  rcvd: 1277
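The header of each response already tells the two cases apart: the validating resolver on thor answers SERVFAIL, and a response that BIND actually validated would carry the "ad" (authenticated data) flag when the query sets DO via +dnssec. The answer above has DO set but only "qr rd ra", which suggests that whatever produced it did not validate it. Below is a minimal sketch of a hypothetical POSIX shell helper, classify_dig, that reads dig output on stdin and reports which case applies. The function name and its output labels are illustrative only and are not part of dig.

```shell
#!/bin/sh
# Hypothetical helper (not part of dig): classify a dig response read on stdin.
#   rejected    - status SERVFAIL (what a validating resolver returns for sigfail.*)
#   validated   - status NOERROR and the "ad" (authenticated data) flag is set
#   unvalidated - status NOERROR but no "ad" flag
#   other:XXX   - any other status, e.g. NXDOMAIN
classify_dig() {
    out=$(cat)
    # ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 11750" -> SERVFAIL
    status=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p')
    # ";; flags: qr rd ra ad; QUERY: 1, ..." -> "qr rd ra ad"
    # (the "; EDNS: ... flags: do;" line does not start with ";; flags:")
    flags=$(printf '%s\n' "$out" | sed -n 's/^;; flags: \([^;]*\);.*/\1/p')
    case "$status" in
        SERVFAIL) echo "rejected" ;;
        NOERROR)
            case " $flags " in
                *" ad "*) echo "validated" ;;
                *)        echo "unvalidated" ;;
            esac
            ;;
        *) echo "other:$status" ;;
    esac
}
```

Usage, e.g. on either machine: dig @192.168.10.36 +dnssec sigfail.verteiltesysteme.net | classify_dig. Applied to the outputs shown in the question, it would print "rejected" for the query run on thor and "unvalidated" for the +dnssec query from the Windows client.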

— EDIT3 —
I enabled the query log at debug level 10 to make sure the correct queries are being sent. The following three entries were generated by the query "dig @192.168.10.36 +dnssec sigfail.verteiltesysteme.net":

09-Jul-2016 01:23:50.419 client 192.168.10.36#47038 (sigfail.verteiltesysteme.net): query: sigfail.verteiltesysteme.net IN A +ED (192.168.10.36)
09-Jul-2016 01:23:59.620 client 192.168.10.2#64858 (sigfail.verteiltesysteme.net): query: sigfail.verteiltesysteme.net IN A +ED (192.168.10.36)
09-Jul-2016 01:24:32.417 client 192.168.10.2#54071 (sigfail.verteiltesysteme.net): query: sigfail.verteiltesysteme.net IN A +ED (192.168.10.36)

192.168.10.2 is my computer; 192.168.10.36 is the server running BIND.
As you suggested, I also downloaded the current BIND version from isc.org and ran the query with it. The result is the same as with Cygwin. The third entry in the log above was generated by the isc.org build.
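For readers who want to reproduce this: query logging can be toggled at runtime with rndc querylog, or enabled persistently with a logging clause in named.conf. The fragment below is a sketch of the latter; the channel name query_log and the file path are assumptions, and the directory must exist and be writable by the user named runs as. In the log lines above, "+ED" after the query type means recursion desired (+), EDNS present (E) and the DO bit set (D), so queries from both machines arrived at named with DNSSEC OK set.

```conf
logging {
    // Hypothetical channel name and path; adjust to your system.
    channel query_log {
        file "/var/log/named/query.log" versions 3 size 5m;
        print-time yes;
        severity dynamic;   // follows the server's current debug level
    };
    // Send the "queries" category to the channel defined above.
    category queries { query_log; };
};
```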


— EDIT 4 —

As a very late final edit: I finally found the solution.
I was using Avast as my antivirus, which apparently intercepted DNS traffic and forwarded it to Avast's own "secure" DNS server.
Uninstalling Avast and using only Windows Defender solved the problem.

Best Answer

The forwarders you have configured will only cause problems when running a validating resolver, as the OpenDNS servers do not cooperate with DNSSEC validation.

I suppose it might mostly work anyway for you as you didn't specify forward only, so named will fall back to resolving things on its own more or less all the time as the forwarders keep failing to produce useful results. But even if it sort of works it will still make a complete mess of your logs.
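For reference, this fallback is controlled by the forward option: with the default, forward first, named asks the forwarders and falls back to its own iterative resolution when they fail, while forward only never falls back. A sketch of the two choices, reusing the OpenDNS addresses from the question (the settings belong inside the single existing options block, not in a second one):

```conf
options {
    forwarders {
        208.67.222.222;
        208.67.220.220;
    };
    // Default behaviour: try the forwarders first, then resolve
    // iteratively from the root hints if they fail.
    forward first;
    // Alternative: rely on the forwarders exclusively, never fall back.
    // forward only;
};
```

Removing the forwarders block entirely, as done in the question's first edit, makes named always resolve iteratively itself.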

To demonstrate, if I set forward only and use those same forwarders this is what happens:

named[20057]: error (no valid RRSIG) resolving 'net/DS/IN': 208.67.220.220#53
named[20057]: error (no valid RRSIG) resolving 'net/DS/IN': 208.67.222.222#53
named[20057]: error (no valid DS) resolving 'sigfail.verteiltesysteme.net/A/IN': 208.67.222.222#53
named[20057]: validating @0x7f36805ecb10: sigfail.verteiltesysteme.net A: bad cache hit (net/DS)
named[20057]: error (broken trust chain) resolving 'sigfail.verteiltesysteme.net/A/IN': 208.67.220.2

As you can see, it fails but for entirely the wrong reason. (It failed at the DS for net, not when validating the actually broken signatures at sigfail.verteiltesysteme.net.)

I expect your logs are currently a mix of stuff like the above combined with actually relevant entries from when named falls back to querying properly working servers. Fixing this ought to help troubleshooting.

As for the inconsistent results, I'm not sure that anything in your configuration can really explain that. Are you positive that it's actually that same named instance that answered the query? No strange NAT rules or something like that which would cause clients to transparently talk to some different server or whatnot?

Queries like dig @192.168.10.36 version.bind CH TXT and dig @192.168.10.36 hostname.bind CH TXT could expose such a thing going on.
