Server 2012R2 DNS server returning SERVFAIL for some AAAA queries

active-directory domain-name-system mx-record

(Rewriting most of this question since a lot of my original tests are irrelevant in light of new information)

I'm having issues with Server 2012R2 DNS servers. The biggest side effect of these issues is Exchange emails not going through. Exchange queries for AAAA records before trying A records. When it sees SERVFAIL for the AAAA record, it doesn't even try A records, it just gives up.

For some domains, when querying against my active directory DNS servers, I get SERVFAIL instead of NOERROR with no results.

I have tried this from several different Server 2012R2 domain controllers that are running DNS. One of them is an entirely separate domain, on a different network behind a different firewall and internet connection.

Two addresses that I know cause this problem are smtpgw1.gov.on.ca and mxmta.owm.bell.net

I've been using dig on a linux machine to test this (192.168.5.5 is my domain controller):

grant@linuxbox:~$ dig @192.168.5.5 smtpgw1.gov.on.ca -t AAAA

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @192.168.5.5 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56328
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;smtpgw1.gov.on.ca.             IN      AAAA

;; Query time: 90 msec
;; SERVER: 192.168.5.5#53(192.168.5.5)
;; WHEN: Wed Oct 21 14:09:10 EDT 2015
;; MSG SIZE  rcvd: 46

But queries against a public DNS resolver work as expected:

grant@home-ssh:~$ dig @4.2.2.1 smtpgw1.gov.on.ca -t AAAA

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @4.2.2.1 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 269
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 8192
;; QUESTION SECTION:
;smtpgw1.gov.on.ca.             IN      AAAA

;; Query time: 136 msec
;; SERVER: 4.2.2.1#53(4.2.2.1)
;; WHEN: Wed Oct 21 14:11:19 EDT 2015
;; MSG SIZE  rcvd: 46
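
For anyone who wants to reproduce this comparison without dig, here is a minimal Python sketch using only the standard library. It sends the same kind of AAAA query (with an EDNS0 OPT record) to each resolver and prints the RCODE and section counts from the reply header. The function names are made up for illustration, the advertised EDNS payload size of 4096 is an assumption, and unlike dig the sketch sets only the RD flag and does not validate the reply ID.

```python
import socket
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}
QTYPE_AAAA = 28
TYPE_OPT = 41


def build_query(name, qtype=QTYPE_AAAA, query_id=0x1234, edns_size=4096):
    """Build a recursive DNS query that carries an EDNS0 OPT record."""
    # Header: id, flags (RD only), 1 question, 0 answers, 0 authority, 1 additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN
    # OPT pseudo-record: root name, type 41, class = UDP payload size,
    # TTL 0 (extended RCODE, version, flags), zero-length option data.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, edns_size, 0, 0)
    return header + question + opt


def parse_header(message):
    """Return the RCODE and section counts from the 12-byte DNS header."""
    _id, flags, _qd, answers, authority, additional = struct.unpack(
        "!HHHHHH", message[:12]
    )
    rcode = flags & 0x000F
    return {
        "rcode": RCODES.get(rcode, str(rcode)),
        "answer": answers,
        "authority": authority,
        "additional": additional,
    }


def ask(server, name, timeout=3.0):
    """Send one AAAA query over UDP and return the parsed reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        reply, _addr = sock.recvfrom(4096)
    return parse_header(reply)


def compare(servers, name):
    """Print the reply header from each server for the same name."""
    for server in servers:
        try:
            print(server, name, ask(server, name))
        except OSError as exc:  # includes timeouts
            print(server, name, "no reply:", exc)
```

Calling `compare(("192.168.5.5", "4.2.2.1"), "smtpgw1.gov.on.ca")` sends the query to both resolvers; if the problem reproduces, the two printed RCODEs should differ in the same way as the two dig runs.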

As I said, I've tried this on two different networks and domains. One is a brand new domain, which definitely has all default settings for DNS. The other has been migrated to Server 2012, so some old settings from 2003/2008 may have carried over. I get the same results on both of them.

Disabling EDNS with dnscmd /config /enableednsprobes 0 fixes it. I see many search results about EDNS being a problem in Server 2003, but not much that matches what I'm seeing in Server 2012. Neither firewall has a problem with EDNS. Disabling EDNS should only be a temporary workaround, though: it prevents the use of DNSSEC and might cause other issues.

I have also seen some posts about issues with Server 2008R2 and EDNS, but those same posts say things are fixed in Server 2012, so it should work properly.

I have also tried enabling the debug log for DNS. I can see the packets that I expected, but it doesn't give me much insight as to why it's returning SERVFAIL. Here are the relevant portions of the DNS server debug log:

First packet – query from client to my DNS server

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF1BF01A0 UDP Rcv 172.16.0.254    a61e   Q [2001   D   NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID       0xa61e
    Flags     0x0120
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        1
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0023, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4096
      TTL    0
      DLEN   0
      DATA   
        Buffer Size  = 4096
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 0

Second packet – query from my DNS server to their DNS server

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF0A22160 UDP Snd 204.41.8.237    3e6c   Q [0000       NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF0A22160
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID       0x3e6c
    Flags     0x0000
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        0
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

Third packet – response from their DNS server (NOERROR)

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF2188100 UDP Rcv 204.41.8.237    3e6c R Q [0084 A     NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF2188100
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID       0x3e6c
    Flags     0x8400
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        1
      TC        0
      RD        0
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

Fourth packet – response from my DNS server to client (SERVFAIL)

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF1BF01A0 UDP Snd 172.16.0.254    a61e R Q [8281   DR SERVFAIL] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=4556080, Expire=4556083
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID       0xa61e
    Flags     0x8182
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     2 (SERVFAIL)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0023, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    0
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 2
        Version      = 0
        Flags        = 0
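
To make the four packets easier to compare, here is a small Python sketch that decodes the 16-bit Flags word into the same fields the debug log prints, and converts the length-prefixed name notation into a dotted name. The bit layout is the standard DNS header layout. The helper names are made up, and the byte-swap helper is only an observation from the three distinct flag values in this log (the bracketed value on each summary line appears to be the Flags word with its two bytes swapped), not documented behaviour.

```python
import re


def decode_flags(flags):
    """Split a 16-bit DNS header flags word into its individual fields."""
    return {
        "QR": (flags >> 15) & 1,
        "OPCODE": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 1,
        "TC": (flags >> 9) & 1,
        "RD": (flags >> 8) & 1,
        "RA": (flags >> 7) & 1,
        "Z": (flags >> 6) & 1,
        "AD": (flags >> 5) & 1,
        "CD": (flags >> 4) & 1,
        "RCODE": flags & 0xF,
    }


def swap_bytes(flags):
    """Swap the two bytes of a 16-bit value (0x8182 becomes 0x8281)."""
    return ((flags & 0xFF) << 8) | ((flags >> 8) & 0xFF)


def log_name_to_dotted(log_name):
    """Turn '(7)smtpgw1(3)gov(2)on(2)ca(0)' into 'smtpgw1.gov.on.ca.'."""
    labels = []
    for length, label in re.findall(r"\((\d+)\)([^()]*)", log_name):
        if int(length) != len(label):
            raise ValueError(f"length prefix {length} does not match {label!r}")
        if label:  # the trailing (0) is the root label and has no text
            labels.append(label)
    return ".".join(labels) + "."


if __name__ == "__main__":
    # Flags values taken from the packets in the log above.
    for word in (0x0120, 0x0000, 0x8400, 0x8182):
        print(f"0x{word:04x} summary={swap_bytes(word):04x}", decode_flags(word))
    print(log_name_to_dotted("(7)smtpgw1(3)gov(2)on(2)ca(0)"))
```

Decoding 0x8400 and 0x8182 side by side shows the point of the whole trace: the upstream reply has AA set with RCODE 0, while the reply sent back to the client has RCODE 2.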

Other things of note:

  • One of the networks has native IPv6 internet access, the other does not (but the IPv6 stack is enabled on the servers with default settings). It doesn't seem to be an IPv6 network issue.
  • It doesn't affect all domains. For example, dig @192.168.5.5 -t AAAA serverfault.com returns NOERROR with no results, and the same query for google.com returns Google's IPv6 addresses properly.
  • Tried installing hotfix from KB3014171, made no difference.
  • The update from KB3004539 is already installed.

Edit Nov 7, 2015

I've set up another non-domain-joined Server 2012R2 machine, installed the DNS server role, and tested with the command nslookup -type=aaaa smtpgw1.gov.on.ca localhost. It does NOT have the same issues.

Both VMs are on the same host, and same network, so that eliminates any network/firewall issues. It's now down to either patch level or being a domain member/domain controller that makes the difference.

Edit Nov 8, 2015

Applied all updates, made no difference. Went through to double check if there were any configuration differences between my new test server and my domain controller's DNS settings, and there is one: the domain controller had forwarders set up.

Now, I'm sure I tried with forwarders and without in my initial tests, but I only tried it using dig from a Linux machine. I do get slightly different results with and without forwarders set up (tried with Google, OpenDNS, 4.2.2.1, and my ISP DNS servers) when I use nslookup on a Windows machine.

With a forwarder set, I get Server failed.

Without a forwarder (so it uses root DNS servers), I get No IPv6 address (AAAA) records available for smtpgw1.gov.on.ca.

But that's still not the same as what I get for other domains that don't have IPv6 records: nslookup on Windows just returns no results for those.

With or without forwarders, dig still shows SERVFAIL for that name when querying my Windows DNS server.

There IS a small difference between the problem domain and other ones that seems relevant, even when I don't involve my Windows DNS server:

dig -t aaaa @8.8.8.8 smtpgw1.gov.on.ca has no answers, and does not have an authority section.

dig -t aaaa @8.8.8.8 serverfault.com returns no answers, but does have an authority section. So do most other domains I try, no matter what resolver I use.

So why is that authority section missing, and why does Windows DNS server treat it as a failure when other DNS servers don't?

Best Answer

I've looked into the network trace some more and done some reading. The request for the AAAA record, when the record doesn't exist, returns an SOA. It turns out the SOA is for a different domain than the one being requested, and I suspect that's why Windows is rejecting the response: the request is AAAA for mx.atomwide.com, and the response is an SOA for lgfl.org.uk. I will see if we can make some progress with this information.

EDIT: Just for future reference, temporarily turning off "Secure cache against pollution" will allow the query to succeed. Not ideal, but it proves the issue is with a dodgy DNS record. RFC4074 is also a good reference - Intro and Section.
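
The mismatch described in this answer can be expressed as a simple label-wise suffix check. The Python sketch below is only an illustration of the kind of sanity check a resolver with cache-pollution protection might apply to the SOA in a negative response. It is not Windows DNS Server's actual logic, which isn't documented here, and the function name is made up.

```python
def soa_covers_name(soa_owner, qname):
    """Return True if soa_owner is the queried name or one of its parent zones.

    Names are compared label by label and case-insensitively, so
    'wide.com' is NOT treated as a parent of 'mx.atomwide.com'.
    """
    owner = [label for label in soa_owner.lower().rstrip(".").split(".") if label]
    query = [label for label in qname.lower().rstrip(".").split(".") if label]
    if len(owner) > len(query):
        return False
    # The SOA owner must match the rightmost labels of the queried name.
    return query[len(query) - len(owner):] == owner


if __name__ == "__main__":
    # The pair from this answer: the SOA belongs to an unrelated zone.
    print(soa_covers_name("lgfl.org.uk", "mx.atomwide.com"))
    # What a well-formed negative response would normally look like.
    print(soa_covers_name("atomwide.com", "mx.atomwide.com"))
```

With the names from this answer, the first check fails and the second passes, which is consistent with the suspicion that the out-of-zone SOA is what gets the response rejected.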
