Linux – After a reboot, BIND is unable to resolve any .org or .info domain name


When I query the BIND server from the server itself (a CentOS 6.2 box), dig returns SERVFAIL with no A record for any domain in the .org or .info TLDs:
# dig @localhost text-lb.eqiad.wikimedia.org

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.10.rc1.el6_3.2 <<>> @localhost text-lb.eqiad.wikimedia.org
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 58440
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;text-lb.eqiad.wikimedia.org.   IN      A

;; Query time: 156 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Jan  7 06:26:24 2014
;; MSG SIZE  rcvd: 45

However, when I run tcpdump on port 53, this is what I see (I had to wade through the CNAME traffic first; I haven't included it here):

06:24:20.772293 IP services1i.box11.org.46014 > ns1.wikimedia.org.domain: 65338% [1au] A? text-lb.eqiad.wikimedia.org. (56)
06:24:20.864571 IP ns1.wikimedia.org.domain > services1i.box11.org.46014: 65338*- 1/3/5 A 208.80.154.224 (202)

The upstream name server is clearly returning an A record for the domain, but dig's output doesn't include it. When I query Google's public DNS, everything works (of course):

# dig @8.8.8.8 text-lb.eqiad.wikimedia.org

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.10.rc1.el6_3.2 <<>> @8.8.8.8 text-lb.eqiad.wikimedia.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17362
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;text-lb.eqiad.wikimedia.org.   IN      A

;; ANSWER SECTION:
text-lb.eqiad.wikimedia.org. 3489 IN    A       208.80.154.224

;; Query time: 61 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue Jan  7 06:26:16 2014
;; MSG SIZE  rcvd: 61

I don't believe I changed any configuration, but this did start happening after a reboot, so it's possible some odd config change was lurking. (I had restarted named several times before without any issue, and I'm fairly sure I haven't changed any configs since the last restart.)

What I don't understand is why the server appears to request the A record and receive an answer, yet does not return that answer to the client. The query log shows only this one lonely entry:

07-Jan-2014 06:30:59.766 client 127.0.0.1#60966: view internal: query: text-lb.eqiad.wikimedia.org IN A + (127.0.0.1)

This is just an example domain; the same problem occurs with any .org or .info domain. Strangely, .com domains seem to work fine.

Best Answer

Thanks to Nick for pointing me in the right direction. No log entries helped, but after snooping around the server I found that explicitly disabling DNSSEC made resolution work. I then checked the system date and time and found the clock almost an hour off: ntpd had failed to start and the clock had drifted. Syncing the clock to the correct time allowed BIND to return A records correctly. I then restored the DNSSEC settings to what they were before (the defaults), and the system kept working.
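A likely explanation for why the clock mattered (general DNSSEC behaviour, not something the logs above confirm): a validating resolver accepts an RRSIG only if the current time falls between its inception and expiration timestamps, which are 32-bit values compared with RFC 1982 serial-number arithmetic (RFC 4034, section 3.1.5). When validation fails, the resolver answers SERVFAIL even though the upstream reply arrived, which matches the tcpdump/dig mismatch in the question. The sketch below models only that time check; the signing time `T` and the 10-minute inception backdating are hypothetical values, not taken from the real .org signatures.

```python
"""Sketch of the RRSIG validity-window check a DNSSEC validator performs.

Assumption: only the time check is modelled; real validation also checks
the signature itself, the key chain, and more.
"""

SERIAL_BITS = 32
MOD = 2 ** SERIAL_BITS
HALF = 2 ** (SERIAL_BITS - 1)


def serial_lt(a: int, b: int) -> bool:
    """Return True if a < b under RFC 1982 serial number arithmetic."""
    a %= MOD
    b %= MOD
    return (a < b and b - a < HALF) or (a > b and a - b > HALF)


def rrsig_time_ok(inception: int, expiration: int, now: int) -> bool:
    """True if `now` lies within [inception, expiration] (serial arithmetic)."""
    return not serial_lt(now, inception) and not serial_lt(expiration, now)


# Hypothetical example values (Unix seconds, early January 2014).
T = 1_389_076_000             # hypothetical moment the record was signed
inception = T - 600           # hypothetical: signer backdates inception 10 min
expiration = T + 7 * 86_400   # hypothetical: one-week validity period

print(rrsig_time_ok(inception, expiration, T))         # True: clock correct
print(rrsig_time_ok(inception, expiration, T - 3300))  # False: clock ~55 min slow
```

Whether a given amount of skew actually breaks validation depends on how far each signer backdates its inception times, which is one possible reason only some TLDs were affected; the question doesn't establish that.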

So... the resolution was to sync the clock to near-accurate time.
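To spot this kind of skew before it bites, one option is to compare the local clock with an NTP server. The following is a minimal SNTP sketch (not ntpd's algorithm) that sends a client-mode request and compares the server's Transmit Timestamp with the midpoint of the local round trip; `pool.ntp.org` as the default server and the 5-second timeout are assumptions.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds, fraction) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def transmit_time(packet: bytes) -> float:
    """Extract the server's Transmit Timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP reply shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return ntp_to_unix(seconds, fraction)


def clock_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Rough estimate of (server time - local time) in seconds.

    Assumes symmetric network delay; precise enough to reveal an hour of skew.
    """
    request = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client request)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(1024)
        received = time.time()
    # Compare against the midpoint of the local send/receive times.
    return transmit_time(reply) - (sent + received) / 2


if __name__ == "__main__":
    print(f"offset: {clock_offset():+.1f} s")
```

An offset in the thousands of seconds (like the near-hour drift found here) suggests the time sync daemon isn't running.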
