Linux BIND: intermittent DNS resolution problems


I have intermittent problems with my DNS resolution. Sometimes connections to some nodes fail (and dig does not resolve them either), other times connections to different nodes fail, and at other times, WITHOUT ANY CONFIG CHANGE, every name resolves correctly to its IP.
The offending names are:

How can I check the BIND config on my DNS servers ( for validity?
When a name is failing, what tools besides dig can I use to test where resolution is being attempted and why it is not succeeding?

Best Answer

It looks like you have a surprising number of authoritative servers for the zone:

$ dig +short NS

As all of these are published in NS records, each of these nameservers is supposed to know everything there is to know about the zone (except possible sub-zones, but each of these servers must at least have the delegation records for those). Each query for domain information will be sent to one of these DNS servers effectively at random, in order to balance the load of the DNS servers.
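One way to check whether every published nameserver really holds the same copy of the zone is to compare the SOA serial each one reports. The following is only a sketch: the function names `collect_serials` and `serials_consistent` are made up for illustration, and `example.com` is a placeholder for the real zone.

```shell
#!/bin/sh
# Print "<nameserver> <SOA serial>" for every NS published for the zone.
# A server that gives no usable answer gets "-" in place of a serial.
collect_serials() {
    zone=$1
    for ns in $(dig +short NS "$zone"); do
        # +norecurse asks the server for its own data only.
        # The serial is the third field of the short SOA output.
        serial=$(dig +norecurse +short SOA "$zone" @"$ns" |
                 awk 'NR == 1 && $3 ~ /^[0-9]+$/ { print $3 }')
        echo "$ns ${serial:--}"
    done
}

# Read "<nameserver> <serial>" lines on stdin; succeed only if every
# server returned a serial and all serials are identical.
serials_consistent() {
    awk '
        $2 == "-"             { bad = 1 }
        NR > 1 && $2 != first { bad = 1 }
        NR == 1               { first = $2 }
        END                   { exit (bad || NR == 0) ? 1 : 0 }
    '
}

# Example (placeholder zone name):
#   collect_serials example.com | serials_consistent || echo "nameservers disagree"
```

Servers that show "-" here are the ones that either refuse the query or do not consider themselves authoritative for the zone.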

However, it appears that only a few of those will actually be able to provide answers for me:

$ for i in $(seq 0 4); do echo "ns$i.fitsaas: "; dig +short @ns$; done

$ for i in $(seq 1 4); do echo "ns$i.alpnames: "; dig +short @ns$; done

The results for the rest of your problem names are similar: only and seem ready to provide answers about these names to the world.

By replacing +short in the dig command options with +noall +answer +comments we can get more information on what's amiss.

$ for i in $(seq 0 4); do echo "ns$i.fitsaas: "; dig +noall +answer +comments @ns$ || echo "error"; done
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62501
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

status: REFUSED indicates this server refused to answer my query. This is not a showstopper: a resolver treats REFUSED as an error on that server and will just ask another authoritative DNS server. The other uncooperative ns* nameservers seem to respond the same way.

However, the ns* nameservers will answer differently:

$ for i in $(seq 1 4); do echo "ns$i.alpnames: "; dig +noall +answer +comments @ns$; done
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23895
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

status: NXDOMAIN means "I am sure such a DNS name doesn't exist." This is a showstopper: when a DNS server that is authoritative for a given zone says that a particular name does not exist in that zone, that is not an error. It is a valid answer, although a negative one. Negative answers are even cacheable, although usually for shorter times than positive ones.

Once a resolver gets a negative answer from a DNS server that is authoritative for the requested domain, it must take it as gospel truth: there is no such name. Since all authoritative servers for a given zone are supposed to have full and current knowledge of the zone, asking a second opinion from another authoritative server is supposed to be just a waste of time and resources.
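The difference between the two kinds of response can be read straight from the dig header: the status field, and whether the aa (authoritative answer) flag is set. Below is a rough sketch of a helper that extracts both from dig output. The name `dig_verdict` is invented for this example, and the host names in the usage comment are placeholders.

```shell
#!/bin/sh
# Read dig output (e.g. from "+noall +answer +comments") on stdin and print
# the response status and whether the "aa" flag is present.
dig_verdict() {
    awk '
        BEGIN { aa = "no" }
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62501"
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # e.g. ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, ..."
            flags = $0
            sub(/^;; flags: */, "", flags)
            sub(/;.*/, "", flags)
            if ((" " flags " ") ~ / aa /) aa = "yes"
        }
        END {
            if (status == "") { print "no response header found"; exit 1 }
            print "status=" status " authoritative=" aa
        }
    '
}

# Example (placeholder names):
#   dig +noall +answer +comments www.example.com @ns1.example.net | dig_verdict
```

A result of `status=NXDOMAIN authoritative=yes` is the definitive negative answer described above, while `status=REFUSED authoritative=no` is the kind of error a resolver works around by asking another server.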

Are the ns* servers supposed to be authoritative for your zone? If not, contact your DNS registrar and have them removed from the delegation NS records. Since these delegation records are at the level of the .com TLD, you cannot do this just by editing the NS records in your authoritative DNS server(s).
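To see what the parent zone actually delegates to, as opposed to what your own servers publish, you can query a .com TLD server directly without recursion and read the authority section of the referral. This is a sketch only: the function names are made up for illustration, `a.gtld-servers.net` is one of the .com TLD servers, and `example.com` stands in for the real zone.

```shell
#!/bin/sh
# Print the sorted NS targets found in dig record lines read from stdin.
ns_targets() {
    awk '$3 == "IN" && $4 == "NS" { print $5 }' | sort
}

# Ask a .com TLD server which nameservers it delegates the zone to.
# The referral carries the NS records in the authority section.
parent_delegation() {
    dig +norecurse +noall +authority NS "$1" @a.gtld-servers.net | ns_targets
}

# Example (placeholder zone name): compare the delegation with the NS set
# that the zone itself publishes.
#   parent_delegation example.com
#   dig +short NS example.com | sort
```

If the two lists differ, the delegation at the registrar and the NS records inside the zone are out of sync.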

As, and seem to be refusing external queries, you should consider removing those from the delegation NS records too - they seem to be unwilling to serve the public, so there should be no point in mentioning them in the delegation.
