Linux BIND: intermittent DNS resolution problems


I have intermittent problems with DNS resolution. Sometimes connections to some nodes fail (and dig does not resolve them either), other times different nodes fail, and at still other times, WITHOUT ANY CONFIG CHANGE, every name resolves correctly to its IP.
The offending names are:

dallas1.fitsaas.com
dallas2.fitsaas.com
uk1.fitsaas.com
uk2.fitsaas.com

How can I check the BIND configuration on my DNS servers (ns0.fitsaas.com through ns4.fitsaas.com) for validity?
When a name is failing, what tools besides dig can I use to see where the resolver is trying to resolve it and why it is failing?

Best Answer

It looks like you have a surprising number of authoritative servers for the fitsaas.com zone:

$ dig +short fitsaas.com NS
ns1.alpnames.com.
ns3.alpnames.com.
ns1.fitsaas.com.
ns3.fitsaas.com.
ns2.alpnames.com.
ns4.alpnames.com.
ns4.fitsaas.com.
ns2.fitsaas.com.
ns0.fitsaas.com.

As all of these are published in NS records, each of these nameservers is supposed to know everything there is to know about the fitsaas.com zone (except possible sub-zones, but each of these servers must at least have the delegation records for those). Each query for fitsaas.com domain information will be sent to one of these DNS servers effectively at random, in order to balance the load of the DNS servers.

However, it appears that only a few of those will actually be able to provide answers for me:

$ for i in $(seq 0 4); do echo "ns$i.fitsaas: "; dig +short dallas1.fitsaas.com @ns$i.fitsaas.com; done
ns0.fitsaas:
ns1.fitsaas:
ns2.fitsaas:
63.142.255.107
ns3.fitsaas:
63.142.255.107
ns4.fitsaas:

$ for i in $(seq 1 4); do echo "ns$i.alpnames: "; dig +short dallas1.fitsaas.com @ns$i.alpnames.com; done
ns1.alpnames:
ns2.alpnames:
ns3.alpnames:
ns4.alpnames:

The results for the rest of your problem names are similar: only ns2.fitsaas.com and ns3.fitsaas.com seem ready to provide answers about these names to the world.
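The two loops above hard-code the server names. As a rough sketch, the same check can take its server list from the published NS records, so no nameserver gets skipped. The helper name check_all_ns is made up for illustration; it relies only on dig +short, as used above.

```shell
# check_all_ns ZONE NAME: ask every nameserver published for ZONE about NAME.
# (check_all_ns is a hypothetical helper name, not part of dig or BIND.)
check_all_ns() (
    zone=$1
    name=$2
    for ns in $(dig +short "$zone" NS | sort); do
        # Join multiple answer lines into one space-separated line.
        answer=$(dig +short "$name" "@$ns" | paste -s -d ' ' -)
        [ -n "$answer" ] || answer='(no answer)'
        printf '%s -> %s\n' "$ns" "$answer"
    done
)
```

Running check_all_ns fitsaas.com dallas1.fitsaas.com prints one line per nameserver, which makes it easy to spot the servers that return nothing.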

By replacing +short in the dig command options with +noall +answer +comments we can get more information on what's amiss.

$ for i in $(seq 0 4); do echo "ns$i.fitsaas: "; dig +noall +answer +comments dallas1.fitsaas.com @ns$i.fitsaas.com || echo "error"; done
ns0.fitsaas:
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 62501
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
[...]

status: REFUSED indicates this server refused to answer my query. This is not a showstopper, as this is clearly an error and a DNS resolver will just ask another authoritative DNS server. The other uncooperative ns*.fitsaas.com nameservers seem to respond the same way.

However, the ns*.alpnames.com nameservers will answer differently:

$ for i in $(seq 1 4); do echo "ns$i.alpnames: "; dig +noall +answer +comments dallas1.fitsaas.com @ns$i.alpnames.com; done
ns1.alpnames:
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23895
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available
[...]

status: NXDOMAIN means "I am sure such a DNS name doesn't exist." This is a showstopper: when a DNS server that is authoritative for a given zone says that a particular name does not exist in that zone, that is not an error. It is a valid answer, although a negative one. Negative answers are even cacheable, although usually for shorter times than positive ones.

Once a resolver gets a negative answer from a DNS server that is authoritative for the requested domain, it must take it as gospel truth: there is no such name. Since all authoritative servers for a given zone are supposed to have full and current knowledge of the zone, asking a second opinion from another authoritative server is supposed to be just a waste of time and resources.
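To see at a glance which servers give an error (which a resolver works around) and which give a definitive negative answer (which it does not), something like the following sketch can summarize the status line from each server. The function names are hypothetical, and the one-line explanations are my own shorthand for the behaviour described above.

```shell
# rcode_of: read dig output on stdin and print the response status (rcode).
rcode_of() {
    sed -n 's/^;; ->>HEADER<<- .*status: \([A-Z]*\),.*/\1/p'
}

# explain_rcode STATUS: describe how a resolver treats that status.
explain_rcode() {
    case $1 in
        NOERROR)          echo "no error" ;;
        NXDOMAIN)         echo "name does not exist; resolver stops asking" ;;
        REFUSED|SERVFAIL) echo "server error; resolver tries another server" ;;
        "")               echo "no response" ;;
        *)                echo "see the dig output" ;;
    esac
}

# survey_rcodes NAME SERVER...: print the status each server returns for NAME.
survey_rcodes() (
    name=$1
    shift
    for ns in "$@"; do
        status=$(dig +noall +comments "$name" "@$ns" | rcode_of)
        printf '%s: %s (%s)\n' "$ns" "${status:-none}" "$(explain_rcode "$status")"
    done
)
```

For example, survey_rcodes dallas1.fitsaas.com ns0.fitsaas.com ns1.alpnames.com queries those two servers and prints one status line for each.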

Are the ns*.alpnames.com servers supposed to be authoritative for your zone? If not, contact your domain registrar and have them removed from the delegation NS records. Since these delegation records live in the .com TLD zone, you cannot do this just by editing the NS records in your own authoritative DNS server(s).
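To see what the .com TLD itself hands out for the zone, you can ask one of the TLD servers directly without recursion. This is only a sketch: delegation_ns is a hypothetical helper name, a.gtld-servers.net is simply one of the .com servers chosen as a default, and the awk filter assumes the referral lists its NS records in the authority section in dig's usual five-column layout.

```shell
# delegation_ns ZONE [TLD_SERVER]: list the nameservers the parent zone
# delegates ZONE to, as returned in the referral's authority section.
# (delegation_ns is a hypothetical helper; the default server is an assumption.)
delegation_ns() {
    dig +norecurse +noall +authority "$1" NS "@${2:-a.gtld-servers.net}" |
        awk '$4 == "NS" { print $5 }' | sort
}
```

Comparing the output of delegation_ns fitsaas.com with the list from dig +short fitsaas.com NS shows whether the delegation and the zone's own NS records agree.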

As ns0.fitsaas.com, ns1.fitsaas.com and ns4.fitsaas.com seem to be refusing external queries, you should consider removing those from the delegation NS records too - they seem to be unwilling to serve the public, so there should be no point in mentioning them in the delegation.
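As for checking the BIND configuration itself: BIND ships named-checkconf and named-checkzone for syntax-checking named.conf and individual zone files. A minimal wrapper might look like the sketch below, to be run on each of your nameservers. check_bind is a made-up name, and the default path /etc/named.conf and any zone file path you pass are assumptions that depend on your distribution and setup.

```shell
# check_bind [CONF [ZONE ZONEFILE]]: validate named.conf and, optionally,
# one zone file. (check_bind is a hypothetical helper; /etc/named.conf is
# an assumed default and differs between distributions.)
check_bind() (
    conf=${1:-/etc/named.conf}
    zone=${2:-}
    zonefile=${3:-}
    named-checkconf "$conf" || { echo "named.conf has errors"; exit 1; }
    if [ -n "$zone" ]; then
        named-checkzone "$zone" "$zonefile" || { echo "zone $zone has errors"; exit 1; }
    fi
    echo "configuration looks valid"
)
```

A clean result only means the files parse. A server can still answer REFUSED if it is not configured to serve the zone or restricts who may query it, so the dig checks above remain the real test.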
