What do the qrysent, restart, and other numbers in the BIND 9 query error log (debug 2) mean?


I am trying to troubleshoot some DNS problems with BIND 9 that occur when my recursive resolver has a cache miss.

I've enabled debug 2 logging for query errors and am getting the following:

01-Jun-2015 03:04:41.539 debug 1: client (www.theonion.com): query failed (SERVFAIL) for www.theonion.com/IN/A at query.c:7005

01-Jun-2015 03:04:41.539 debug 2: fetch completed at resolver.c:3194 for www.theonion.com/A in 10.000137: timed out/success [domain:theonion.com,referral:1,restart:3,qrysent:11,timeout:10,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Does anyone know exactly what this means? The first entry looks like the query failed.

The second is "timed out/success". Which one is it: a timeout or a success? Or is it a success that happened after a timeout?

What do the numbers in the debug 2 line mean? What are referral, restart, qrysent, and so on? Did BIND try the same query 11 times, with 10 attempts timing out and 1 getting a response?

Can any BIND experts help me understand what is going on here?

Best Answer

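The BIND ARM (Administrator Reference Manual) is your friend whenever you're doing anything complex with BIND. In particular, these fields are documented in its section on logging, under the query-errors category. For your log line, the interpretation is that BIND followed 1 referral, cycled 3 times through all known nameservers for theonion.com, sent 11 queries in the process, and counted 10 timeouts since the last response it received. The field definitions from the ARM follow.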


referral: The number of referrals the resolver received throughout the resolution process. In the ARM's own example this is 2, which are most likely com and example.com; in the log line from the question it is 1.


restart: The number of cycles that the resolver tried remote servers at the domain zone. In each cycle the resolver sends one query (possibly resending it, depending on the response) to each known name server of the domain zone.


qrysent: The number of queries the resolver sent at the domain zone.


timeout: The number of timeouts since the resolver received the last response.


lame: The number of lame servers the resolver detected at the domain zone. A server is detected to be lame either by an invalid response or as a result of a lookup in BIND 9's address database (ADB), where lame servers are cached.


neterr: The number of erroneous results that the resolver encountered in sending queries at the domain zone. One common case is that the remote server is unreachable and the resolver receives an ICMP unreachable error message.


badresp: The number of unexpected responses (other than lame) to queries sent by the resolver at the domain zone.


adberr: Failures in finding remote server addresses of the domain zone in the ADB. One common case of this is that the remote server's name does not have any address records.


findfail: Failures of resolving remote server addresses. This is a total number of failures throughout the resolution process.


valfail: Failures of DNSSEC validation. Validation failures are counted throughout the resolution process (not limited to the domain zone), but should only happen in the zone named by the domain field.
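
If you want to compare these counters across many failed lookups, it can help to pull them out of the log programmatically. The following is a minimal Python sketch, not anything shipped with BIND: the function name parse_fetch_counters is made up for illustration, and it assumes the counters always appear as a trailing comma-separated key:value list in square brackets, as in the debug 2 line from the question.

```python
import re

# Sample debug 2 line, copied from the question.
LINE = (
    "01-Jun-2015 03:04:41.539 debug 2: fetch completed at resolver.c:3194 "
    "for www.theonion.com/A in 10.000137: timed out/success "
    "[domain:theonion.com,referral:1,restart:3,qrysent:11,timeout:10,"
    "lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]"
)


def parse_fetch_counters(line):
    """Return the bracketed key:value fields of a 'fetch completed' line.

    Hypothetical helper: assumes the fields sit in a trailing [...] block.
    Returns an empty dict if no such block is found.
    """
    match = re.search(r"\[([^\]]*)\]\s*$", line)
    if match is None:
        return {}
    fields = {}
    for item in match.group(1).split(","):
        key, _, value = item.partition(":")
        # Counters are integers; the 'domain' field stays a string.
        fields[key] = int(value) if value.isdigit() else value
    return fields


counters = parse_fetch_counters(LINE)
print(counters["domain"], counters["restart"], counters["qrysent"], counters["timeout"])
# prints: theonion.com 3 11 10
```

With the fields in a dictionary, you can for example flag lookups where timeout is close to qrysent, which in your case suggests that almost none of the queries sent to the theonion.com nameservers were answered.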