How to find which server in the DNS ‘chain’ is returning NXDOMAIN


For almost 48 hours now we've been experiencing sporadic DNS outages for A and CNAME records. We have mostly been tracking the issue with ping and nslookup on Windows.

Lookups for 'www' (A), 'img1' (CNAME), and 'store' (A) have come back as not found (Windows ping and nslookup say they just can't find the host), and on one online DNS tester I even saw an NXDOMAIN response.

I'm pretty sure that somewhere in the DNS 'chain' there is a server returning a cached NXDOMAIN response, and that it's still being served after 46 hours.

I've even seen a case where, using nslookup, I looked up the CNAME record img1.example.com 10 times within 5 seconds and got both negative and positive responses from the same Verizon DNS server within a second.

Like I said, this has been going on for 48 hours. Each 'outage' lasts only a few minutes, but it has been seen from at least 4 different geographical locations/networks.

I thought the bad record would have cleared itself out by now, but I'm hoping that by finding the offending DNS server I can at least contact its operators, or find out whose fault it is.

Answers to obvious questions

  • DNS is currently hosted at GoDaddy and has not been changed at all
  • the domain has been active with DNS on GoDaddy's hosted DNS (ns41.domaincontrol.com) for 3 years
  • the problem has been observed on several different networks: Verizon DSL, Comcast cable, Verizon EVDO, and the Site24x7 website
  • it's even happening with CNAME records pointing to Amazon S3 (i.e. 100% not a web server problem and 100% a DNS problem)
  • I'm not an expert, but the problem has been confirmed by two people who know more than I do. One of them thinks the most likely issue is a cached NXDOMAIN response somewhere.

Should we just wait up to 4 days before changing DNS providers?
Is there a tool of some sort to trace where the DNS response is coming from and find the actual server that is caching the NXDOMAIN response, or perhaps a service that tests hundreds or thousands of DNS servers for their responses?

Best Answer

I think you may have a conceptual issue with how DNS works.

Only DNS servers performing recursive resolution cache lookups. The DNS servers that the affected users on "verizon DSL, comcast cable, verizon EVDO, site24x7 website" are using are the ones caching lookups.

The root DNS servers, .com servers, and the servers authoritative for your domain aren't caching lookups, because they're not providing recursive resolution service.

It's possible (likely, actually, from what I'm seeing in Google searches) that GoDaddy is sporadically returning NXDOMAINs for your domain, and those NXDOMAINs are being cached by recursive resolvers. (Per RFC 2308, they should be cached for at most the lesser of the SOA record's own TTL and the SOA MINIMUM field.)
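As a sketch of that RFC 2308 rule, assuming the zone's SOA record is available as a single line in zone-file format, the negative-caching limit is just the smaller of two numbers. The function names and the SOA record below (including its serial and timer values) are hypothetical, not GoDaddy's actual settings.

```python
from typing import Tuple


def parse_soa(rr_line: str) -> Tuple[int, int]:
    """Return (soa_ttl, soa_minimum) from a one-line SOA record in the form
    NAME TTL CLASS SOA MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM."""
    fields = rr_line.split()
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("expected an 11-field SOA record with an explicit TTL")
    return int(fields[1]), int(fields[10])


def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308, section 5: a negative answer (e.g. NXDOMAIN) should be cached
    for no longer than the lesser of the SOA record's TTL and its MINIMUM."""
    return min(soa_ttl, soa_minimum)


if __name__ == "__main__":
    # Hypothetical SOA record; substitute the real one for your zone.
    soa = ("example.com. 3600 IN SOA ns41.domaincontrol.com. "
           "hostmaster.example.com. 2011031501 28800 7200 604800 600")
    ttl, minimum = parse_soa(soa)
    print(f"NXDOMAIN may be cached for up to {negative_cache_ttl(ttl, minimum)} s")
```

With those hypothetical values a well-behaved resolver would drop a cached NXDOMAIN within 600 seconds, which is why an NXDOMAIN persisting for days points at fresh negative answers being handed out rather than one stale cache entry. Individual resolvers may also apply their own caps, so the zone's values are an upper bound rather than a guarantee.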

Apparently, GoDaddy's "free" DNS service isn't too highly regarded. I don't use it, personally, so I can't comment on it.

There is no central "list" of DNS servers providing recursive resolution for you to "test against". (I have one here in my house, and I could spin a few more up on VMs if I needed to...) You need a reliable provider to be authoritative for your domain, and you just have to hope that everybody else in the world honors TTLs and acts as "good DNS citizens".
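What you can do is query specific servers directly and watch what each one returns. Below is a minimal sketch using only the Python standard library: it sends the same A query to one server several times and tallies the response codes (NOERROR, NXDOMAIN, and so on). Querying the domain's authoritative server with the recursion-desired (RD) bit cleared bypasses resolver caches, so a sporadic NXDOMAIN there would point at the authoritative side rather than at someone's cache. The function names are hypothetical, and the sketch reads only the message header, not the answer records.

```python
import random
import socket
import struct
from collections import Counter

# Names for the RCODE values most relevant here (RFC 1035, section 4.1.1).
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, recursion_desired: bool = True) -> bytes:
    """Build a minimal DNS query for `name` (class IN; qtype 1 = A record)."""
    query_id = random.randint(0, 0xFFFF)
    flags = 0x0100 if recursion_desired else 0x0000  # 0x0100 is the RD bit
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label {label!r} in {name!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # zero-length root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def response_rcode(response: bytes) -> int:
    """RCODE is the low 4 bits of the flags word in the 12-byte header."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def probe(server_ip: str, name: str, attempts: int = 10,
          recursion_desired: bool = True, timeout: float = 2.0) -> Counter:
    """Send the same query `attempts` times over UDP and tally the outcomes."""
    tally: Counter = Counter()
    for _ in range(attempts):
        packet = build_query(name, recursion_desired=recursion_desired)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server_ip, 53))
                reply, _ = sock.recvfrom(4096)
            except socket.timeout:
                tally["TIMEOUT"] += 1
                continue
            except OSError:
                tally["NETWORK_ERROR"] += 1
                continue
        if reply[:2] != packet[:2]:  # reply ID must match the query ID
            tally["ID_MISMATCH"] += 1
            continue
        rcode = response_rcode(reply)
        tally[RCODE_NAMES.get(rcode, f"RCODE{rcode}")] += 1
    return tally
```

For example, `probe("192.0.2.53", "img1.example.com", recursion_desired=False)` (where 192.0.2.53 is a documentation placeholder for the authoritative server's real address) can be compared with the same call, recursion left on, against an ISP resolver's address, to see which of the two is producing the NXDOMAINs.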


Edit:

"Recursive resolution" is the process by which a DNS server resolves a record for which it is not authoritative. The process starts with the root DNS servers and proceeds recursively (that is, as a process that loops back on itself) through the authoritative DNS servers for each level of the name in the query, until the last DNS server is reached and the desired resource record (or a negative response) is returned.

For a three-level name like "www.example.com", the following occurs (to keep this clear and simple, I'm leaving out the fact that, at every step, the ISP DNS server checks its cache before issuing queries to remote DNS servers, and stores the results it receives in its cache):

  • Your PC issues a query to your specified DNS server (at your ISP, for example).

  • The ISP DNS server verifies that it doesn't have a response in cache, and then queries one of the root DNS servers.

  • The root DNS server, being authoritative only for the root, responds with a list of DNS servers authoritative for the top-level domain (TLD) specified in the query (.com, .net, .tv, etc). The protocol continues this way, with the full query always being sent to each successive DNS server. Since it's not possible to know in advance which DNS server will be authoritative for any given query, and we want to minimize the number of round-trips, we always send the full domain name in each query.

  • The ISP DNS server queries one of the DNS servers returned as authoritative for the TLD.

  • The TLD DNS server, being authoritative only for the TLD (not for second-level domains such as microsoft.com or example.com), responds with a list of DNS servers authoritative for the second-level domain.

  • The ISP DNS server queries one of the DNS servers returned as authoritative for the second-level domain.

  • The DNS server authoritative for the second-level domain, which also holds the records for names beneath it (the third level, such as www.microsoft.com or ftp.example.com), returns the requested record.

  • The ISP DNS server returns the requested record to your PC.
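The walk described in these steps can be modeled as a small simulation. This is a toy sketch, not real DNS: the three "servers", their names, and the 192.0.2.10 address are hypothetical in-memory tables, each referral names a single server instead of a list, and the cache has no TTLs. It does show that the full name is sent at every step, and that a negative result is cached just like a positive one.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "servers". REFERRALS maps each server to the child
# zones it delegates; RECORDS holds the final answers a server can give.
REFERRALS: Dict[str, Dict[str, str]] = {
    "root-server": {"com.": "com-server"},
    "com-server": {"example.com.": "example-ns"},
    "example-ns": {},
}
RECORDS: Dict[str, Dict[str, str]] = {
    "root-server": {},
    "com-server": {},
    "example-ns": {"www.example.com.": "192.0.2.10"},
}


def ask(server: str, qname: str) -> Tuple[str, str]:
    """One query to one server: an answer, a referral, or NXDOMAIN."""
    if qname in RECORDS[server]:
        return "ANSWER", RECORDS[server][qname]
    for zone, child_server in REFERRALS[server].items():
        if qname == zone or qname.endswith("." + zone):
            return "REFERRAL", child_server
    return "NXDOMAIN", ""


def resolve(qname: str, cache: Dict[str, str], log: List[str]) -> str:
    """Follow referrals from the root, caching the final result."""
    if qname in cache:
        log.append(f"cache hit for {qname}")
        return cache[qname]
    server = "root-server"
    for _ in range(10):  # guard against referral loops
        log.append(f"asking {server} for {qname}")  # full name at every step
        kind, value = ask(server, qname)
        if kind == "REFERRAL":
            server = value
            continue
        result = value if kind == "ANSWER" else "NXDOMAIN"
        cache[qname] = result  # negative results are cached too
        return result
    raise RuntimeError("too many referrals")
```

Resolving "www.example.com." with an empty cache logs one query each to root-server, com-server, and example-ns; a second lookup is served from the cache without contacting any of them.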

Typically, ISPs offer recursive resolution service to their customers. The DNS servers at hosting providers that are authoritative for customer-hosted domains generally don't provide recursive service; if queried for domains they aren't authoritative for, they will refuse the query or return a referral to the root servers.
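This distinction is visible in the DNS message header (RFC 1035, section 4.1.1): an authoritative server sets the AA (authoritative answer) bit, and a server offering recursion sets RA (recursion available). An answer relayed by a recursive resolver, cached or not, normally comes back without AA, so seeing AA set on an NXDOMAIN tells you the authoritative server itself produced it. A minimal sketch that decodes these flags from raw response bytes; the function names are hypothetical and the test bytes are hand-crafted headers, not captured traffic.

```python
import struct
from typing import Dict


def header_flags(response: bytes) -> Dict[str, int]:
    """Decode the flag bits of a DNS response's 12-byte header."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    (flags,) = struct.unpack("!H", response[2:4])
    return {
        "QR": (flags >> 15) & 1,  # 1 = this message is a response
        "AA": (flags >> 10) & 1,  # authoritative answer
        "TC": (flags >> 9) & 1,   # truncated
        "RD": (flags >> 8) & 1,   # recursion desired (copied from the query)
        "RA": (flags >> 7) & 1,   # recursion available on this server
        "RCODE": flags & 0xF,     # 0 = NOERROR, 3 = NXDOMAIN, ...
    }


def describe(response: bytes) -> str:
    """One-line summary: who answered, and does that server recurse?"""
    f = header_flags(response)
    role = "authoritative" if f["AA"] else "non-authoritative"
    recursion = "offers recursion" if f["RA"] else "does not offer recursion"
    return f"{role} answer, server {recursion}, RCODE={f['RCODE']}"
```

For instance, a header with flags 0x8403 decodes as an authoritative NXDOMAIN from a server that does not offer recursion, while 0x8180 is a typical NOERROR answer from a recursive resolver.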