If a zone has 6 NS records
When a DNS resolver looks up a domain/zone to find an authoritative name server, does it take in all 6 records and cycle through them?
If a resolver uses the 1st NS server and caches it according to its TTL, then when that authoritative name server stops responding, does the resolver still honor the TTL of the NS record?
As illustrated in this writeup from Imperva, it seems that even if the authoritative nameserver is not responding, the resolver will keep trying to use it until its TTL expires. How true is that?
Basically, in those cases where websites had multiple NS records, the resolution between them was impeded by the very way that DNS resolvers work. The resolver could have tried to reach the inactive Dyn server so long as the resolver’s cached NS-record was current, which would be true until the TTL of the NS-record expires
Does that mean I need to set a short TTL for NS records?
Any advice on how a DNS resolver deals with a non-responsive NS and its TTL?
Thank you
Best Answer
Yes, a proper recursive nameserver takes all of the nameservers into account and, over time, tries to send its queries to the fastest one.
A rough algorithm is kind of:
`ns3` might be the fastest one today from your specific vantage point, but tomorrow it may be `ns5` instead. So you have to use the fastest one, but not always, to make sure you automatically discover any other nameserver that has become faster than the one you currently believe to be the fastest.
About "the 1st NS server": stop here. Records come as a set, not a list. That is, there is no inherent order in the DNS. Of course there is an order in the wire or display representation, but it does not come from the protocol.
Record sets are bags: you get records, without any order. In fact, when a reply contains multiple records, many nameservers order them differently each time the exact same query is made, precisely to defeat client systems that only look at the first item and disregard the others.
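As a rough illustration of the kind of selection loop described in this answer, here is a minimal Python sketch. It is not how any particular resolver does it: the class name `ServerSelector`, the `send` callback, the 2-second timeout, the 10% exploration rate and the 0.7/0.3 smoothing weights are all assumptions made up for the example.

```python
import random

TIMEOUT = 2.0  # seconds to wait before giving up on a server (assumed value)


class ServerSelector:
    """Keeps a smoothed response time per nameserver and picks one to query."""

    def __init__(self, servers, explore_rate=0.1, rng=None):
        # None means "never tried"; such servers are tried before measured ones.
        self.srtt = {name: None for name in servers}
        self.explore_rate = explore_rate
        self.rng = rng or random.Random()

    def pick(self, exclude=()):
        # Sorting only makes tie-breaking deterministic; the NS set has no order.
        candidates = sorted(s for s in self.srtt if s not in exclude)
        if not candidates:
            return None
        # Occasionally pick a random server instead of the current best, so a
        # server that has become faster (or has recovered) gets rediscovered.
        if self.rng.random() < self.explore_rate:
            return self.rng.choice(candidates)
        return min(
            candidates,
            key=lambda s: -1.0 if self.srtt[s] is None else self.srtt[s],
        )

    def record(self, server, rtt):
        # A timeout (rtt is None) is scored as the slowest possible reply.
        sample = TIMEOUT if rtt is None else rtt
        old = self.srtt[server]
        self.srtt[server] = sample if old is None else 0.7 * old + 0.3 * sample


def resolve(selector, send):
    """Try servers until one replies; return its name, or None if all time out.

    `send(server)` is a hypothetical transport callback that returns the
    round-trip time in seconds, or None on timeout.
    """
    tried = set()
    while True:
        server = selector.pick(exclude=tried)
        if server is None:
            return None
        tried.add(server)
        rtt = send(server)
        selector.record(server, rtt)
        if rtt is not None:
            return server
```

With `explore_rate=0.0` and a `send` that always times out for one server, that server is tried once, scored at the timeout value, and then avoided for as long as the others keep replying. A non-zero `explore_rate` is what lets it be retried later.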
See the algorithm above: if one of the nameservers in the `NS` set is not responding, you can treat it the same as "replying slower than any other one". The DNS client has timeouts, so it won't wait forever; it will mark this specific nameserver as too slow and switch to the other ones. So the first time, you incur a penalty, because the system has to try to contact that nameserver, wait a little (a few seconds), retry, and at some point stop using it. After that ramp-up, it will use the other nameservers and things will be fast. But the first time, you have to discover that a given nameserver is slow or not responding by actually trying to contact it; you can't infer the problem without trying.
As for setting a short TTL on `NS` records: maybe, but it is mostly irrelevant. Why? Because your `NS` records are published in the parent zone of your domain, to establish the DNS delegation. They are published there with a TTL of course, as all records have a TTL attached to them, but they are published in a zone you don't control, hence you can NOT choose their TTL values!
(There is a complicated, not completely finished discussion about records like `NS` that exist in two places, the parent and the child, with the question "which one is really authoritative?" If the parent has a TTL of 1 week on the `NS` records and the same `NS` records in your zone have a TTL of 1 second, what should the recursive nameserver do? One might conclude that the child side of the delegation IS normally authoritative, so 1 second wins here; in practice, multiple DNS implementations are "parent-centric", that is, they use the data from the parent side, so 1 week wins there.)
TTLs are always a trade-off. Once they learn about them, some people immediately jump to the conclusion that things work better with very low TTLs... which is true in some cases and not so much in others. Caches are good: without them (that is, without big enough TTLs), you are not resilient against even small problems, which would make everything vanish because caches would already have expired the names.
Also, the TTL value has little or no impact on the algorithm above, which cycles through all nameservers, tries each with a timeout, and converges on the fastest one.
So if you look at what happens in TLD nameservers (which host the `NS` records for all domains under that TLD) or at various recommendations, you will often see `NS` records with a TTL of 1 or 2 days.
On how resolvers deal with non-responsive nameservers: each resolver does its own thing :-) This is not really specified by the protocol; it is an implementation detail. You can study the source code of the ones you can install yourself, but you probably won't be able to find out such details for the big public recursive DNS providers.
You can find more details though here:
RFC 1034 §5.3.3 gives some information too (note that it also takes into account one case you forgot: a given nameserver may have multiple IP addresses, which today should in fact always be the case, with one IPv4 and one IPv6, and there is no guarantee that you get results in the same amount of time from each):
RFC 1035 §7.2 has this to say:
Finally, and more specifically, about the Imperva passage you quote:
The article you reference is about the problem that hit people using Dyn nameservers during the outage. And yes, if you use only Dyn nameservers, you have a problem, because even if you change your zone to use other ones, the TTL of the `NS` records means your change won't be seen immediately. But in reality that does not say much about TTLs; it says a lot about DNS management: if you want to be resilient, for important zones, do not use a single DNS provider but several (which of course requires some coordination between them, as you cannot just arbitrarily mix and match provider X and Y, and it gets even more complicated if you throw DNSSEC into the mix, but it is possible). That way, exactly because of the algorithm quickly sketched above, even if, say, 2 out of 5 nameservers fail to reply at all because one specific provider has a problem, the others will take the load and keep your domain working. There might be an extra delay on each new query for visitors (because recursive nameservers cannot all immediately work out that 2 out of 5 nameservers are down and that they have to switch to the others), and further delays because the other 3 are overwhelmed with more queries than normal (the DNS load-balances by default, so in theory each nameserver gets roughly the same volume of queries), but a reply will still be given.
PS: not asked for, but since it is sometimes unclear: all records in a given record set must have the same TTL. The TTL is stored per record, but it needs to be the same across a given record set, that is, for a given tuple of (name, record type) [and class, but no one uses anything besides `IN` as class].
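To make the PS concrete, here is a small Python sketch that groups records by (name, class, type) and reports any record set whose TTLs differ. The function name `rrsets_with_mixed_ttls` and the 5-tuple record layout are assumptions made up for this example, not part of any DNS library.

```python
from collections import defaultdict


def rrsets_with_mixed_ttls(records):
    """Find record sets whose records do not all share one TTL.

    `records` is an iterable of (name, rclass, rtype, ttl, rdata) tuples
    (a layout assumed for this sketch). Returns a dict mapping
    (name, rclass, rtype) to the sorted list of distinct TTLs found.
    """
    ttls = defaultdict(set)
    for name, rclass, rtype, ttl, _rdata in records:
        # Owner names are case-insensitive in the DNS, so normalise them.
        ttls[(name.lower(), rclass.upper(), rtype.upper())].add(ttl)
    return {key: sorted(vals) for key, vals in ttls.items() if len(vals) > 1}


records = [
    ("example.com.", "IN", "NS", 86400, "ns1.example.net."),
    ("example.com.", "IN", "NS", 3600, "ns2.example.net."),  # TTL differs
    ("example.com.", "IN", "A", 300, "192.0.2.1"),
]
print(rrsets_with_mixed_ttls(records))
```

Here only the `NS` set is reported, because its two records carry different TTLs; the single `A` record forms a consistent set on its own.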