This is not directly a DNS problem; it's a network routing problem between some parts of the internet and the DNS servers for serverfault.com. Since the nameservers can't be reached, the domain stops resolving.
As far as I can tell the routing problem is on the (Global Crossing?) router with IP address 204.245.39.50.
As shown by @radius, packets to ns52 (as used by stackoverflow.com) pass from here to 208.109.115.121 and from there work correctly. However, packets to ns22 go instead to 208.109.115.201.
Since those two addresses are both in the same /24 and the corresponding BGP announcement is also for a /24, this shouldn't happen.
I've done traceroutes via my network, which ultimately uses MFN Above.net instead of Global Crossing to get to GoDaddy, and there's no sign of any routing trickery below the /24 level: both name servers have identical traceroutes from here.
The only times I've ever seen something like this, it was broken Cisco Express Forwarding (CEF). This is a hardware level cache used to accelerate packet routing. Unfortunately, just occasionally it gets out of sync with the real routing table and tries to forward packets via the wrong interface. CEF entries can go down to the /32 level even if the underlying routing table entry is for a /24. It's tricky to find these sorts of problems, but once identified they're normally easy to fix.
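To make the failure mode concrete, here is a toy Python sketch of longest-prefix matching in which a stale /32 entry in a forwarding cache shadows the correct /24 route. It is only an illustration of the idea, not how CEF is actually implemented; the addresses (from the 192.0.2.0/24 documentation range), the interface names and the `lookup` helper are all made up for the example.

```python
import ipaddress

def lookup(table, destination):
    """Return the interface of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in table.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the entry with the largest prefix length wins.
    return max(matches, key=lambda item: item[0].prefixlen)[1]

# Hypothetical routing table: a single /24 learned via BGP.
routing_table = {ipaddress.ip_network("192.0.2.0/24"): "if-good"}

# Hypothetical forwarding cache: the same /24 plus a stale /32 left behind.
forwarding_cache = {
    ipaddress.ip_network("192.0.2.0/24"): "if-good",
    ipaddress.ip_network("192.0.2.22/32"): "if-stale",
}

for host in ("192.0.2.52", "192.0.2.22"):
    print(host, lookup(routing_table, host), lookup(forwarding_cache, host))
```

Both hosts sit in the same /24 and the routing table treats them identically, yet the cache sends one of them out of a different interface. That is the kind of per-host divergence described above.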
I've e-mailed GC and also tried to speak to them, but they won't create a ticket for non-customers. If any of you are a customer of GC, please try and report this...
UPDATE at 10:38 UTC: As Jeff has noted, the problem has now cleared. Traceroutes to both servers mentioned above now go via the 208.109.115.121 next hop.
Myth? Kind of.
There are 2 aspects that people often confuse. If you make a change to your domain name with your domain name registrar, for example changing the name servers, that is pushed to the name servers for your TLD (.com, .ca, .fr, etc). That's where the propagation comes into play. In years past, that could take hours or even days, waiting for the registrar to take the information you provided and push it to their deployment servers, which would update the TLD servers twice per day. That's improved greatly over the years, and changes made to your domain name often take effect nearly immediately.
On the other hand, if you make a change to your DNS zone, like adding an A record or an MX change, that should take 'up to' as long as the TTL setting to be updated everywhere. That's not really propagation though, it's caching. Microsoft DNS, for example, defaults to 1 hour TTL.
With the caching, if you happen to use the domain name just before making a change, and the TTL is 1 hour, then it can take up to an hour for it to be updated. However, if you haven't tested anything with the domain name just prior to the change, then your change will be immediate for you (e.g. add a new A record that you haven't tested with yet, and it will take effect immediately).
So, nowadays almost all changes will take effect within an hour (or whatever your DNS TTL is set to). The only exceptions are if a DNS server doesn't honor the TTL (spammers often don't), or if your domain name registrar's servers aren't updating properly to the internet and you make a registrar level change. That doesn't happen often, though.
"DNS propagation" isn't a real phenomenon, per se. Rather, it is the manifest effect of the caching functionality specified in the DNS protocol. Saying that changes "propagate" between DNS servers is a convenient falsehood that's, arguably, easier to explain to non-technical users than describing all of the details of the DNS protocol. It's not really how the protocol works, though.
Recursive DNS servers make queries on behalf of clients. Recursive DNS servers, typically run by ISPs or IT departments, are used by client computers to resolve names of Internet resources. Recursive DNS servers cache the results of queries they make to improve efficiency. Queries for already-cached information can be answered without making any additional queries. The duration, in seconds, that a result is cached is supposed to be based on a configurable value called the Time To Live (TTL). This value is specified by the authoritative DNS server for the record queried.
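The caching behaviour described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real resolver: `TtlCache`, the `fetch` callable standing in for the authoritative server, and the simulated clock are all invented for the example, and the addresses come from the 192.0.2.0/24 documentation range.

```python
import time

class TtlCache:
    """Toy model of a recursive resolver's cache (not a real DNS implementation)."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch      # callable: name -> (value, ttl_seconds)
        self._clock = clock
        self._entries = {}       # name -> (value, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]                  # still fresh: no upstream query
        value, ttl = self._fetch(name)        # expired or never seen: ask upstream
        self._entries[name] = (value, now + ttl)
        return value

# Simulated authoritative data and clock so the example runs instantly.
zone = {"www.example.com": ("192.0.2.10", 3600)}
now = [0.0]
cache = TtlCache(lambda name: zone[name], clock=lambda: now[0])

first = cache.resolve("www.example.com")          # fetched and cached at t=0
zone["www.example.com"] = ("192.0.2.20", 3600)    # the record is changed
now[0] = 1800
during_ttl = cache.resolve("www.example.com")     # old answer, still cached
now[0] = 3600
after_ttl = cache.resolve("www.example.com")      # TTL expired, new answer
print(first, during_ttl, after_ttl)
```

Nothing is pushed to the cache when the record changes. The new value only shows up once the cached entry's TTL has run out, which is the effect usually described as "propagation".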
There is no one answer to all the questions being asked because DNS is a distributed protocol. The behavior of DNS depends on the configuration of the authoritative DNS server for a given record, the configuration of recursive DNS servers making queries on behalf of client computers, and DNS caching functionality built into the client computers' operating systems.
It's good practice to specify a TTL value short enough to accommodate necessary day-to-day changes to DNS records, but long enough to create a "win" in caching (i.e. not so short that records age out of cache too quickly to provide any efficiency improvement). Employing a balanced strategy with TTL values results in a "win" for everyone. It reduces both the load and bandwidth utilization for the authoritative DNS servers for a given domain, the root servers, and the TLD servers. It reduces the upstream bandwidth utilization for the operator of the recursive DNS server. It results in quicker query responses for client computers.
As a DNS record's TTL is set lower, load and bandwidth utilization on the authoritative DNS servers will increase because recursive DNS servers will not be able to cache the result for a long duration. As a record's TTL is set higher, changes to records will not appear to "take effect" quickly because client computers will continue to receive cached results stored on their recursive DNS servers. Setting the optimal TTL comes down to a balancing act between utilization and the ability to change records quickly and see those changes reflected on clients.
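A rough back-of-the-envelope version of that trade-off, as a Python sketch. It assumes a single recursive server that honors the TTL and never evicts the entry early; the `max_queries_per_day` helper and the sample TTL values are made up for illustration.

```python
import math

SECONDS_PER_DAY = 86_400

def max_queries_per_day(ttl_seconds):
    """Upper bound on daily upstream queries one resolver makes for one record."""
    return math.ceil(SECONDS_PER_DAY / ttl_seconds)

for ttl in (86_400, 3_600, 300):
    # A change can stay invisible to clients of this resolver for up to `ttl` seconds.
    print(f"TTL {ttl:>6} s: at most {max_queries_per_day(ttl):>3} queries/day, "
          f"changes visible within {ttl} s")
```

Going from a 1 day TTL to 5 minutes cuts the worst-case delay from a day to 5 minutes, but multiplies the worst-case query count per resolver by 288.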
It is worth noting that some ISPs are abusive and ignore the TTL values specified by the authoritative DNS servers (substituting their own administrative override, which violates the DNS RFCs). There's nothing to be done about this from a technical perspective. If the operators of abusive DNS servers can be located, complaints to their systems administrators might result in their implementing best practices (arguably what amounts to common sense for any network engineer familiar with DNS). This particular type of abuse isn't a technical problem.
If everybody "plays by the rules", changes to DNS records can "take effect" very quickly. In the case of changing the IP address assigned to an "A" record, for example, an exponential backoff of the TTL value would be performed leading up to the time the change will be made. The TTL might start at 1 day, for example, and be decreased to 12 hours for a 24 hour period, then 6 hours for a 12 hour period, 3 hours for a 6 hour period, etc, down to some suitably small interval. Once the TTL has been backed off, the record can be changed and the TTL brought back up to the desired value for day-to-day operations. (It is not necessary to use an exponential backoff; however, this strategy minimizes the time the record will have a low TTL and decreases load on the authoritative DNS server.)
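The backoff described above can be written out as a small Python sketch. It assumes the TTL is halved at each step and each new value is held for the length of the previous TTL, as in the 1 day / 12 hour / 6 hour example; the function name and the 15 minute floor are arbitrary choices for illustration.

```python
def ttl_backoff_schedule(start_ttl, floor_ttl):
    """Yield (new_ttl, hold_seconds) steps, halving the TTL each time.

    Each new TTL is held for the length of the previous TTL so that every
    cache holding the record under the old TTL has had time to expire it.
    """
    ttl = start_ttl
    while ttl // 2 >= floor_ttl:
        previous, ttl = ttl, ttl // 2
        yield ttl, previous

HOUR = 3600
# Start at 1 day and stop halving once the next step would drop below 15 minutes.
steps = list(ttl_backoff_schedule(24 * HOUR, floor_ttl=15 * 60))
for new_ttl, hold in steps:
    print(f"set TTL to {new_ttl / HOUR:g} h, then wait {hold / HOUR:g} h")
print(f"total lead time: {sum(hold for _, hold in steps) / HOUR:g} h")
```

After the final hold period has passed, the record itself can be changed and the TTL raised back to its normal value.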
After making a DNS record change, logs should be monitored for access attempts being made as a result of the old DNS record. In the example of changing an "A" record to refer to a new IP address, a server should remain present at the old IP address to handle access attempts resulting from client computers still using the old "A" record. Once access attempts based on the old record have reached an acceptably low level, the old IP address can be retired. If the requests related to an old record are not abating quickly, it is possible that (as described above) a recursive DNS server is ignoring the authoritative TTL. Knowing the source IP address of an access attempt, however, does not provide direct information as to the recursive DNS server responsible for supplying an old record. If the IP addresses of errant access attempts are all related to a single ISP, it may be possible to locate the offending DNS server and contact its operator.
Personally, I've seen changes "take effect" immediately, in a few hours, and in some cases, with a particular brain-damaged ISP, after several days. Doing a backoff of your TTL and being mindful of how the process works will increase your chances of success, but you can't ever be sure what some well-meaning idiot might be doing with their recursive DNS servers.