Browser-based DNS failover using multiple A records

domain-name-system failover

It has recently come to my attention that setting up multiple A records for a hostname can be used not only for round-robin load-balancing but also for automatic failover.

So I tried testing it:

  1. I loaded a page from our domain
  2. Noted which of our servers had served the page
  3. Turned off the web server on that host
  4. Reloaded the page

And indeed the browser automatically tried a different server to load the page. This worked in Opera, Safari, IE, and Firefox. Only Chrome failed to try a different server.

But after leaving that server offline for a few minutes and looking at the access logs, I found that the number of requests to the other servers had not significantly increased. With 1 out of 3 servers offline, I had expected accesses to each of the remaining 2 servers to roughly increase by 50%, but instead I only saw 7-10%. That can only mean in-browser DNS failover does not work for the majority of browsers/visitors, which directly contradicts what I had just tested.
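To spell out the arithmetic behind that expectation, here is a minimal sketch. It assumes requests are spread evenly across all servers and that overall traffic stayed constant during the test; the function names `expected_increase` and `implied_failover_fraction` are hypothetical helpers, not part of any tool we use.

```python
def expected_increase(total_servers, offline, failover_fraction=1.0):
    """Relative traffic increase on each remaining server.

    Assumes an even round-robin split and that `failover_fraction`
    of the clients aimed at an offline server retry a live one.
    """
    remaining = total_servers - offline
    if remaining <= 0:
        raise ValueError("at least one server must stay online")
    return failover_fraction * offline / remaining


def implied_failover_fraction(observed_increase, total_servers, offline):
    """Invert expected_increase: which fraction of clients failed over?"""
    remaining = total_servers - offline
    if offline <= 0 or remaining <= 0:
        raise ValueError("need at least one offline and one online server")
    return observed_increase * remaining / offline


if __name__ == "__main__":
    # 1 of 3 servers offline, every client fails over
    print(expected_increase(3, 1))
    # the 7-10% increase seen in the access logs
    print(implied_failover_fraction(0.07, 3, 1))
    print(implied_failover_fraction(0.10, 3, 1))
```

Under those assumptions, the 7-10% increase would correspond to only a small share of the clients that hit the offline server actually retrying another one.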

Does anyone have an idea what is up with browsers' DNS failover behavior? Why would automatic failover work for me but not for the majority of our visitors?

edit: To make myself clear, I made absolutely no change to our DNS settings; there's no TTL or propagation issue here, it's all about how the client handles the multiple A records.

Best Answer

OK, I am going to start by saying that DNS is not a good failover system in any way; you need a reverse proxy or load balancer. There are several reasons why the experience is not the same. First of all, Chrome uses the OS to resolve DNS, so it depends on the OS for the IPs, and the OS in this case might give it only one IP.

As for the other browsers, the behavior depends heavily on how each one handles DNS. The browser itself might decide not to try the other IPs, or might even try the same one several times, depending on the response it gets from the DNS server.
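To make "trying the other IPs" concrete, here is a rough sketch of what client-side failover over multiple A records could look like. It is an illustration only, not how any particular browser implements it: `resolve_all` and `connect_with_failover` are hypothetical names, and the `connect` parameter exists only so the logic can be exercised without a network.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP the OS resolver reports for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_with_failover(addresses, port, timeout=3.0,
                          connect=socket.create_connection):
    """Try each address in turn; return (ip, connection) for the first success."""
    last_error = None
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # refused, unreachable or timed out: move on to the next record
            last_error = exc
    raise ConnectionError(
        f"all {len(addresses)} addresses failed"
    ) from last_error
```

A client that only ever uses the first address from `resolve_all`, or that receives a single address from the OS, never reaches the loop's second iteration, which is one way the failover you saw in your own test can be missing for other visitors.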

This brings us to the DNS servers themselves: most do not respect your TTL and keep records for however long they see fit, meaning users could get your old IP for quite a while.

Finally, there is user experience: do you want users to have to refresh 3 or 4 times to get your website? Do you have any session- or login-based features on your site? What happens if the browser gets another IP in the middle of a session? If you really need HA and uptime, you honestly need to consider doing it right, or it will end up more fractured than using just one server.