DNS Round Robin: Do browsers stick to one IP as long as it is online

domain-name-system failover high-availability round-robin

How do most browsers behave if they get multiple A records from the DNS server? Do they stick to one IP as long as it is reachable (and only use another if that IP is down)? Or do they switch all the time for no reason?

If the majority of current browsers stick to one IP, DNS-RR would be enough for me as a simple failover solution.

Best Answer

Each browser has its own method of handling round-robin DNS. I spent some time today researching this problem and will continue to update my answer as I find proof of implementation, which limits my answer to browsers that expose their behavior.

Google Chrome

Google Chrome (v58 tested) requests all host entries for a name (A, AAAA, CNAME) and puts them into an array (address_list). It then attempts to open a socket to each IP address in order, from first to last. Chrome does not try to pick the fastest or closest IP; it assumes the first IP (as returned by your upstream DNS resolvers) is the best one. In my tests, BIND and Windows DNS servers returned the IPs in a different order on each lookup, which produced what looked like a 50/50 split of bandwidth across the IPs. This behavior is exposed in chrome://net-internals/#events&q=type:SOCKET%20is:active
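The in-order behavior described above can be approximated with a minimal Python sketch. This is not Chrome's code: the function names `resolve_ipv4` and `connect_in_order`, the injectable `connect` parameter, and the 2-second default timeout are all assumptions made for illustration.

```python
import socket


def resolve_ipv4(hostname, port):
    """Return the (ip, port) pairs for hostname in the order the resolver gave them."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return [info[4] for info in infos]  # info[4] is the sockaddr tuple (ip, port)


def connect_in_order(endpoints, timeout=2.0, connect=socket.create_connection):
    """Try each endpoint first to last and return (connection, endpoint) for the first success.

    No attempt is made to find the fastest or closest address; the first
    address that accepts a connection wins. `connect` is injectable so the
    logic can be exercised without a network (hypothetical parameter).
    """
    last_error = None
    for endpoint in endpoints:
        try:
            return connect(endpoint, timeout=timeout), endpoint
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"no endpoint reachable: {last_error}")
```

A caller would pass `resolve_ipv4("example.com", 80)` as `endpoints`. The per-address `timeout` decides how quickly a dead first IP is skipped, which is the same knob the curl section below is about.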

Curl (libcurl/7.54.0)

Curl also fails over to the next IP, but its default connect timeout is much longer than Chrome's: Chrome fails over immediately, curl does not. If you use libcurl and want to survive a round-robin DNS setup where one IP is down (a case that works in Chrome but not in your code), be sure to lower this value with --connect-timeout (CURLOPT_CONNECTTIMEOUT in libcurl).

DEFAULT_CONNECT_TIMEOUT:0 made me think this wasn't possible with curl.

* After 149990ms connect time, move on!

In both clients, the IP was not sticky. They followed the TTL given in DNS, and once that TTL expired (Chrome maintains this internally, curl asks on each request), the IP selection is performed again as described above.
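As a rough illustration of the TTL behavior described above, here is a small cache sketch in Python. The class name `TtlAddressCache`, the injected `resolve` callable, and the fixed `ttl_seconds` are assumptions for the example; a real client would take the TTL from the DNS response rather than from a constructor argument.

```python
import time


class TtlAddressCache:
    """Cache a resolver's address list per hostname until its TTL expires."""

    def __init__(self, resolve, ttl_seconds, clock=time.monotonic):
        self._resolve = resolve      # callable: hostname -> list of IPs (hypothetical)
        self._ttl = ttl_seconds      # assumed fixed here; really comes from the DNS answer
        self._clock = clock
        self._entries = {}           # hostname -> (expires_at, addresses)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[0]:
            return entry[1]          # still fresh: same order, same first IP
        # Expired or never seen: ask again, which may return a new order.
        addresses = list(self._resolve(hostname))
        self._entries[hostname] = (now + self._ttl, addresses)
        return addresses
```

Within one TTL window the client keeps seeing the same first address; after expiry the next lookup can put a different IP first, which is why the selection is not sticky.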

What does this mean? DNS-RR is OK for some systems, but it is not designed for failover. You should expect every result from the DNS lookup (a source of truth) to be valid and available to serve traffic. There are many ways to ensure IP availability, such as virtual floating IPs, BGP/routing tricks, etc. Use them.

All tests were performed in an IPv4-only environment; I will return with dual-stack results once enough infrastructure is available to test.

I speculate that this behavior is a side effect of Happy Eyeballs, the IPv6-fallback RFC.

Update: A useful consideration: RR DNS can only assist with load balancing, not with application failures. If one of your nodes returns a 503, you will serve 503s for 40-60% of your traffic. The assumption is that every IP listed is a valid, working endpoint if it is reachable.
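The arithmetic behind that figure can be sketched with a small simulation. It assumes a resolver that rotates the record order strictly on every lookup and a client that always uses the first reachable address; the function name `simulate_round_robin` and the 192.0.2.x addresses are made up for the example. Real resolvers and caches are less regular, which is consistent with an observed range rather than an exact 50%.

```python
from collections import deque


def simulate_round_robin(nodes, lookups):
    """Return the share of lookups that land on a node answering with a non-200 status.

    nodes maps each IP to the HTTP status that node returns. Every node is
    assumed reachable at the TCP level, so the client never fails over.
    """
    order = deque(nodes)
    errors = 0
    for _ in range(lookups):
        first_ip = order[0]       # client connects to the first address it was given
        if nodes[first_ip] != 200:
            errors += 1           # connection succeeded, but the application failed
        order.rotate(-1)          # resolver hands out the next order on the next lookup
    return errors / lookups


if __name__ == "__main__":
    share = simulate_round_robin({"192.0.2.10": 200, "192.0.2.11": 503}, 1000)
    print(f"share of requests served a 503: {share:.0%}")
```

With two nodes and one broken application, half of the lookups hit the broken node; with three nodes and one broken, about a third do. Socket-level failover never triggers because the broken node still accepts connections.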