DNS Resolv.conf Settings for Primary DNS Server Failure

domain-name-system, linux, resolv.conf

I'm currently the administrator of some RHEL Linux machines in a mixed network. Our DNS servers are Windows AD controllers. As such, they occasionally need to come down for maintenance (e.g., patching). This means that at some point, the primary DNS server for my Linux machines will be unreachable.

In the Windows world, this is handled pretty well. When DNS queries to the primary fail, Windows clients stop using it for 15 minutes. So, barring the initial hiccup, they all putt along pretty smoothly. But Linux keeps trying the same (failed) primary server. By default it will wait at least 5 seconds before trying a secondary server. This translates into EVERYTHING taking a long time, and even applications timing out if there are a good number of DNS lookups.

So, I'm looking into making my servers more robust. My current plan is to A) modify resolv.conf to wait only half a second for a response and not retry, and B) possibly add some strategic entries to /etc/hosts so that major servers are still reachable quickly.
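For reference, plan A might look roughly like the sketch below. The nameserver addresses are placeholders from the documentation range, not real servers. Note that the `timeout` option in resolv.conf takes whole seconds, so 1 second is the shortest wait the stock resolver can be given; a half-second timeout cannot be expressed this way.

```
# /etc/resolv.conf (sketch; addresses are hypothetical placeholders)
nameserver 192.0.2.10    # primary AD DNS server
nameserver 192.0.2.11    # secondary AD DNS server

# timeout:1  -> wait 1 second per server instead of the default 5
# attempts:1 -> go through the server list once instead of the default 2 times
options timeout:1 attempts:1
```

With these settings a dead primary should cost about one second per lookup rather than five, though every lookup still pays that delay because the resolver does not remember that the primary is down.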

All that being said, I'd love to have a better solution. Alternately, I'd like to hear what other people are doing with their setups. Or just theoretical "Your idea is good/bad, here's why."

–Christopher Karel

Best Answer

You might look at using dnsmasq instead of relying solely on the resolver library. dnsmasq queries the upstream servers in parallel rather than serially, so having one drop out shouldn't cause so many problems.
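A minimal sketch of such a setup, assuming dnsmasq runs locally on each Linux machine as a caching forwarder. The upstream addresses are placeholders, and the `all-servers` option is included to request explicitly that every query go to all upstreams at once, with the first reply winning:

```
# /etc/dnsmasq.conf (sketch; upstream addresses are hypothetical placeholders)
listen-address=127.0.0.1   # answer queries from this machine only
no-resolv                  # do not read upstream servers from /etc/resolv.conf
server=192.0.2.10          # first AD DNS server
server=192.0.2.11          # second AD DNS server
all-servers                # send each query to every upstream; first answer is used
cache-size=1000            # cache replies locally to cut repeat lookups
```

/etc/resolv.conf on the machine would then contain only `nameserver 127.0.0.1`, so the resolver library always talks to the local dnsmasq instance.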