Linux – Clients didn’t switch to secondary DNS server during failover

domain-name-system linux resolv.conf

I have two internal DNS servers set up, and all my servers list both of them in resolv.conf. Our main DNS server went down and suddenly none of the servers could see each other. I edited resolv.conf manually on a few of the servers and commented out the first (down) DNS server, and each machine was instantly able to ping again. What did I do wrong? Doesn't the resolver automatically switch to the secondary DNS server when the first one times out?

# File managed by puppet
nameserver 192.168.146.100
nameserver 192.168.159.101
;nameserver 72.14.188.5
domain example.com
search example.com

Best Answer

It's likely that the default timeout (5 seconds per name server) is too long and that apps are breaking as a result. Keep in mind that the resolver will start with the first entry in /etc/resolv.conf -every- time it's called (notwithstanding cached entries), so every lookup waits out the dead server before trying the second one.

Try adding something like "options timeout:1" or similar (see the man page - http://linux.die.net/man/5/resolv.conf) to let the local resolver try alternate name servers sooner. The timeout is given in whole seconds, so a fractional value such as ".5" is not valid. Be careful of making the timeout and retry settings too aggressive, as some recursive lookups can legitimately take quite a while.
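
As a rough sketch, the resolv.conf from the question could look something like this with the options line added. The specific values (timeout:1, attempts:2) are illustrative assumptions rather than tested recommendations, and "rotate" is an optional extra that spreads queries across the listed name servers instead of always starting with the first one. The "domain" line is left out here because "domain" and "search" are mutually exclusive and only the last one in the file takes effect.

```
# File managed by puppet
nameserver 192.168.146.100
nameserver 192.168.159.101
search example.com
# Wait 1 second per server instead of the default 5, try the list twice,
# and rotate through the name servers rather than always starting at the first.
options timeout:1 attempts:2 rotate
```

Since the file is managed by puppet, the change would need to go into the puppet template rather than being edited by hand on each machine. Long-running processes may also need a restart before they pick up the new settings.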