Linux keeps retrying failed DNS server

domain-name-system linux resolv.conf

Whenever one of the servers in /etc/resolv.conf is unreachable, the Linux resolver (glibc) doesn't remember that and keeps retrying the dead server on every lookup. This makes a lot of services effectively unavailable, because many of them (SSH, for example) do a reverse lookup on every incoming connection, and each lookup hangs until the query to the first DNS server times out.

How can I make my Ubuntu boxes smarter about the DNS servers they use? I could hack together a bash script that runs every minute and inserts a REJECT rule into iptables for any server that doesn't respond to dig queries, but I'd rather not do it that way…

I'm told that Windows does this properly, BTW.

Edit: I worked around it a little bit by putting this in /etc/resolv.conf (or /etc/resolvconf/resolv.conf.d/base):

options timeout:2 rotate

Still not perfect, but more workable.
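To check whether a change to the options actually shortens the hang, it helps to time a reverse lookup the same way a service would. The following is a minimal sketch, not a definitive tool: the function name timed_reverse_lookup and its resolver parameter are made up for illustration, and it assumes Python 3 is available on the box. socket.gethostbyaddr goes through the system resolver, so it should be affected by /etc/resolv.conf in the same way as other programs.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip.

    The resolver argument is a hypothetical hook so the function can be
    exercised without touching the network; by default it uses the
    system resolver via socket.gethostbyaddr.
    """
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Calling timed_reverse_lookup on the address of a typical client before and after editing the options line gives a rough number to compare. Note that a name caching service, if one is running, can make repeated lookups of the same address look faster than a cold lookup would be.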

Best Answer

Why are the DNS servers becoming unavailable? That's the issue we should focus on fixing...

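You should omit the rotate option if you want a deterministic retry order. rotate gives you round-robin selection among the listed nameservers, so queries are spread across all of them instead of always starting with the first one. In your situation that can have undesirable results, because a dead server keeps getting picked no matter where it sits in the list.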

My /etc/resolv.conf tends to look like:

search blah.net client.blah.net
options timeout:1
nameserver 172.16.2.14
nameserver 172.16.2.18

Beyond that, you have the option of running a caching DNS service on your local machine, or enabling the Name Service Cache Daemon (nscd). Either will help buffer the delays that come with unreliable DNS resolvers.
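To start answering the question at the top of this answer (which server is failing, and when), one option is to probe each configured nameserver directly instead of going through the system resolver. Here is a rough sketch using only the Python standard library. The function names are invented for this example, the query name example.com and the one-second timeout are arbitrary choices, and it only checks that some reply carrying the matching query ID comes back over UDP; it does not validate the answer.

```python
import socket
import struct


def parse_nameservers(text):
    """Extract nameserver addresses from resolv.conf-style text."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';')
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query: one question, type A, class IN."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def server_responds(server, name="example.com", timeout=1.0):
    """Return True if server answers a UDP DNS query within timeout seconds."""
    query = build_query(name)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors
            return False
    # A genuine reply echoes the 16-bit query ID in its first two bytes
    return reply[:2] == query[:2]
```

Feeding the contents of /etc/resolv.conf to parse_nameservers and calling server_responds on each result, say from cron with the output logged, would show whether one particular server drops out or all of them do at once, which points at the server or at the network path respectively.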