CentOS 7. My problem is a seemingly common issue where nslookup can resolve a host, but ping can't. However, the common answers, like messing with Avahi or /etc/nsswitch.conf, don't help because my VPS is running neither Avahi nor NetworkManager. (In other words, I can break /etc/nsswitch.conf by setting hosts: files and ping continues to work.)
/etc/resolv.conf is as follows:
nameserver 10.44.13.246
nameserver 10.32.72.88
nameserver 10.32.72.86
The first nameserver points to an instance of dnsmasq that's running on another of my VPSes, and the last two are the hosting provider's DNS servers. I expect them to be queried in order (the last two are simply last-resort fallbacks).
Now, for any of the hosts defined in that dnsmasq instance, nslookup always works, and ping works some of the time: a host will resolve properly, then break, then a few minutes later it will be fine again. However, if I remove the upstream DNS servers in /etc/resolv.conf like this,
nameserver 10.44.13.246
#nameserver 10.32.72.88
#nameserver 10.32.72.86
then ping immediately starts to work 100% of the time. This directly contradicts the resolv.conf docs, which say that in the absence of an options rotate directive, the servers are queried in order until one sends a response.
nscd is running and is being hit, because I can see the cache hit/miss counters go up for these problematic queries.
How can I resolve this?
Best Answer
I don't have a direct answer to the larger question, but I do have answers for some distinct parts of it.
Regarding ping vs nslookup
It's worth noting that ping is just an example of a regular program which uses the OS resolver library (i.e., getaddrinfo/gethostbyname calls), while nslookup (as well as dig, etc.) are DNS client programs that make DNS queries of their own rather than using the resolver library; they just so happen to pick up their default server from the configuration file for the system resolver as a matter of convenience.
What this means is that nslookup is bad for testing how the system resolver behaves (i.e., resolv.conf, nsswitch.conf, etc.), while e.g. ping is bad for testing DNS.
In Linux-land I would consider getent ahosts (e.g. getent ahosts www.example.com) a better choice for testing the resolver behavior, and dig to be much preferable over nslookup for testing DNS.
Regarding what you can do to see what is happening
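If you want to watch the system resolver path from a script (the same getaddrinfo route that ping takes, as opposed to the direct DNS queries nslookup and dig make), something like the following rough sketch may help. It is only an illustration: resolve_via_system and poll are hypothetical helper names, and it assumes Python's socket.getaddrinfo, which wraps the C library call and so should be subject to nsswitch.conf, resolv.conf and, if it is running, nscd on a glibc system.

```python
import socket
import sys
import time


def resolve_via_system(host):
    """Return the sorted unique addresses the OS resolver gives for host.

    An empty list means the lookup failed through the system resolver path.
    """
    try:
        infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def poll(host, count=5, interval=10.0):
    """Resolve host `count` times, sleeping `interval` seconds in between."""
    results = []
    for i in range(count):
        results.append(resolve_via_system(host))
        if i < count - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Usage (hypothetical file name): python resolve_check.py some-host
    for name in sys.argv[1:]:
        for attempt, addrs in enumerate(poll(name), start=1):
            print(attempt, name, addrs or "FAILED")
```

Running it against one of the dnsmasq-defined names over a few minutes should show whether the intermittent failures also occur outside of ping.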
As was suggested by Hangin on in quiet desperation, you may want to use strace (maybe also ltrace for a higher-level view), and I would suggest using it with getent ahosts rather than ping, to avoid all the noise from ping's actual purpose while you're trying to observe what is just a side effect. getent ahosts does just the one thing that you're trying to investigate.
Regarding what to have in resolv.conf
What you're saying about things "breaking" when the "wrong" server is queried makes me wonder why you are putting all of those servers in resolv.conf in the first place. It's generally not a good idea to put servers with different behavior (different in some way that is actually significant to your use) all in the same list.
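To double-check which nameserver lines and options are actually active in a given resolv.conf (for instance, to confirm that the upstream servers really are commented out and that no options rotate line is present), a small sketch like this could be used. parse_resolv_conf is a hypothetical helper, not a standard API, and it handles only the nameserver and options keywords, treating lines that start with # or ; as comments.

```python
def parse_resolv_conf(text):
    """Return (nameservers, options) found in resolv.conf-style text."""
    nameservers, options = [], []
    for raw in text.splitlines():
        # Comment lines start with '#' or ';' in the first column.
        if raw.startswith(("#", ";")):
            continue
        fields = raw.split()
        if len(fields) < 2:
            continue
        if fields[0] == "nameserver":
            nameservers.append(fields[1])
        elif fields[0] == "options":
            options.extend(fields[1:])
    return nameservers, options


if __name__ == "__main__":
    with open("/etc/resolv.conf") as handle:
        servers, opts = parse_resolv_conf(handle.read())
    print("nameservers:", servers)
    print("options:", opts)
```

With the second configuration from the question, this would report only 10.44.13.246 and no options.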