Bind DNS Recursion Slow

Tags: bind, domain-name-system, performance, recursive, slow-connection

We have just set up a recursive DNS server using the latest stable release of BIND 9.10.

We are finding that recursive DNS lookups are quite slow, anywhere from 1 to 3 seconds. Once the lookup is in the cache, DNS resolves in a matter of milliseconds, as expected.

We are using root hints for the recursive lookups, and this seems to be where the slowness is coming from. If we configure a forwarder, DNS resolution comes down to a sensible recursion time of 100 to 300 ms.
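One way to put numbers on the cold versus cached difference is to time a query directly against the resolver. The following is a minimal sketch using only the Python standard library; the function names build_query and time_query are made up for illustration, and the query is a plain A lookup over UDP without EDNS, so it will not match dig's behaviour exactly.

```python
import secrets
import socket
import struct
import time


def build_query(name: str, txid: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN, no EDNS."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = [part.encode("ascii") for part in name.rstrip(".").split(".") if part]
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server: str, name: str, timeout: float = 5.0) -> float:
    """Send one A query over UDP and return the round-trip time in milliseconds.

    Raises socket.timeout if no matching reply arrives within `timeout` seconds.
    """
    txid = secrets.randbelow(0x10000)
    packet = build_query(name, txid)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        while True:
            reply, _ = sock.recvfrom(4096)
            # Ignore stray datagrams whose transaction ID does not match ours.
            if len(reply) >= 2 and struct.unpack("!H", reply[:2])[0] == txid:
                break
        return (time.perf_counter() - start) * 1000.0
```

Calling time_query("192.168.10.25", "centos.org") twice in a row should show the first (recursive) lookup against the second (cached) one, assuming the name was not already in the cache.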

For the service we are setting up, I don't want to rely on forwarders; I would prefer to use root hints.

Here is the main config from our named.conf file. Any pointers to help improve performance would be great.

options {
    allow-recursion    { any; };
    allow-query-cache  { any; };
    allow-query        { any; };

    listen-on     port 53  { any; };
    listen-on-v6  port 53  { any; };

    dnssec-enable yes;
    dnssec-validation yes;
    dnssec-lookaside auto;

    zone-statistics yes;
    max-cache-ttl 3600;
    max-ncache-ttl 3600;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";
    managed-keys-directory "/var/named/dynamic";

    directory           "/var/named";
    dump-file           "/var/named/data/cache_dump.db";
    statistics-file     "/var/named/stats/named_stats.txt";
    memstatistics-file  "/var/named/stats/named_mem_stats.txt";

    rate-limit {
        responses-per-second 10;
        log-only yes;
    };

    prefetch 5;
};

zone "." {
    type hint;
    file "named.ca";
};

include "/var/named/conf/logging.conf";

Best Answer

We found the issue. It was a NIC hardware offloading issue.

Running tcpdump -vvv -s 0 -l -n port 53 showed a handful of [bad udp cksum 6279!] errors for each DNS query.
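To see how widespread the errors are rather than eyeballing the capture, the tcpdump output can be tallied. This is a small sketch; summarize_checksums is a made-up helper, and it assumes the verbose tcpdump text contains the literal markers "bad udp cksum" and "udp sum ok", which can vary between tcpdump versions.

```python
from collections import Counter
from typing import Iterable


def summarize_checksums(lines: Iterable[str]) -> Counter:
    """Count tcpdump lines reporting bad and good UDP checksums."""
    counts = Counter()
    for line in lines:
        if "bad udp cksum" in line:
            counts["bad"] += 1
        elif "udp sum ok" in line:
            counts["ok"] += 1
    return counts
```

For example, saving the capture text to a file and passing the open file object to summarize_checksums gives a quick bad/ok ratio for a test run.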

A quick search on Google pointed me in the right direction. As it turns out, NIC hardware offloading is enabled by default, and this causes problems because our CentOS system runs as a VM on XenServer (similar issues are reported with VMware and other hypervisors).
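To make the tcpdump message concrete: the UDP checksum is a 16-bit one's-complement sum over an IPv4 pseudo-header, the UDP header, and the payload. With TX checksum offload enabled, the kernel leaves that field for the NIC to fill in, so a capture taken on the sending host can show values that do not verify. The sketch below illustrates the calculation for IPv4 only; the function names are made up for this example.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp_segment(src_ip: str, dst_ip: str, src_port: int,
                      dst_port: int, payload: bytes) -> bytes:
    """Return UDP header plus payload with a correct checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Verify a UDP segment; a valid checksum makes the total fold to zero."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0
```

A receiver performs the same check as udp_checksum_ok and drops datagrams that fail it, which would force the sender to time out and retry.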

Running ethtool -k eth0 | grep on showed the following:

x-checksumming: on
tx-checksum-ipv4: on
scatter-gather: on
tx-scatter-gather: on
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
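When checking several hosts, it can help to turn this listing into structured data and pick out the features that are both enabled and changeable. The following is a minimal sketch that parses text in the "name: on|off [fixed]" layout shown above; parse_ethtool_features and changeable_enabled are hypothetical helper names, not part of ethtool.

```python
from typing import Dict, List, Tuple


def parse_ethtool_features(text: str) -> Dict[str, Tuple[bool, bool]]:
    """Map each feature name to (enabled, fixed) from `ethtool -k` style output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        words = value.split()
        # Skip headers and anything that is not a "name: on/off" line.
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        features[name.strip()] = (words[0] == "on", "[fixed]" in words)
    return features


def changeable_enabled(features: Dict[str, Tuple[bool, bool]]) -> List[str]:
    """Return the sorted names of features that are on and not marked [fixed]."""
    return sorted(name for name, (enabled, fixed) in features.items()
                  if enabled and not fixed)
```

Features marked [fixed] cannot be toggled by ethtool -K, so only the names returned by changeable_enabled are candidates for switching off.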

Running ethtool -K eth0 tx off rx off disabled TX and RX checksum offloading. I restarted the networking service for good measure:

service network restart

and tested BIND. We are now getting much faster response times from BIND:

dig centos.org

; <<>> DiG 9.10.2-P4-RedHat-9.10.2-P4.el6 <<>> centos.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61933
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;centos.org.                    IN      A

;; ANSWER SECTION:
centos.org.             60      IN      A       85.12.30.227

;; Query time: 268 msec
;; SERVER: 192.168.10.25#53(192.168.10.25)
;; WHEN: Thu Sep 17 08:25:39 AEST 2015
;; MSG SIZE  rcvd: 55
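To compare timings before and after a change like this, the "Query time" line can be pulled out of saved dig output. A minimal sketch, assuming the output uses the ";; Query time: N msec" format shown above; query_time_ms is a made-up helper name.

```python
import re
from typing import Optional

QUERY_TIME = re.compile(r"^;; Query time: (\d+) msec", re.MULTILINE)


def query_time_ms(dig_output: str) -> Optional[int]:
    """Return the query time in milliseconds from dig output, or None if absent."""
    match = QUERY_TIME.search(dig_output)
    return int(match.group(1)) if match else None
```

Running this over the output of several uncached lookups gives a rough distribution rather than a single sample.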