Windows 2016 DNS Server: not using forwarder when recursively resolving CNAME in delegated zone

Tags: domain-name-system, windows, windows-server-2016

I don't think I'm going mad here…

Our AD domain controllers (Server 2016) are the DNS servers for foo.example. Within that, we have a delegation, r53.foo.example, which points out to the nameservers for that zone in Amazon Route 53.

One of the records in the Route 53 zone is a CNAME to an EC2 instance's public DNS name, for example:

bar.r53.foo.example. IN CNAME ec2-1-2-3-4.us-west-1.compute.amazonaws.com.

The Windows DNS server is set to use the Google Public DNS servers as forwarders, and root hints are disabled. Recursion is enabled.

From a client, a query for ec2-1-2-3-4.us-west-1.compute.amazonaws.com resolves correctly. I then clear all the DNS caches.

If I now query bar.r53.foo.example, the Windows DNS server queries the delegated zone's nameserver (because of the delegation) and gets the CNAME back. That server is authoritative only, so it does not go on to resolve the A record for the CNAME target.

Windows then sends the A query for ec2-1-2-3-4.us-west-1.compute.amazonaws.com to the delegated zone's nameserver, not to the NS for us-west-1.compute.amazonaws.com, and gets a REFUSED response.
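To check the REFUSED response independently of Windows, you can send the same A query straight to the delegated zone's nameserver and read the RCODE from the reply. Below is a minimal standard-library Python sketch, assuming IPv4 and UDP on port 53: it builds a single-question query in RFC 1035 wire format and decodes the 4-bit RCODE from the response header, where 5 means REFUSED. The helpers `build_query`, `parse_rcode` and `query` are hypothetical names for illustration, not part of any library.

```python
import random
import socket
import struct
from typing import Optional

# Mnemonics for RCODE values 0-5 (RFC 1035, section 4.1.1).
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, recursion_desired: bool = True,
                query_id: Optional[int] = None) -> bytes:
    """Build a one-question query; qtype 1 is A, and the class is always IN (1)."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    flags = 0x0100 if recursion_desired else 0x0000  # only the RD bit is set
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, 1)


def parse_rcode(response: bytes) -> str:
    """Return the RCODE mnemonic from the low 4 bits of the header flags."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODE_NAMES.get(code, str(code))


def query(server: str, name: str, timeout: float = 3.0) -> str:
    """Send a non-recursive (RD=0) A query over UDP and return the RCODE mnemonic."""
    packet = build_query(name, recursion_desired=False)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        response, _ = sock.recvfrom(4096)
    if response[:2] != packet[:2]:
        raise ValueError("response ID does not match query ID")
    return parse_rcode(response)
```

For example, `query("192.0.2.53", "ec2-1-2-3-4.us-west-1.compute.amazonaws.com")` should return `"REFUSED"` if the server behaves as described above. The address 192.0.2.53 is only a documentation placeholder; replace it with the address of one of the delegated zone's Route 53 nameservers.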

I would have expected it either to use the configured forwarders (because ec2-1-2-3-4.us-west-1.compute.amazonaws.com is neither in a zone it hosts authoritatively nor in a delegated zone), or at least to resolve it recursively via the NS for us-west-1.compute.amazonaws.com. Instead, clients are left without a complete answer.
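The expected behaviour can be stated as a small routing rule: each name in a CNAME chain is routed on its own, from the top, so the target only goes to the delegated nameservers if it actually lies under the delegated zone. Below is a toy Python model of that rule, not a description of how the Windows DNS server is implemented. `ServerConfig`, `next_hop` and `plan_resolution` are hypothetical names, and the nameserver name in the test data is a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ServerConfig:
    """Hypothetical model of the DNS server settings that matter here."""
    authoritative_zones: List[str]     # zones hosted locally, e.g. the AD zone
    delegations: Dict[str, List[str]]  # delegated child zone -> its nameservers
    forwarders: List[str]              # used for every other name


def in_zone(name: str, zone: str) -> bool:
    """True if `name` is `zone` or lies beneath it (label-wise, case-insensitive)."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def next_hop(cfg: ServerConfig, name: str) -> Tuple[str, List[str]]:
    """Decide where a query for `name` goes, independent of any earlier step."""
    delegated = [z for z in cfg.delegations if in_zone(name, z)]
    if delegated:
        # The most specific (longest) matching delegation wins.
        best = max(delegated, key=lambda z: len(z.rstrip(".")))
        return "delegation", list(cfg.delegations[best])
    if any(in_zone(name, z) for z in cfg.authoritative_zones):
        return "local", []
    return "forward", list(cfg.forwarders)


def plan_resolution(cfg: ServerConfig, name: str,
                    cnames: Dict[str, str]) -> List[Tuple[str, str, List[str]]]:
    """Route every name in a CNAME chain; `cnames` maps alias -> target."""
    plan = []
    visited = set()
    while name not in visited:  # stop if the chain loops
        visited.add(name)
        kind, servers = next_hop(cfg, name)
        plan.append((name, kind, servers))
        if name not in cnames:
            break
        name = cnames[name]
    return plan
```

With the question's setup (`foo.example` hosted locally, `r53.foo.example` delegated, Google Public DNS as forwarders), this plan sends the first query to the delegation and the CNAME target to the forwarders. The question reports that the 2016 server instead sends the second query to the delegated nameserver as well.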

If the ec2-1-2-3-4.us-west-1.compute.amazonaws.com IN A 1.2.3.4 record happens to already be in the server's cache, then the client query resolves completely, but obviously this isn't guaranteed.

This smells like a bug, but maybe I'm missing something?

Edit to add: this only happens with the Server 2016 DNS server. The same configuration under 2012 R2 gives the expected behaviour.

Best Answer

Since it looks like a few people are hitting this, our workaround:

We run a small Ubuntu VM running ISC BIND. This acts as the recursive resolver and is the DNS server used by all client PCs. It has forwarders configured to our ISP's DNS servers for "external" zones, and forward zones for the AD domain:

zone "internal.domain" {
    type forward;
    forward only;
    forwarders { domain.controller.ip; };
};

That way the Windows DNS servers act only as authoritative servers for the AD zone, and all recursive resolution is handled by BIND.

(If you want high availability, you can then use VRRP or similar to cluster a pair of recursive resolvers - we run our primary on our regular virtualisation infrastructure, and secondary on a Raspberry Pi, because we can...)