I enabled DNSSEC on my primary domain about a week ago. It's not a major website or anything, just my personal domain name that I use for email and the like (TLD: com; DNSSEC algorithm 13; authoritative DNS provider: Cloudflare).
Over the last 24 hours, the domain has received 15,605 queries. In response, it has dished out 15,601 NOERROR response codes and a total of 4 NXDOMAIN response codes.
How are NXDOMAIN responses still possible? What could be generating them?
Personally I cannot trigger one no matter what query I attempt, and my understanding is that DNSSEC should, at least in theory, eliminate this response code entirely.
Am I incorrect?
Best Answer
TL;DR
The lack of NXDOMAIN responses for Cloudflare-hosted domains is a consequence of their specific DNSSEC implementation (using so-called "black lies") and not a design of the DNSSEC protocol itself; hence observations will be different with other providers doing DNSSEC.

Initial questions
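To "How are NXDOMAIN responses still possible?": why wouldn't they be possible? DNSSEC or not, if you query for a name that doesn't exist, you get an NXDOMAIN reply back.

To "DNSSEC should, at least in theory, eliminate this response code entirely": why? And from where do you get that feeling?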
Live example with a DNSSEC enabled domain
icann.org is DNSSEC enabled right now. If I query for a name that does not exist under it, I get an NXDOMAIN.

DNSSEC is an extension of DNS in the sense that for a non-validating resolver, answers are not different, even if the domain is DNSSEC enabled. So all return codes work in the same way.
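To make that concrete, here is a minimal sketch (standard library only, no network access) of where the return code lives in a DNS message and what asking for DNSSEC records changes in the query. The function names, the query ID and the advertised UDP size of 1232 are my own assumptions for illustration. The RCODE is the low 4 bits of the header flags whether or not the zone is signed; the only thing a DNSSEC-aware query adds is an EDNS0 OPT record with the DO ("DNSSEC OK") bit set, which is what dig's +dnssec does.

```python
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, qtype=1, dnssec_ok=False, query_id=0x1234):
    """Build a minimal DNS query. With dnssec_ok, append an EDNS0 OPT record with DO set."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1 if dnssec_ok else 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if not dnssec_ok:
        return header + question
    # OPT pseudo-record: root owner name, TYPE=41, CLASS=UDP payload size (assumed 1232),
    # TTL field = extended RCODE, version, flags (DO is 0x8000), then RDLENGTH=0
    opt = b"\x00" + struct.pack("!HHIH", 41, 1232, 0x00008000, 0)
    return header + question + opt


def rcode_of(message):
    """Return the response code name stored in the low 4 bits of the header flags."""
    flags = struct.unpack("!H", message[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)
```

Sending the bytes over UDP to a nameserver is left out on purpose; the point is only that NXDOMAIN is the same value 3 in the same place, signed zone or not.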
Explanations about NSEC/NSEC3/RRSIG
What it does change, and what you can see by adding +dnssec to dig (which doesn't mean "activate DNSSEC" but "display DNSSEC related records", namely RRSIG, NSEC and NSEC3, as they are normally not displayed), is that the AUTHORITY section of the NXDOMAIN response gives further explanations with NSEC or NSEC3 records.

NSEC3 is more complicated (less human friendly) as it uses hashes of domain names. But what all the above means in summary is that the name I requested does not exist because it lands between two names that exist (but can't be seen immediately, because hashed), and that no wildcard exists (which is why you have three NSEC3 records). The RRSIG records sign the NSEC3 ones, so all the above allows a resolving nameserver to double check that the NXDOMAIN is legit and not introduced by some on-path attacker, because all the NSEC3 and RRSIG records match the expectations.

Simpler example with NSEC case
Let us take a domain DNSSEC enabled with NSEC instead of NSEC3: the root itself :-)

If I do dig @g.root-servers.net foobar. +dnssec right now I get NXDOMAIN, again for the same reasons as above, since that TLD does not exist (yet?).

But let us look in the results and especially one NSEC record, the one going from foo to food.

This is an affirmative signed (there is a corresponding RRSIG record) assertion from the nameserver telling me that foobar does not exist in the zone, because both foo and food exist, but nothing in between. And per DNSSEC ordering rules foobar would sort between foo and food, and hence the above proves that foobar does not exist. Incidentally it proves that a lot of other names do not exist, and some resolver could cache this NSEC and derive answers without requesting anything.

Why? Because if I know that nothing exists between foo and food I immediately know that fooa doesn't exist, nor fooa42 or foobie or fooccc or similar…

Back to CloudFlare specific case
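Before getting to Cloudflare, the range argument from the root example (foobar sorting between foo and food) can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function names are mine, names are plain ASCII without escaped bytes, and the wrap-around of the last NSEC in a zone back to the apex is ignored. The ordering itself is the canonical one from RFC 4034 section 6.1: compare labels from the rightmost one, lowercased, byte by byte, with a shorter label sorting before a longer one it is a prefix of.

```python
def canonical_key(name):
    """Sort key for DNSSEC canonical ordering: lowercased labels, rightmost label first."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return [label.lower().encode("ascii") for label in reversed(labels)]


def nsec_covers(owner, next_name, qname):
    """True if qname sorts strictly between the two names of an NSEC record."""
    return canonical_key(owner) < canonical_key(qname) < canonical_key(next_name)


# The names from the text: all of them fall inside the foo -> food gap.
for candidate in ["foobar.", "fooa.", "fooa42.", "foobie.", "fooccc."]:
    print(candidate, nsec_covers("foo.", "food.", candidate))
```

A resolver that caches one such signed NSEC can answer for every name in the gap without asking again, which is the idea behind RFC 8198.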
CloudFlare implements "DNSSEC White Lies" AND "Black Lies", see https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/ and https://blog.cloudflare.com/black-lies/, for their own various reasons (in part because they do dynamic signature generation: they generate the RRSIG records at the moment the request comes, and not in advance; this is a compromise, both cases have advantages and drawbacks).

What does that mean? They fake the existence of ALL names, hence there is almost never an NXDOMAIN.

Let us see one example:
(I removed the RRSIG records.)

So what does that tell? First: NOERROR and not NXDOMAIN, so the resolver tells me the name I query for exists (but maybe not for the type I asked, A, which is the default dig type; this is valid and known as NODATA, which means NOERROR but no content either, no ANSWER section, as happens when the name exists, but not with that type).

The AUTHORITY part, and specifically that NSEC record, tells me that there are no names between dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. (the name I asked for in fact, so not the previous one, just mine) and \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com., which may look like a strange name but 1) is totally valid (it is not a valid hostname, because \000 means byte value 0, which has to be encoded as \000 for DNS operations, but it is still a valid domain name, as domain names in the DNS specifications can be any arbitrary bytes) and 2) is, with the DNSSEC ordering algorithm, the name "right after" my name (so basically the range between the two names does not include any other name).

The RRSIG NSEC part at the end of the NSEC record means that there is no record of type A on the name but there are records of types RRSIG and NSEC, which makes sense because I am exactly looking at the NSEC record of that name, and as we are in DNSSEC land, of course there is an RRSIG.

So this is called a "lie" because the nameserver is replying to you: this name exists, but not this record type. And no matter which record type you ask for (except NSEC and RRSIG) the nameserver will tell you: "this name does not exist for this record type". At the end, if it does not exist for any record type (besides NSEC and RRSIG) it is really as if it (the name) does not exist at all, but it is just presented in a different way for reasons quickly detailed below.

I recommend reading the second link but the gist of it explaining things is (I am skipping the whole points regarding NSEC/NSEC3 and wildcard records, with all the details on "closest encloser" and so on, but those are important if going deep on NSEC stuff):

(which is why they don't use NSEC3 and keep NSEC but then still need another solution to avoid walking the zone and hence enumerating all names)

So that part above is the basic explanation of why they want to avoid using NXDOMAIN and "emulate" it with success (NOERROR) while at the same time responding negatively to any query (name+type for any type requested).

The other point, again very specific to CloudFlare, is that it is difficult in their case to compute the "next" name (because NSEC is really giving a "range" of two names, as a link between two things existing), so instead of using the real next name as existing in their storage, they compute the minimal "next" one following the DNSSEC algorithm, hence the strange name above with \000. as prefix, a name that obviously doesn't exist either, so if you query for it you will get again the same kind of reply, but this time with an NSEC record listing on the right \001. or, in fact, \000.\000., and so on...

Further down:
The goal reached with all that is smaller replies. And this is important in DNS land, because of various problems around fragmentation. From their example they go from 1096 bytes to just 357 bytes with black lies, cutting almost 2/3, quite an accomplishment!
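As an aside on the \000. name seen earlier: the "minimal next name" idea can be sketched as below. This is not Cloudflare's code, just my own illustration with hypothetical function names; names are handled as lists of raw byte labels so that the zero byte can be represented, and name length limits are ignored. Under canonical ordering the children of a name sort immediately after it, and the smallest possible child label is the single byte 0, so prepending that label gives the immediate successor.

```python
def key(labels):
    """Canonical sort key (RFC 4034 section 6.1): lowercased labels, rightmost label first."""
    return [label.lower() for label in reversed(labels)]


def minimal_next(labels):
    """Immediate canonical successor of a name: prepend a label holding the single byte 0."""
    return [b"\x00"] + labels


def presentation(labels):
    """Render labels in zone-file style, escaping anything unusual as \\DDD."""
    rendered = []
    for label in labels:
        rendered.append("".join(
            chr(b) if 0x21 <= b <= 0x7E and chr(b) not in '.\\"' else "\\%03d" % b
            for b in label))
    return ".".join(rendered) + "."


name = [b"dwewgewfgewfee-32cewcewcew-2284", b"cloudflare", b"com"]
print(presentation(minimal_next(name)))
print(presentation(minimal_next(minimal_next(name))))
```

Because nothing can sort between a name and its successor, the NSEC record built this way denies nothing except what was asked, which is also why it cannot be used to walk the zone.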
All the above may become a "standard" in the future, for those wanting to do the same, as they wrote a document that can become maybe an IETF RFC one day: https://datatracker.ietf.org/doc/html/draft-valsorda-dnsop-black-lies
Do note it has consequences though:
NXDOMAIN is an important signal: various other stuff is built on top of that, see RFC 8020 "NXDOMAIN: There Really Is Nothing Underneath" and RFC 8198 "Aggressive Use of DNSSEC-Validated Cache", so not having this signal anymore can have side effects (and it wouldn't be a good idea to change other recursive resolvers to try finding out if the authoritative side is using black lies and then take that into account, that would be brittle; that point is exactly discussed in the draft above).

I also seem to remember cases where the record types listed in the NSEC record (the NSEC bitmap) were not computed correctly, hence breaking some stuff. Will try to update this if I do find back what I am thinking I have seen, but I could be delusional (easy to be with DNSSEC...); in fact I think it is related to the observation that all their initial examples did put far more types in the NSEC last section, where now they put only RRSIG and NSEC. See https://indico.dns-oarc.net/event/40/contributions/899/attachments/862/1563/nsec-bitmaps.pdf for live examples of errors in NSEC bitmaps and their consequences.

Ah no, in fact I remembered right: a bug in this NSEC bitmap is right at the source of a recent Slack outage :-), but it was not Cloudflare's fault, it was AWS Route53 where the problem was. See https://www.potaroo.net/ispcol/2021-12/oarc36.pdf for those details, but in short:

So, in short, lying does have bad consequences sometimes :-) (and/or: DNSSEC is complicated, and wildcards in the DNS do create all sorts of complications too; in fact DNSSEC + wildcards + CNAME records are like 3 sure signs of apocalypse somehow...).
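Since the NSEC bitmap keeps coming up, here is a small sketch of how that list of types is encoded on the wire and how a resolver reads it, following RFC 4034 section 4.1.2. The function names and the short type table are my own, for illustration only. Each window covers 256 type codes and carries a window number, a length and a bitmap with trailing zero bytes dropped; a type is present when its bit is set, counting from the most significant bit of the first byte.

```python
TYPES = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15, "TXT": 16, "AAAA": 28, "RRSIG": 46, "NSEC": 47}


def encode_bitmap(type_names):
    """Encode a set of record types as NSEC type bitmap windows."""
    windows = {}
    for type_name in type_names:
        window, low = divmod(TYPES[type_name], 256)
        bitmap = windows.setdefault(window, bytearray(32))
        bitmap[low // 8] |= 0x80 >> (low % 8)
    wire = b""
    for window in sorted(windows):
        bitmap = bytes(windows[window]).rstrip(b"\x00")  # trailing zero bytes are omitted
        wire += bytes([window, len(bitmap)]) + bitmap
    return wire


def has_type(wire, type_name):
    """True if the bitmap asserts that the given type exists at the NSEC owner name."""
    window, low = divmod(TYPES[type_name], 256)
    pos = 0
    while pos < len(wire):
        current, length = wire[pos], wire[pos + 1]
        bitmap = wire[pos + 2:pos + 2 + length]
        if current == window:
            index = low // 8
            return index < length and bool(bitmap[index] & (0x80 >> (low % 8)))
        pos += 2 + length
    return False


black_lie = encode_bitmap(["RRSIG", "NSEC"])
print(black_lie.hex(), has_type(black_lie, "A"), has_type(black_lie, "NSEC"))
```

A validating resolver trusts this bitmap: if a type's bit is missing, the signed answer is "that type is not here", and it may cache that. That is why a bitmap that omits types which really exist at the name can make them disappear for validating clients.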
This is only ONE way to do things; the consequences (almost no NXDOMAIN responses) are absolutely not a consequence of the protocol (DNSSEC) but just of their implementation. So don't take this for granted at all, it will be different with other providers. But does it really change anything for you as owner of the zone or users of it? Not so much. Why were you so worried about NXDOMAIN responses :-) ?

PS: