DNS SERVFAIL and Incorrect Flag only via TCP: Broken DNS Servers


Is it poor configuration to return the root name servers in the additional section for a CNAME lookup that points to another domain? The case I'm seeing is a CNAME hosted by Network Solutions that points to a name in a different domain and TLD.

I ask because all these extra records push the response past what fits in a UDP packet, forcing the query to be redone over TCP.

dig www.unitedstatesartists.org +trace

A name server response:

example.org. 86400  IN      NS      ns15.worldnic.com.
example.org. 86400  IN      NS      ns16.worldnic.com.
;; Received 95 bytes from 199.249.120.1#53(b2.org.afilias-nst.org) in 79 ms

;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
www.example.org. 7200 IN    CNAME   load-01-123.us-west-1.elb.amazonaws.com.
.  518400  IN      NS      a.root-servers.net.
.  518400  IN      NS      b.root-servers.net.
.  518400  IN      NS      c.root-servers.net.
.  518400  IN      NS      d.root-servers.net.
.  518400  IN      NS      e.root-servers.net.
.  518400  IN      NS      f.root-servers.net.
.  518400  IN      NS      g.root-servers.net.
.  518400  IN      NS      h.root-servers.net.
.  518400  IN      NS      i.root-servers.net.
.  518400  IN      NS      j.root-servers.net.
.  518400  IN      NS      k.root-servers.net.
.  518400  IN      NS      l.root-servers.net.
.  518400  IN      NS      m.root-servers.net.
;; Received 526 bytes from 205.178.190.8#53(ns15.worldnic.com) in 173 ms

Whether the additional records are returned appears to be random. Sometimes, even when they are not returned, the response is still truncated and dig retries in TCP mode.

example.org. 86400  IN      NS      ns15.worldnic.com.
example.org. 86400  IN      NS      ns16.worldnic.com.
;; Received 95 bytes from 199.19.56.1#53(a0.org.afilias-nst.info) in 82 ms

;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
www.example.org. 7200 IN    CNAME   load-01-123.us-west-1.elb.amazonaws.com.
;; Received 107 bytes from 205.178.190.8#53(ns15.worldnic.com) in 164 ms

Update 2010-12-08

Further testing found:

  • Network Solutions responds to a recursive query (dig's default unless +trace is used) with a SERVFAIL (server failure), yet still gives the correct answer.
  • Setting dig's +norecurse usually works, but not always: sometimes a SERVFAIL is still returned. Details of a possible cause follow below.
  • Network Solutions' inclusion of the root servers in the authority and additional sections causes the UDP truncation and requires TCP to complete.

Overview of the following capture:

  • Non-recursive request for the record is sent to ns15
  • ns15's answer includes the root servers in the authority and additional sections and marks the reply as truncated
  • The non-recursive request is retried over TCP because the UDP reply was truncated
  • Similar answer from ns15 over TCP, except "recursion desired" is incorrectly set and the "server failure" code is also set
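The UDP-then-TCP sequence in this overview can be sketched in Python using only the standard library. This is a rough illustration, not what dig does internally: the function names are made up for the example, the same message is reused for the TCP retry (dig generated a new transaction ID), and `message` is assumed to be an already-encoded DNS query.

```python
import socket
import struct
from typing import Tuple

TC_BIT = 0x0200  # "Truncated" flag in the 16-bit DNS header flags word


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS response."""
    (flags,) = struct.unpack_from("!H", response, 2)  # flags follow the 2-byte ID
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data


def query_with_fallback(server: str, message: bytes,
                        timeout: float = 3.0) -> Tuple[str, bytes]:
    """Send `message` over UDP; repeat it over TCP if the reply is truncated."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(message, (server, 53))
        reply, _ = udp.recvfrom(4096)
    if not is_truncated(reply):
        return "udp", reply
    # TC was set, so ask again over TCP, where the 512-byte limit does not apply.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(frame_for_tcp(message))
        (length,) = struct.unpack("!H", _recv_exact(tcp, 2))
        return "tcp", _recv_exact(tcp, length)
```

The returned transport name makes it easy to log which path produced the reply, which matters here because the two transports gave different flags.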

We've already opened a ticket with them, but we'll see if it goes anywhere. The DNS packets from tshark for the exchange described above follow:

First question (via UDP):

Domain Name System (query)
    Transaction ID: 0x27ef
    Flags: 0x0000 (Standard query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable

First answer (via UDP):

Domain Name System (response)
    [Request In: 1]
    [Time: 0.078623000 seconds]
    Transaction ID: 0x27ef
    Flags: 0x8600 (Standard query response, No error)
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .1.. .... .... = Authoritative: Server is an authority for domain
        .... ..1. .... .... = Truncated: Message is truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... .... 0000 = Reply code: No error (0)

Second question (via TCP):

Domain Name System (query)
    Length: 56
    Transaction ID: 0xbc37
    Flags: 0x0000 (Standard query)
        0... .... .... .... = Response: Message is a query
        .000 0... .... .... = Opcode: Standard query (0)
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...0 .... .... = Recursion desired: Don't do query recursively
        .... .... .0.. .... = Z: reserved (0)
        .... .... ...0 .... = Non-authenticated data OK: Non-authenticated data is unacceptable

Second answer (via TCP; note that "Recursion desired" is set):

Domain Name System (response)
    [Request In: 6]
    [Time: 0.147357000 seconds]
    Length: 107
    Transaction ID: 0xbc37
    Flags: 0x8102 (Standard query response, Server failure)
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... .... 0010 = Reply code: Server failure (2)
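The flag words in these dumps (0x8600 over UDP, 0x8102 over TCP) can also be decoded by hand to double-check the dissection. A minimal Python sketch follows; `decode_flags` and `response_anomalies` are hypothetical helper names for this illustration, and the bit positions follow the header layout in RFC 1035. The AA check only makes sense when the server is expected to be authoritative for the name, as ns15 is here.

```python
from typing import Dict, List, Union

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_flags(flags: int) -> Dict[str, Union[bool, int, str]]:
    """Split the 16-bit DNS header flags word into its named fields."""
    rcode = flags & 0x000F
    return {
        "qr": bool(flags & 0x8000),      # 1 = response
        "opcode": (flags >> 11) & 0xF,   # 0 = standard query
        "aa": bool(flags & 0x0400),      # authoritative answer
        "tc": bool(flags & 0x0200),      # truncated
        "rd": bool(flags & 0x0100),      # recursion desired
        "ra": bool(flags & 0x0080),      # recursion available
        "rcode": RCODES.get(rcode, str(rcode)),
    }


def response_anomalies(query_flags: int, response_flags: int) -> List[str]:
    """List oddities in a response from a server expected to be authoritative."""
    query = decode_flags(query_flags)
    response = decode_flags(response_flags)
    problems = []
    if response["rd"] != query["rd"]:
        # RFC 1035: RD may be set in a query and is copied into the response.
        problems.append("RD differs from the query")
    if response["rcode"] == "SERVFAIL":
        problems.append("RCODE is SERVFAIL")
    if not response["aa"]:
        problems.append("AA is not set")
    return problems
```

With the query flags 0x0000 from the capture, `response_anomalies(0x0000, 0x8600)` returns an empty list for the UDP reply, while `response_anomalies(0x0000, 0x8102)` reports all three problems for the TCP reply.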

Best Answer

Yes, it's poor configuration and/or implementation - there's no reason for an authoritative server to return root referrals in an otherwise valid response.

Furthermore, I'm seeing other errors that simply shouldn't happen from those two Worldnic servers:

  • Sometimes it gives the right answer, but with a SERVFAIL error code and without the AA bit set.

  • UDP replies are always truncated at 512 bytes, even with EDNS0 (RFC 2671) specified. This means that DNSSEC, which depends on EDNS0, won't work with this name server.

  • It's not just the ADDITIONAL section that's a problem, it's putting the root name servers in the AUTHORITY section of an authoritative (AA bit set) answer.
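To reproduce the EDNS0 observation without dig, one can build a query that carries an OPT pseudo-record advertising a larger UDP payload size and see whether the server still truncates at 512 bytes. The Python sketch below only constructs the packet; `encode_name` and `build_query` are hypothetical names for this example, and the 4096-byte payload size in the usage note is an arbitrary choice, not a value taken from the tests above.

```python
import struct
from typing import Optional

OPT_TYPE = 41  # EDNS0 OPT pseudo-RR type (RFC 2671)


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += struct.pack("!B", len(raw)) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 1, txid: int = 0,
                recurse: bool = False,
                udp_payload: Optional[int] = None) -> bytes:
    """Build a DNS query; add an EDNS0 OPT record if udp_payload is given."""
    flags = 0x0100 if recurse else 0x0000          # only RD may be set
    arcount = 1 if udp_payload is not None else 0  # OPT lives in ADDITIONAL
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    if udp_payload is None:
        return header + question
    # OPT RR: root owner name, TYPE 41, CLASS carries the advertised UDP
    # payload size, TTL carries extended RCODE/version/flags (all zero here),
    # and RDLENGTH is 0 because no options are attached.
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_payload, 0, 0)
    return header + question + opt
```

For example, `build_query("www.example.org", udp_payload=4096)` yields a non-recursive A query whose OPT record tells the server that replies up to 4096 bytes are acceptable over UDP. A server that honors EDNS0 should then have no need to set TC on a reply of the size seen in the question.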