Public Facing Recursive DNS Servers – iptables rules

ddos domain-name-system

We run public-facing recursive DNS servers on Linux machines. We've been used for DNS amplification attacks. Are there any recommended iptables rules that would help mitigate these attacks?

The obvious solution is just to limit outbound DNS packets to a certain traffic level. But I was hoping to find something a little bit more clever so that an attack just blocks off traffic to the victim IP address.

I've searched for advice and suggestions, but they all seem to be "don't run public-facing recursive name servers". Unfortunately, we are backed into a situation where things that are not easy to change will break if we don't do so, and this is due to decisions made more than a decade ago before these attacks were an issue.

Best Answer

The whole thing kind of reeks of a "not my problem" scenario that's not really your fault and should/could be 100% resolved by taking the appropriate action, regardless of how "difficult" or "hard" it is, and that's terminating your open recursive server.

Phase it out: tell the customers that this server is going away as of X date. After that time, they need to install a patch (assuming you have one) to stop the device from using your DNS server. This is done all the time. Sysadmins, network admins, helpdesk guys, programmers? We get it; this end-of-life thing happens all the time, because it's standard operating procedure for a vendor/service provider/partner to tell us to stop using something after X date. We don't always like it, but it's a fact of life in IT.

You say you don't have this issue on the current devices, so I'm assuming you've resolved this issue with a firmware update or patch. I know you said you can't touch the device, but surely they can? I mean, if they're allowing these boxes to essentially phone home to you, they can't really be that anal about who's doing what to their devices; you could have a reverse proxy set up for all they know, so why not have them install a patch that fixes this, or tell them to use their own DNS servers? Surely your device supports DHCP; I can't think of a network device (no matter how old/frail/odd) that doesn't.

If you can't do that, the next thing to do is control who can access your recursive server: you say that it's "hard to tell" who's using it and how, but it's time to find out for certain and start dropping traffic that's not legitimate.

These are "quasi-military/government" organizations, right? Well, they likely are part of a legitimate netblock that they own; these devices aren't home routers behind dynamic IPs. Find out. Contact them, explain the problem and how you are saving them a lot of money by not forcing a firmware or product replacement if only they can confirm the netblock/IP address that the device will be using to access your DNS server.

This is done all the time: I have several customers who restrict extranet access or HL7 listeners to healthcare partners in this way; it's not that hard to get them to fill out a form and provide the IP and/or netblock I should be expecting traffic from: if they want access to the extranet, they have to give me an IP or subnet. And this is rarely a moving target, so it's not like you're going to get inundated with hundreds of IP change requests every day: big campus hospital networks that own their own netblocks with hundreds of subnets and thousands and thousands of host IPs routinely give me a handful of IP addresses or a subnet I should be expecting; again, these aren't laptop users wandering all around campus all the time, so why would I expect to see UDP packets from an ever-changing source IP address? Clearly I'm making an assumption here, but I'll bet it's not as much as you think for < 100s of devices. Yes, it'll be a lengthy ACL, and yes, it requires some maintenance and communication (gasp!) but it's the next best thing outside of shutting it down completely.
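As a rough sketch of what such an ACL boils down to, here is the membership check in Python. The netblocks are placeholders from the RFC 5737 documentation ranges, not anyone's real addresses, and `ALLOWED_NETBLOCKS` and `is_allowed` are names invented for this example; in practice the same list would end up as firewall rules or a DNS server ACL.

```python
import ipaddress

# Hypothetical allowlist: RFC 5737 documentation ranges standing in for
# the netblocks your customers would report on their forms.
ALLOWED_NETBLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
    ipaddress.ip_network("203.0.113.7/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowlisted netblock."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NETBLOCKS)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.32", "203.0.113.7"):
        print(ip, "allowed" if is_allowed(ip) else "dropped")
```

Keeping the list as data rather than hand-written rules makes the "maintenance and communication" part easier: a change request becomes a one-line edit.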

If for some reason the channels of communication are not open (or somebody's too afraid or can't be bothered to contact these legacy device owners and do this properly), you need to establish a baseline of normal usage/activity so you can formulate some other strategy that will reduce (but not prevent) your participation in DNS amplification attacks.

A long-running tcpdump filtering on incoming UDP port 53, combined with verbose logging on the DNS server application, should work. I would also want to start collecting source IP addresses/netblocks/geoIP information (are all your clients in the US? Block everything else) because, as you say, you're not adding any new devices, you're merely providing a legacy service to existing installations.

This will also help you understand what record types are being requested, and for what domains, by whom, and how often: for DNS amplification to work as intended, the attacker needs to be able to request a large record type (1) for a functioning domain (2).

1. "large record type": do your devices even need TXT or SOA records to be resolved by your recursive DNS server? You may be able to specify which record types are valid on your DNS server; I believe it's possible with BIND and perhaps Windows DNS, but you'd have to do some digging. If your DNS server responds with SERVFAIL to any TXT or SOA queries, at least that response is an order of magnitude (or two) smaller than the payload that was intended. Obviously you're still "part of the problem" because the spoofed victim would still be getting those SERVFAIL responses from your server, but at least you're not hammering them, and perhaps over time your DNS server gets "delisted" from the harvested list(s) the bots use for not "cooperating".
2. "functioning domain": you may be able to whitelist only domains that are valid. I do this on my hardened data center setups where the server(s) only need Windows Update, Symantec, etc. to function. However, you're just mitigating the damage you're causing at this point: the victim would still get bombarded with NXDOMAIN or SERVFAIL responses from your server because your server would still respond to the forged source IP. Again, the bot script might also automatically update its open server list based on results, so this could get your server removed.

I'd also use some form of rate limiting, as others have suggested, either at the application level (i.e. message size, requests per client limitations) or the firewall level (see the other answers), but again, you're going to have to do some analysis to ensure you're not killing legitimate traffic.
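For the analysis part, one way to see which record types and names your clients actually ask for is to decode the question section of captured query payloads and tally them. The following is only a sketch: it assumes you already have the raw UDP payloads in hand (getting them out of a tcpdump capture is not shown), and `parse_question`, `tally`, and `SAMPLE_QUERY` are names invented for this example.

```python
import struct
from collections import Counter

# A few RR type codes from RFC 1035 and the IANA registry; anything else
# is reported as "TYPE<n>".
QTYPE_NAMES = {1: "A", 2: "NS", 5: "CNAME", 6: "SOA", 15: "MX",
               16: "TXT", 28: "AAAA", 255: "ANY"}


def parse_question(payload: bytes):
    """Return (qname, qtype_name) for the first question in a DNS message."""
    if len(payload) < 12:
        raise ValueError("shorter than a DNS header")
    qdcount = struct.unpack("!H", payload[4:6])[0]
    if qdcount < 1:
        raise ValueError("no question section")
    offset = 12  # the DNS header is always 12 bytes
    labels = []
    while True:
        length = payload[offset]
        offset += 1
        if length == 0:  # zero-length root label ends the name
            break
        if length > 63:
            raise ValueError("compressed or invalid label in question")
        labels.append(payload[offset:offset + length].decode("ascii", "replace"))
        offset += length
    qtype, _qclass = struct.unpack("!HH", payload[offset:offset + 4])
    name = ".".join(labels).lower() or "."
    return name, QTYPE_NAMES.get(qtype, f"TYPE{qtype}")


def tally(payloads):
    """Count (qname, qtype) pairs; malformed payloads are counted separately."""
    counts = Counter()
    for payload in payloads:
        try:
            counts[parse_question(payload)] += 1
        except (ValueError, IndexError, struct.error):
            counts[("<unparsed>", "?")] += 1
    return counts


# Hand-built example query: ID 0x1234, RD flag set, one question, isc.org ANY IN.
SAMPLE_QUERY = (struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
                + bytes.fromhex("03697363036f726700")
                + struct.pack("!HH", 255, 1))

if __name__ == "__main__":
    for key, count in tally([SAMPLE_QUERY, SAMPLE_QUERY, b"\x00"]).most_common():
        print(count, *key)
```

A tally like this, taken over a few days of normal traffic, is what tells you whether blocking a record type or rate limiting a client would hurt legitimate users.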

An Intrusion Detection System that's been tuned and/or trained (again, need a baseline here) should be able to detect abnormal traffic over time by source or volume as well, but would likely take regular babysitting/tuning/monitoring to prevent false positives and/or see if it's actually preventing attacks.

At the end of the day, you have to wonder if all this effort is worth it or if you should just insist that the right thing is done and that's eliminating the problem in the first place.

Related Solutions

Linux – iptables: building a rule set against abuse for DNS amplification attacks

Question 1:

The string does not match because the "." is not included in the packet. A DNS packet does not contain a "hostname" as such but "labels". In the packet, every part of the domain name is a label, prefixed by the number of bytes in that label.

So "isc.org" translates to:

isc: 03 69 73 63
org: 03 6f 72 67

Or in the packet:

03697363036f7267

Every label is limited to 63 bytes, the whole name is limited to 255 bytes.
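The encoding can be reproduced with a few lines of Python, which is handy for generating the bytes to match on. This is a sketch: `encode_labels` and `hex_string_pattern` are names made up here, and the `|...|` form is the hex notation that the iptables string match accepts for `--hex-string` as far as I know, so check it against your iptables version.

```python
def encode_labels(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels.

    The terminating zero-length root label is left off, so the result
    matches the byte string shown above for "isc.org".
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label {label!r} must be 1-63 bytes")
        out.append(len(raw))  # length prefix
        out += raw            # label bytes
    if len(out) + 1 > 255:    # +1 for the root label that ends the name on the wire
        raise ValueError("encoded name exceeds 255 bytes")
    return bytes(out)


def hex_string_pattern(name: str) -> str:
    """Wrap the encoded labels in |...| hex notation."""
    return "|" + encode_labels(name).hex() + "|"


if __name__ == "__main__":
    print(encode_labels("isc.org").hex())
    print(hex_string_pattern("isc.org"))
```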

It's explained in the DNS RFC:

https://www.rfc-editor.org/rfc/rfc1035#section-2.3.4

https://www.rfc-editor.org/rfc/rfc1035#section-4.1.2

Question 2:

You need to enable the net.netfilter.nf_conntrack_acct flag to use the conntrack option (see iptables manpage). But I don't think it's wise to use it like that. There will always be legitimate answers that are large packets.

Perhaps you're better off using the hashlimit extension. It was already mentioned:

https://lists.dns-oarc.net/pipermail/dns-operations/2012-October/009321.html
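To give an idea of what a per-key limit like hashlimit does (for example one limit per destination address with `--hashlimit-mode dstip`), here is a rough token-bucket sketch in Python. It is a conceptual illustration only, not how netfilter implements it; the class name, parameters, and addresses are hypothetical.

```python
import time


class PerKeyTokenBucket:
    """One token bucket per key, e.g. per destination IP address."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)  # tokens added per second
        self.burst = float(burst)        # bucket capacity
        self.clock = clock               # injectable for testing
        self.buckets = {}                # key -> (tokens, last_update)

    def allow(self, key) -> bool:
        """Return True if one more packet to `key` fits within the limit."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = PerKeyTokenBucket(rate_per_sec=1, burst=3)
    # A burst of five replies to one (spoofed) victim: only the first three pass.
    print([limiter.allow("203.0.113.9") for _ in range(5)])
    # Another destination has its own bucket and is unaffected.
    print(limiter.allow("198.51.100.1"))
```

The point of keying on the destination is that a flood of replies toward one victim gets throttled without touching replies to everyone else, which is closer to what the question asked for than a global outbound cap.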

Intermittent recursive/iterative DNS query failure

The packet capture isn't revealing anything that your dig queries did not. Reply code: No such name (3) is a longwinded way of saying NXDOMAIN (RCODE 3), the latter of which is more meaningful to DNS administrators. I will not remove the packet capture from your post, but it will be less of a wall of text for others to sift through if you find yourself agreeing with me on this point.

A response of NXDOMAIN is problematic; it is an indication of a successful lookup by your ISP's recursive nameservers. It's bad behavior from your perspective because the record is missing, but the way in which it failed tells a different story. Your ISP's servers are saying: "I talked to the authoritative nameservers, received a successful reply, and they told me the record didn't exist". This is quite different from SERVFAIL, which would indicate an actual communication problem.

The different responses between queries are most likely due to load balancing: there are multiple servers behind the IP address that you are querying. One of them has "negatively cached" the lookup failure and will not attempt the lookup again until the ncache interval for that domain expires. Another of their servers succeeded, and "positively cached" it, causing it to remember that answer for the duration of the TTL. (3532 means 68 seconds have elapsed since that event, 3532+68 = 3600)
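The TTL arithmetic, spelled out as a tiny helper (the function name is hypothetical; 3600 and 3532 are the values discussed above):

```python
def seconds_since_cached(original_ttl: int, remaining_ttl: int) -> int:
    """How long ago a resolver cached a record, given the TTL it now reports."""
    if not 0 <= remaining_ttl <= original_ttl:
        raise ValueError("remaining TTL must be between 0 and the original TTL")
    return original_ttl - remaining_ttl


if __name__ == "__main__":
    print(seconds_since_cached(3600, 3532))
```

Comparing this value across repeated queries to the same address is a quick way to spot several caches behind one IP: the elapsed times will not line up with the time between your queries.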

Conclusion

Due to the distributed nature of AWS, it will be extremely difficult for any of us to give you advice beyond this. I queried the four nameserver addresses that were served to me and found no problems.

If you see this issue again, you can try querying the A record directly to see if anything stands out:

dig www.alumninews.uottawa.ca @64.59.184.1
(+recurse is set by default and not necessary)

Your best bet is to ask your ISP to investigate further the next time it happens, but be prepared for a response of "our server is doing what it was told to do and we can't help you".
