Public Facing Recursive DNS Servers – iptables rules

ddos, domain-name-system

We run public-facing recursive DNS servers on Linux machines. We've been used for DNS amplification attacks. Are there any recommended iptables rules that would help mitigate these attacks?

The obvious solution is just to limit outbound DNS packets to a certain traffic level. But I was hoping to find something a little bit more clever so that an attack just blocks off traffic to the victim IP address.

I've searched for advice and suggestions, but they all seem to be "don't run public-facing recursive name servers". Unfortunately, we are backed into a situation where things that are not easy to change will break if we don't do so, and this is due to decisions made more than a decade ago before these attacks were an issue.

Best Answer

The whole thing kind of reeks of a "not my problem" scenario that's not really your fault and should/could be 100% resolved by taking the appropriate action, regardless of how "difficult" or "hard" it is, and that's terminating your open recursive server.

Phase it out: tell the customers that this server is going away as of X date. After that time, they need to install a patch (assuming you have one) to stop their devices from using your DNS server. This is done all the time. Sysadmins, network admins, helpdesk guys, programmers? We get it; this end-of-life thing happens all the time, because it's standard operating procedure for a vendor/service provider/partner to tell us to stop using something after X date. We don't always like it, but it's a fact of life in IT.

You say you don't have this issue on the current devices, so I'm assuming you've resolved this issue with a firmware update or patch. I know you said you can't touch the device, but surely they can? I mean, if they're allowing these boxes to essentially phone home to you, they can't really be that anal about who's doing what to their devices; you could have a reverse proxy setup for all they know, so why not have them install a patch that fixes this, or tell them to use their own DNS servers? Surely your device supports DHCP; I can't think of a network device (no matter how old/frail/odd) that doesn't.

If you can't do that, the next thing to do is control who can access your recursive server: you say that it's "hard to tell" who's using it and how, but it's time to find out for certain and start dropping traffic that's not legitimate.

These are "quasi-military/government" organizations, right? Well, they likely are part of a legitimate netblock that they own; these devices aren't home routers behind dynamic IPs. Find out. Contact them, explain the problem and how you are saving them a lot of money by not forcing a firmware or product replacement if only they can confirm the netblock/IP address that the device will be using to access your DNS server.

This is done all the time: I have several customers who restrict extranet access or HL7 listeners to healthcare partners in this way. It's not that hard to get them to fill out a form and provide the IP and/or netblock I should be expecting traffic from: if they want access to the extranet, they have to give me an IP or subnet. And this is rarely a moving target, so it's not like you're going to get inundated with hundreds of IP change requests every day: big campus hospital networks that own their own netblocks with hundreds of subnets and thousands and thousands of host IPs routinely give me a handful of IP addresses or a subnet I should be expecting. Again, these aren't laptop users wandering all around campus all the time, so why would I expect to see UDP packets from an ever-changing source IP address? Clearly I'm making an assumption here, but I'll bet it's not as much as you think for < 100s of devices. Yes, it'll be a lengthy ACL, and yes, it requires some maintenance and communication (gasp!), but it's the next best thing outside of shutting it down completely.
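To give a rough idea of how little machinery such an ACL needs, here is a minimal Python sketch that turns a list of customer-confirmed netblocks into iptables commands: accept port 53 from each listed source, then drop everything else on port 53. The netblocks are RFC 5737 documentation ranges, not real customers, and the function name and the use of the INPUT chain are my assumptions. It is IPv4-only; IPv6 sources would need ip6tables.

```python
import ipaddress

# Hypothetical netblocks confirmed by customers (RFC 5737 documentation ranges).
ALLOWED_NETBLOCKS = ["192.0.2.0/24", "198.51.100.17", "203.0.113.64/26"]

def build_dns_acl(netblocks, chain="INPUT"):
    """Return iptables commands that allow DNS only from the given sources."""
    rules = []
    for block in netblocks:
        # strict=True rejects entries with host bits set, e.g. 192.0.2.1/24,
        # which usually indicates a typo on the customer's form.
        net = ipaddress.ip_network(block, strict=True)
        for proto in ("udp", "tcp"):
            rules.append(
                f"iptables -A {chain} -p {proto} --dport 53 -s {net} -j ACCEPT"
            )
    # Anything on port 53 that was not accepted above is dropped.
    for proto in ("udp", "tcp"):
        rules.append(f"iptables -A {chain} -p {proto} --dport 53 -j DROP")
    return rules

if __name__ == "__main__":
    for rule in build_dns_acl(ALLOWED_NETBLOCKS):
        print(rule)
```

Review the generated commands before applying them, and keep the list in version control so every customer change request leaves a trail.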

If for some reason the channels of communication are not open (or somebody's too afraid or can't be bothered to contact these legacy device owners and do this properly), you need to establish a baseline of normal usage/activity so you can formulate some other strategy that will reduce (but not prevent) your participation in DNS amplification attacks.

A long-running tcpdump filtering on incoming UDP 53, combined with verbose logging on the DNS server application, should work. I would also want to start collecting source IP addresses/netblocks/geoIP information (are all your clients in the US? Block everything else) because, as you say, you're not adding any new devices; you're merely providing a legacy service to existing installations.
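Once the capture or query log has been parsed into (source IP, record type, queried name) tuples, the baseline itself is just a few tallies. The sketch below assumes that parsing has already happened, since tcpdump and DNS server log formats vary; the sample records are made up, and grouping IPv4 sources by /24 is an arbitrary choice of mine.

```python
from collections import Counter
import ipaddress

def summarize(queries, prefix_len=24):
    """Tally queries by source netblock, record type, and queried name."""
    by_netblock, by_qtype, by_qname = Counter(), Counter(), Counter()
    for src_ip, qtype, qname in queries:
        # Collapse each IPv4 source into its enclosing /prefix_len netblock.
        net = ipaddress.ip_network(f"{src_ip}/{prefix_len}", strict=False)
        by_netblock[str(net)] += 1
        by_qtype[qtype.upper()] += 1
        # Normalize case and the trailing root dot so names compare equal.
        by_qname[qname.lower().rstrip(".")] += 1
    return by_netblock, by_qtype, by_qname

# Made-up sample records for illustration only.
SAMPLE = [
    ("192.0.2.10", "A", "update.example.com."),
    ("192.0.2.77", "A", "update.example.com"),
    ("203.0.113.5", "ANY", "big.example.net."),
    ("203.0.113.5", "ANY", "big.example.net."),
]

if __name__ == "__main__":
    nets, types, names = summarize(SAMPLE)
    print(nets.most_common(5))
    print(types.most_common(5))
    print(names.most_common(5))
```

A source that repeatedly asks for the same large record type for the same name, as the last two sample records do, is the pattern to look for.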

This will also help you understand what record types are being requested, and for what domains, by whom, and how often: for DNS amplification to work as intended, the attacker needs to be able to request a large record type (1) to a functioning domain (2).

  1. "large record type": do your devices even need TXT or SOA records to be able to be resolved by your recursive DNS server? You may be able to specify which record types are valid on your DNS server; I believe it's possible with BIND and perhaps Windows DNS, but you'd have to do some digging. If your DNS server responds with SERVFAIL to any TXT or SOA queries, at least that response is an order of magnitude (or two) smaller than the payload that was intended. Obviously you're still "part of the problem" because the spoofed victim would still be getting those SERVFAIL responses from your server, but at least you're not hammering them, and perhaps your DNS server gets "delisted" over time from the harvested list(s) the bots use for not "cooperating".

  2. "functioning domain": you may be able to whitelist only domains that are valid. I do this on my hardened data center setups where the server(s) only need Windows Update, Symantec, etc. to function. However, you're just mitigating the damage you're causing at this point: the victim would still get bombarded with NXDOMAIN or SERVFAIL responses from your server because your server would still respond to the forged source IP. Again, the bot scripts might also automatically update their open server lists based on results, so this could get your server removed.
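The two checks above boil down to a small policy decision per query. Here is a minimal Python sketch of that decision, not tied to any particular DNS server's configuration syntax; the allowed record types and domain suffixes are placeholders I made up, and the real values would come out of your baseline.

```python
# Hypothetical policy: the record types and domains the legacy devices need.
ALLOWED_QTYPES = {"A", "AAAA", "PTR"}
ALLOWED_SUFFIXES = ("example.com", "example.org")

def should_answer(qtype, qname):
    """Return True only if both the record type and the domain are allowed."""
    if qtype.upper() not in ALLOWED_QTYPES:
        return False
    name = qname.lower().rstrip(".")
    # Match the domain itself or any subdomain, but not look-alikes
    # such as "badexample.com".
    return any(name == s or name.endswith("." + s) for s in ALLOWED_SUFFIXES)

if __name__ == "__main__":
    for query in [("A", "update.example.com."), ("TXT", "example.com"),
                  ("A", "badexample.com")]:
        print(query, should_answer(*query))
```

Whatever gets refused still produces a small error response to the forged source address, which is why this only reduces the damage rather than eliminating it.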

I'd also use some form of rate limiting, as others have suggested, either at the application level (e.g. message size, requests per client limitations) or the firewall level (see the other answers), but again, you're going to have to do some analysis to ensure you're not killing legitimate traffic.
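For illustration, per-client rate limiting is usually some variant of a token bucket. The Python sketch below shows the idea only; the class names, rate, and burst values are assumptions of mine, and in practice you would use whatever your DNS server or firewall provides (iptables has a hashlimit match module for this kind of per-address limiting) rather than rolling your own.

```python
import time

class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class PerClientLimiter:
    """Keep one bucket per source IP (hypothetical defaults: 5 qps, burst 10)."""

    def __init__(self, rate=5.0, burst=10):
        self.rate, self.burst = rate, burst
        self.buckets = {}

    def allow(self, client_ip, now=None):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = self.buckets[client_ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

Because the "client" in an amplification attack is the spoofed victim address, a per-source limit caps what any single victim receives from you without touching other clients. The sketch never expires idle buckets, so a real implementation would need to evict them.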

An Intrusion Detection System that's been tuned and/or trained (again, need a baseline here) should be able to detect abnormal traffic over time by source or volume as well, but would likely take regular babysitting/tuning/monitoring to prevent false positives and/or see if it's actually preventing attacks.

At the end of the day, you have to wonder if all this effort is worth it or if you should just insist that the right thing is done and that's eliminating the problem in the first place.