For analytics purposes, I'm looking at large sets of IP addresses in server log files. I'm trying to perform reverse-DNS lookups to understand where traffic is coming from – e.g. what percentage of IPs resolve to corporations, schools, government, international etc.
Despite a number of optimizations, reverse-DNS'ing every IP address individually is still fairly expensive. So:
is there any way to obtain an entire range of IPs from a single reverse-DNS lookup?
If so, this could greatly reduce the number of actual reverse-DNS lookups.
Example (numbers slightly obfuscated):
- The log file contains a request from the IP
128.151.162.17
- Reverse DNS resolves to
17.162.151.128.in-addr.arpa 21599 IN PTR alamo.ceas.rochester.edu
- (So this is a visitor from the University of Rochester, rochester.edu.)
- Now, would it be safe to assume that at least all IPs in
128.151.162.*
will also resolve to rochester.edu? What about
128.151.*.*?
Is there a way to get the exact IP range?
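For reference, the wildcard ranges above map directly onto CIDR blocks (128.151.162.* is a /24, 128.151.*.* is a /16), and the PTR name is simply the address with its octets reversed under in-addr.arpa. Below is a minimal sketch using only Python's standard `ipaddress` module. `wildcard_to_network` is a hypothetical helper, not a library function, and it only handles trailing wildcards:

```python
import ipaddress


def wildcard_to_network(pattern: str) -> ipaddress.IPv4Network:
    """Convert a trailing-wildcard pattern like '128.151.*.*' to a CIDR network."""
    octets = pattern.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets: {pattern!r}")
    fixed = [o for o in octets if o != "*"]
    wildcards = len(octets) - len(fixed)
    # Only trailing wildcards map onto a single CIDR block ('128.*.162.*' does not).
    if octets[: len(fixed)] != fixed:
        raise ValueError(f"wildcards must be trailing: {pattern!r}")
    base = ".".join(fixed + ["0"] * wildcards)
    return ipaddress.IPv4Network(f"{base}/{32 - 8 * wildcards}")


if __name__ == "__main__":
    ip = ipaddress.ip_address("128.151.162.17")
    # The PTR record for an address lives under its reversed octets.
    print(ip.reverse_pointer)  # 17.162.151.128.in-addr.arpa
    for pattern in ("128.151.162.*", "128.151.*.*"):
        net = wildcard_to_network(pattern)
        print(pattern, "->", net, f"({net.num_addresses} addresses)", ip in net)
```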
Best Answer
Not really, no. In rare cases you might be able to do a DNS zone transfer (AXFR) to get all the records in the reverse zone (generally the whole /24), but there's a very low chance that the name server you're querying will allow it. Expect one query per address for reverse DNS (sorry!).
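Since there's no shortcut, the practical savings come from resolving each distinct address only once (log files repeat IPs a lot) and running the lookups concurrently. Here is a minimal sketch, assuming Python's blocking `socket.gethostbyaddr` is acceptable. Note that it goes through the OS resolver and has no per-call timeout. `reverse_lookup_all` and `ptr_or_none` are hypothetical helper names, and `max_workers=32` is an arbitrary choice:

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, list, list]]


def ptr_or_none(ip: str, resolver: Resolver) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None


def reverse_lookup_all(
    ips: Iterable[str],
    resolver: Resolver = socket.gethostbyaddr,
    max_workers: int = 32,
) -> Dict[str, Optional[str]]:
    """Resolve each distinct IP once, using a thread pool for the blocking calls."""
    unique = sorted(set(ips))  # log files repeat addresses; look each up only once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        hostnames = list(pool.map(lambda ip: ptr_or_none(ip, resolver), unique))
    return dict(zip(unique, hostnames))


if __name__ == "__main__":
    log_ips = ["128.151.162.17", "128.151.162.17", "192.0.2.1"]
    for ip, host in reverse_lookup_all(log_ips).items():
        print(ip, host or "(no PTR record)")
```

The `resolver` parameter exists so the function can be exercised without network access. Pass any callable that returns `(hostname, aliases, addresses)` the way `gethostbyaddr` does.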
As for whether all of 128.151.162.* belongs to Rochester: probably, yes. As a university, they're likely to own the whole /24. However, that's not a good rule to apply in general; a smaller school might not have a whole /24, or might not have all of it in reverse DNS.
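If you're willing to accept that risk for the sake of speed, one possible compromise is to resolve a couple of sample addresses per /24 and only extrapolate when they agree. This is a heuristic sketch, not a guarantee. `infer_by_slash24`, `registered_domain`, and the default of two samples are all illustrative assumptions, and the "last two labels" domain guess is wrong for suffixes like .co.uk:

```python
import ipaddress
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def registered_domain(hostname: Optional[str]) -> Optional[str]:
    """Crude guess: keep the last two labels (alamo.ceas.rochester.edu -> rochester.edu).
    Wrong for multi-label suffixes such as .co.uk."""
    if not hostname:
        return None
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


def infer_by_slash24(
    ips: Iterable[str],
    lookup: Callable[[str], Optional[str]],  # ip -> PTR hostname or None
    samples: int = 2,
) -> Dict[str, Optional[str]]:
    """Map each IP to a domain, extrapolating within a /24 when the samples agree."""
    groups: Dict[ipaddress.IPv4Network, List[str]] = defaultdict(list)
    for ip in sorted(set(ips), key=ipaddress.IPv4Address):
        groups[ipaddress.IPv4Network(f"{ip}/24", strict=False)].append(ip)

    result: Dict[str, Optional[str]] = {}
    for members in groups.values():
        probe = {ip: registered_domain(lookup(ip)) for ip in members[:samples]}
        result.update(probe)
        agreed = set(probe.values())
        if len(agreed) == 1 and None not in agreed:
            # Samples agree: assume the rest of this /24 shares the domain.
            shared = agreed.pop()
            for ip in members[samples:]:
                result[ip] = shared
        else:
            # Samples disagree or lack PTR records: look up every remaining address.
            for ip in members[samples:]:
                result[ip] = registered_domain(lookup(ip))
    return result


if __name__ == "__main__":
    import socket

    def ptr(ip: str) -> Optional[str]:
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return None

    print(infer_by_slash24(["128.151.162.17", "128.151.162.40", "128.151.162.99"], ptr))
```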
Reverse DNS itself is going to be pretty hit-or-miss: in many cases you'll only get generated names under the ISP's domain, or no records at all. For better data, at the cost of making things even more expensive, you should also look at whois data.
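Because of that, it helps to separate "no record" and "generic ISP name" from real organizational names before computing percentages. Below is a rough sketch of that bucketing. The TLD-to-category map, the rule that a two-letter TLD means "international", and the octets-in-the-hostname test in `looks_generated` are all hypothetical heuristics that will misclassify some hosts. For example, ISPs that encode the address in hex or only partially will slip through:

```python
import re
from typing import Optional

# Hypothetical category map; adjust to whatever breakdown you need.
TLD_CATEGORIES = {
    "edu": "education",
    "gov": "government",
    "mil": "government",
    "com": "commercial",
    "net": "network/ISP",
    "org": "organization",
}


def looks_generated(ip: str, hostname: str) -> bool:
    """Heuristic: ISP-generated names often embed all four octets, forwards or reversed,
    e.g. 'host-192-0-2-1.example.net' or '1.2.0.192.dyn.example.net'."""
    octets = [int(o) for o in ip.split(".")]
    numbers = [int(n) for n in re.findall(r"\d+", hostname)]
    for seq in (octets, octets[::-1]):
        for i in range(len(numbers) - 3):
            if numbers[i : i + 4] == seq:
                return True
    return False


def classify(ip: str, hostname: Optional[str]) -> str:
    if not hostname:
        return "no PTR record"
    if looks_generated(ip, hostname):
        return "generic ISP name"
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if len(tld) == 2:
        return f"country code (.{tld})"
    return TLD_CATEGORIES.get(tld, "other")


if __name__ == "__main__":
    from collections import Counter

    # Illustrative input only (documentation address ranges and example.net).
    sample = {
        "128.151.162.17": "alamo.ceas.rochester.edu",
        "192.0.2.1": "host-192-0-2-1.example.net",
        "198.51.100.7": None,
    }
    counts = Counter(classify(ip, host) for ip, host in sample.items())
    total = sum(counts.values())
    for category, n in counts.most_common():
        print(f"{category}: {100 * n / total:.1f}%")
```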
For example, the whois record for that Rochester IP shows the size of the allocation (the whole /16, so in this case it applies to 128.151.*.*) and the organization it's allocated to. Whois should be a good source of truth for the information you want, and it has the upside that you can see which range it applies to. The downside is that smaller allocations will often just show as belonging to the ISP rather than the end customer. Combining whois and reverse DNS should provide the best information (and be ridiculously slow).
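The upside of whois is that one answer covers a whole range. That means you can cache by allocation and only query again for addresses outside the ranges you've already seen. This sketch rests on several assumptions. The query uses the plain WHOIS protocol from RFC 3912 (TCP port 43, query terminated by CRLF) against whois.arin.net. `parse_arin` assumes ARIN-style `CIDR:` and `OrgName:` fields; other registries such as RIPE use different field names, and ARIN may return a referral for addresses it doesn't manage. All class and function names are hypothetical:

```python
import ipaddress
import socket
from typing import Callable, List, Optional, Tuple

Network = ipaddress.IPv4Network


def query_whois(query: str, server: str = "whois.arin.net", timeout: float = 10.0) -> str:
    """Plain WHOIS (RFC 3912): open TCP port 43, send the query plus CRLF, read to EOF."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_arin(text: str) -> Tuple[List[Network], Optional[str]]:
    """Extract IPv4 'CIDR:' networks and the first 'OrgName:' (assumed ARIN-style format)."""
    networks: List[Network] = []
    org: Optional[str] = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "CIDR":
            for cidr in value.split(","):
                if cidr.strip():
                    net = ipaddress.ip_network(cidr.strip(), strict=False)
                    if net.version == 4:
                        networks.append(net)
        elif key == "OrgName" and org is None:
            org = value
    return networks, org


def arin_fetch(ip: str) -> Tuple[Network, Optional[str]]:
    """Query ARIN; return the most specific listed range containing ip, plus the org."""
    networks, org = parse_arin(query_whois(ip))
    addr = ipaddress.IPv4Address(ip)
    containing = [n for n in networks if addr in n]
    if not containing:
        return ipaddress.IPv4Network(f"{ip}/32"), org  # cache just this address
    return max(containing, key=lambda n: n.prefixlen), org


class WhoisRangeCache:
    """Remembers (network, org) answers so each allocation is queried only once."""

    def __init__(self, fetch: Callable[[str], Tuple[Network, Optional[str]]]):
        self._fetch = fetch
        self._entries: List[Tuple[Network, Optional[str]]] = []
        self.queries = 0

    def org_for(self, ip: str) -> Optional[str]:
        addr = ipaddress.IPv4Address(ip)
        hits = [(net, org) for net, org in self._entries if addr in net]
        if hits:
            # Prefer the most specific (longest-prefix) cached range.
            return max(hits, key=lambda e: e[0].prefixlen)[1]
        self.queries += 1
        net, org = self._fetch(ip)
        self._entries.append((net, org))
        return org


if __name__ == "__main__":
    cache = WhoisRangeCache(arin_fetch)
    for ip in ["128.151.162.17", "128.151.3.9"]:
        print(ip, cache.org_for(ip))
    print("whois queries:", cache.queries)
```

Note that the cache will happily answer from a broad /16 even if a smaller range inside it was reassigned to a customer, which is exactly the ISP-versus-end-customer caveat above. Pairing the first `OrgName` with the most specific range is also a simplification. Registries tend to rate-limit bulk queries, so a cache like this is worth having either way.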