Ever since we switched over our cluster to communicate via private interfaces and created a DNS server with correct forward and reverse lookup zones, we get this message before the M/R job runs:
ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - Cannot resolve the host name for /192.168.3.9 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '9.3.168.192.in-addr.arpa'
A dig and nslookup both show that the reverse and forward look-ups both get good responses with no errors from within the cluster.
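For reference, the lookups can be verified from a cluster node along these lines (the hostname below is a placeholder; substitute your own):

```shell
# forward lookup
nslookup datanode09.cluster.internal

# reverse lookup -- this is the query Hadoop's DNS.reverseDns() depends on
dig -x 192.168.3.9 +short
```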
Shortly after these messages, the job runs… but every once in a while we get an NPE:
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.net.DNS.reverseDns(DNS.java:93)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:219)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:184)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
	at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
	at app.insights.search.correlator.comments.CommentCorrelator.main(CommentCorrelator.java:72)
Does anyone else who has set up a CDH Hadoop cluster on a private network with a DNS server see this?
CDH 4.3.1 with MR1 2.0.0 and HBase 0.94.6
Best Answer
It's likely that your internal DNS servers are not responding quickly enough to the volume of requests coming from your Hadoop environment (depending on its size).
You can do one of several things:
Setting up a caching-only nameserver is pretty trivial. A little searching should turn up a tutorial appropriate to your OS.
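As a sketch, assuming dnsmasq as the caching-only nameserver: a minimal forwarder config can be as small as this (the upstream address is a placeholder for your internal DNS server):

```conf
# /etc/dnsmasq.conf (minimal caching-only sketch)
listen-address=127.0.0.1   # answer local lookups only
no-resolv                  # don't read /etc/resolv.conf for upstreams
server=192.168.3.1         # placeholder: your internal DNS server
cache-size=1000            # number of records to cache
```

Then point each node's /etc/resolv.conf at 127.0.0.1 so lookups hit the local cache first.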
Setting up nscd is also pretty trivial, with the caveat that it can sometimes cause wonky behavior (such as host name changes taking longer to propagate than you expect). With a sufficiently short cache time, this hasn't been an issue for us. I would recommend disabling the passwd and group caching that nscd can enable. The cache time doesn't need to be very long: 600 seconds seems like a good balance for our cluster and reduces the actual DNS lookups pretty significantly. Even 60 seconds would be better than hitting the DNS server repeatedly.
My config file looks like this:
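(The original file didn't survive the post; the following is a reconstruction from the description above, passwd and group caching off and hosts cached for 600 seconds, not the answerer's exact config:)

```conf
# /etc/nscd.conf (sketch matching the description above)
enable-cache            passwd  no
enable-cache            group   no
enable-cache            hosts   yes
positive-time-to-live   hosts   600   # cache successful lookups for 10 min
negative-time-to-live   hosts   20    # don't cache failures for long
```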
Finally, there's the /etc/hosts route: I would not recommend this if you have a large cluster. It's just too administratively expensive to keep every node's file up to date.
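For completeness, that approach means every node carries forward-mapping entries like these (hostnames here are placeholders), kept in sync across the whole cluster:

```conf
# /etc/hosts fragment (placeholder names)
192.168.3.9    datanode09.cluster.internal   datanode09
192.168.3.10   datanode10.cluster.internal   datanode10
```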