Hadoop: Blacklisted tasktracker

hadoop

I am running a Hadoop job (using Hadoop 0.20.2) on a 6-machine setup: one machine is the namenode / secondary namenode / job tracker (master), and the other 5 machines are all datanodes / tasktrackers (slaves). The job has over 14,000 maps and is more than 10% complete. When I browse the job tracker's Job details page, I see this:

Status: Running
Started at: Tue Jul 05 18:12:44 PDT 2011
Running for: 66hrs, 5mins, 4sec
Job Cleanup: Pending
Black-listed TaskTrackers: 1

I logged in to the machine in question and can see that the task tracker process is running, but the machine is not doing any work (the top command shows CPU utilization below 10%). I have already restarted the task tracker on that node with these commands:

./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh start tasktracker

but the node is still blacklisted: the task tracker is running, yet the machine is still not performing any work.

Question: Is there any way to tell Hadoop to un-blacklist the node and send tasks to it again, preferably without having to restart the job?

PS. The node was confirmed to be running and performing tasks at the start of the job.

Best Answer

Put the following config in conf/hdfs-site.xml:

<property>
  <name>dfs.hosts</name>
  <value>/full/path/to/whitelisted/node/file</value>
</property>

Then use the following command to ask Hadoop to refresh the node status based on that configuration:

./bin/hadoop dfsadmin -refreshNodes
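
The file that dfs.hosts points to is a plain-text list of permitted hosts, one hostname per line. Below is a minimal sketch of preparing that file and then refreshing. The path /tmp/hadoop-include-hosts and the hostnames slave1 to slave5 are placeholders (assumptions, not taken from the question); substitute the path used in your dfs.hosts value and the real names of your slaves.

```shell
# Hypothetical location; it must match the <value> of dfs.hosts in hdfs-site.xml.
INCLUDE_FILE="${INCLUDE_FILE:-/tmp/hadoop-include-hosts}"

# Write one hostname per line (slave1..slave5 are placeholder names).
for i in 1 2 3 4 5; do
  echo "slave$i"
done > "$INCLUDE_FILE"

# Ask the namenode to re-read the file, but only if the hadoop launcher
# is present relative to the current directory (i.e. run from the Hadoop home).
if [ -x ./bin/hadoop ]; then
  ./bin/hadoop dfsadmin -refreshNodes
fi
```

Run this from the Hadoop installation directory on the master so that ./bin/hadoop resolves; elsewhere the script only writes the host list and skips the refresh.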