Linux – Can’t start Hadoop from an init.d script

boot, centos, hadoop, init.d, linux

I'm using CentOS 6.2. I'm trying to start Hadoop from an init.d script, but it's failing. This is what I see in boot.log:

Retrigger failed udev events                               [  OK  ]
Enabling Bluetooth devices:
starting namenode, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--namenode-localhost.localdomain.out
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--jobtracker-localhost.localdomain.out
localhost: ssh: connect to host localhost port 22: Connection refused
Starting sshd:                                             [  OK  ]

Here's my init.d script:

#!/bin/bash
### BEGIN INIT INFO
# Provides:          hadoop
# Required-Start:    sshd
# Required-Stop:     
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: hadoop
# Description:       start hadoop daemons
### END INIT INFO

# source function library
. /etc/rc.d/init.d/functions

RETVAL=0

case "$1" in
    start)
        /home/hadoop/hadoop/hadoop-0.20.2/bin/start-all.sh
        RETVAL=$?
        ;;
    stop)
        /home/hadoop/hadoop/hadoop-0.20.2/bin/stop-all.sh
        RETVAL=$?
        ;;
    *)
        echo "Ya blew it"
        RETVAL=2
        ;;
esac

exit $RETVAL

When I type chkconfig --list hadoop at the command line, I get this:

hadoop          0:off   1:off   2:on    3:on    4:on    5:on    6:off

I created a user called hadoop, and all my hadoop stuff lives in /home/hadoop/hadoop/. I have the setuid bit set on all the scripts in /home/hadoop/hadoop/hadoop-0.20.2/bin/, so start-all.sh and stop-all.sh should run as the hadoop user.

From the command line, I can successfully execute start-all.sh, stop-all.sh, and /etc/init.d/hadoop. I can run them as either the hadoop user or root, and they work fine. However, when /etc/init.d/hadoop is called during the boot process, it fails.

Any idea what I'm doing wrong?

Thanks for the help!

Best Answer

The errors seem pretty clear: the Hadoop startup scripts use ssh to connect to localhost (possibly as a different user) and start things up:

localhost: ssh: connect to host localhost port 22: Connection refused

And if you look at the boot output, you'll see that sshd is starting after hadoop:

starting namenode, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--namenode-localhost.localdomain.out
starting jobtracker, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--jobtracker- 
Starting sshd:                                             [  OK  ]

The solution is to make sure that sshd starts first (although frankly, using ssh to localhost to start a service seems like a bad idea). You can change the startup order by looking in the appropriate runlevel directory (e.g., /etc/rc.d/rc3.d) and changing the number after the S (as in S55sshd); init runs those scripts in ascending numeric order, so hadoop's start number must be higher than sshd's. Make sure your shutdown order is correct too (that is, make sure hadoop is configured to stop before sshd).
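Rather than renaming the symlinks by hand, you can let chkconfig manage the ordering. A sketch of what that might look like (the 90/10 priorities here are example values I've chosen, not something from your setup; all that matters is that the start priority is higher than sshd's and the stop priority lower):

```shell
# Hypothetical addition near the top of /etc/init.d/hadoop.
# RHEL/CentOS chkconfig reads this header when registering the service:
#
#   # chkconfig: 2345 90 10
#   # description: start hadoop daemons
#
# "2345" = runlevels, "90" = start priority (S90hadoop, after S55sshd),
# "10" = stop priority (K10hadoop, before sshd is stopped).

# Re-register the service so chkconfig regenerates the rc*.d symlinks:
chkconfig --del hadoop
chkconfig --add hadoop

# Verify the resulting order; sshd's S number should be lower than hadoop's:
ls /etc/rc.d/rc3.d | grep -E 'sshd|hadoop'
```

The LSB `Required-Start: sshd` header in your script expresses the same dependency, but on CentOS 6 the classic chkconfig priority line is the more reliable way to control SysV ordering.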