I'm using CentOS 6.2. I'm trying to start Hadoop from an init.d script, but it's failing. This is what I see in boot.log:
Retrigger failed udev events [ OK ]
Enabling Bluetooth devices:
starting namenode, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--namenode-localhost.localdomain.out
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--jobtracker-localhost.localdomain.out
localhost: ssh: connect to host localhost port 22: Connection refused
Starting sshd: [ OK ]
Here's my init.d script:
### BEGIN INIT INFO
# Provides: hadoop
# Required-Start: sshd
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: hadoop
# Description: start hadoop daemons
### END INIT INFO
# source function library
. /etc/rc.d/init.d/functions
RETVAL=0
case "$1" in
  start)
    /home/hadoop/hadoop/hadoop-0.20.2/bin/start-all.sh
    RETVAL=$?
    ;;
  stop)
    /home/hadoop/hadoop/hadoop-0.20.2/bin/stop-all.sh
    RETVAL=$?
    ;;
  *)
    echo "Ya blew it"
    RETVAL=2
    ;;
esac
exit $RETVAL
When I type in chkconfig --list hadoop from the command line, I get this:
hadoop 0:off 1:off 2:on 3:on 4:on 5:on 6:off
I created a user called hadoop, and all my hadoop stuff lives in /home/hadoop/hadoop/. I have the setuid bit set on all the scripts in /home/hadoop/hadoop/hadoop-0.20.2/bin/, so start-all.sh and stop-all.sh should run as the hadoop user.
From the command line, I can successfully execute start-all.sh, stop-all.sh, and /etc/init.d/hadoop, either as the hadoop user or as root, and they work fine. However, when /etc/init.d/hadoop is called during the boot process, it fails.
Any idea what I'm doing wrong?
Thanks for the help!
Best Answer
The errors seem pretty obvious... it appears that the hadoop startup scripts use ssh to connect (possibly as a different user) and start things up:

localhost: ssh: connect to host localhost port 22: Connection refused

And if you look at the startup, you'll see that sshd is starting after hadoop:

starting namenode, logging to /home/hadoop/hadoop/hadoop-0.20.2/bin/../logs/hadoop--namenode-localhost.localdomain.out
...
Starting sshd: [ OK ]

The solution is to make sure that sshd starts first (although frankly, using ssh to localhost to start a service seems like a bad idea). You can change the startup order by looking in the appropriate runlevel directory (e.g., /etc/rc.d/rc3.d) and changing the number after the S (as in S55sshd). Make sure that your shutdown order is correct, too (that is, make sure hadoop is configured to stop before sshd).
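To make the ordering check above concrete, here is a small sketch. It extracts the two-digit start priority from an rc3.d link name so you can compare hadoop's number against sshd's; the S26hadoop link name is hypothetical (check your own /etc/rc.d/rc3.d for the actual name), and the final rename is shown commented out since the target number is system-dependent:

```shell
# Lower S-numbers start first, so hadoop's number must be higher than sshd's.
# pri extracts the two-digit priority from a SysV start-link name.
pri() { echo "$1" | sed 's/^S\([0-9][0-9]\).*/\1/'; }

pri S55sshd     # -> 55
pri S26hadoop   # -> 26 (hypothetical hadoop link name)

# If hadoop's number is lower than sshd's, rename its link to a higher one:
# mv /etc/rc.d/rc3.d/S26hadoop /etc/rc.d/rc3.d/S60hadoop
```

The same check applies in reverse for the K (kill) links in rc0.d/rc6.d: hadoop's K-number should be lower than sshd's so it stops while sshd is still running.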