Linux – Python script called from an init.d script fails to execute at shutdown on RedHat 6.5

Tags: amazon-web-services, init.d, linux, python, shell

I've created an init.d script called rmCluster that is supposed to execute a simple Python script at shutdown. The Python script uses boto to shut down a particular cluster of servers. The init script has 755 permissions, is located at /etc/init.d/rmCluster, and is written as:

#!/bin/sh
#
# chkconfig: 0 1 1
# description: My service
#
# Author: Me
#
#
### BEGIN INIT INFO
# Provides: rmCluster
# Required-Start:
# Required-Stop:
# Default-Start:  0
# Default-Stop:  0
# Short-Description: My service
# Description: My service
### END INIT INFO

case $1 in
start)
python /usr/local/sbin/instanceStopper.py &
touch /tmp/theScriptWorks
;;
esac
exit 0

I have also created a symlink at /etc/rc0.d/S00rmCluster which points to the above. Note that the script also touches a file in /tmp, and that file is created successfully, so the init script itself does run at shutdown.

The python script also has 755 permissions and is written as:

#!/usr/bin/env python

import boto.ec2
import subprocess

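conn = boto.ec2.connect_to_region("us-west-2")
reservations = conn.get_all_instances()
cluster = []
# Ask the EC2 metadata service for this instance's own ID.
inst_id = subprocess.Popen(["wget", "-q", "-O", "-", "http://169.254.169.254/latest/meta-data/instance-id"], stdout=subprocess.PIPE).communicate()[0]

# Collect every instance whose Name tag contains this instance's ID,
# skipping instances that carry a "cloudformation" tag.
for res in reservations:
    for inst in res.instances:
        # .get() avoids a KeyError for instances that have no Name tag.
        if inst_id in inst.tags.get("Name", "") and "cloudformation" not in inst.tags:
            cluster.append(inst.id)

conn.terminate_instances(cluster)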

Note that the Python script works perfectly fine when called directly, and it also works fine when I run the init.d script directly. I've also tried removing the shebang from the Python script and specifying the full path to python in the init.d call, and it still doesn't work.

My initial thought is that perhaps the Python libraries are no longer available at this point in the shutdown, so the script fails, but I'm not sure how to check that. I've also wondered whether the script needs to be placed somewhere else in the rcX.d directories. Currently it is set to S00 and it is the only S00. I moved killall to S01 and halt to S02; these are the only three "S" scripts within /etc/rc0.d/.

I appreciate the help.

Solution

The solution was a combination of input from the response of @Jayan and @Kjetil Joergensen.

The final working version of the init.d script is as follows:

#!/bin/bash
#
# chkconfig: 2345 99 1
# description: My service
#
# Author: me
#
#
### BEGIN INIT INFO
# Provides: rmCluster
# Required-Start:
# Required-Stop:
# Default-Start:  0
# Default-Stop:  0
# Short-Description: My service
# Description: My service
### END INIT INFO


case "$1" in
start)
touch /var/lock/subsys/rmCluster
;;
stop)
/usr/bin/python /usr/local/sbin/instanceStopper.py
;;
esac
exit 0

The major changes were:

  1. Moving the 'start)' portion into a 'stop)' portion
  2. Touching the lock file in the 'start)' portion
  3. Modifying the 'chkconfig:' parameter so that it 'starts' with normal services and gets killed with them as well, which prevents the script from trying to execute after 'networking' has been shut down, as noticed by @Kjetil Joergensen

Note: The python script was not changed.

There are two caveats. First, service rmCluster start must be run once so that the lock file exists; otherwise the script is not stopped when entering runlevel 0 or 6. For me this was acceptable, since the instance is set up during CloudFormation provisioning and it is trivial to add this step to the EC2 User Data. Second, the script also executes during reboots, which may not be ideal for every use case. I'll have to investigate further how to make only runlevel 0 actually run 'stop' on this script.
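One possible way to limit the work to a real halt is to have the Python script check the target runlevel before terminating anything. The following is only a sketch: the helper names are hypothetical, and it assumes that /sbin/runlevel prints the previous and current runlevel (for example "3 0") and already reports the target runlevel while the K scripts run. That assumption should be verified on the machine in question.

```python
import subprocess


def current_runlevel(runlevel_output):
    """Return the current runlevel from `runlevel` output such as 'N 3' or '3 0'.

    Returns None when the output does not have the expected two fields
    (for example when the command prints 'unknown').
    """
    fields = runlevel_output.split()
    if len(fields) != 2:
        return None
    return fields[1]


def should_terminate_cluster(runlevel_output):
    """True only for a halt (runlevel 0), not for a reboot (runlevel 6)."""
    return current_runlevel(runlevel_output) == "0"


def read_runlevel():
    """Run /sbin/runlevel and return its output as text (hypothetical helper)."""
    out = subprocess.Popen(["/sbin/runlevel"], stdout=subprocess.PIPE).communicate()[0]
    return out.decode("ascii", "replace")
```

In instanceStopper.py the final call could then be guarded with something like: if should_terminate_cluster(read_runlevel()): conn.terminate_instances(cluster).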

Thank you both for the help.

Best Answer

(Almost) everything you need to know is in /etc/rc.d/rc. It's the shell script that's used for changing runlevels, and it's fairly readable, so it should be somewhat easy to suss out what it does.

The brief description of what it does is:

  • It first goes through every /etc/rc<runlevel>.d/K<num><subsystem> script, checks whether the subsystem is started by looking for /var/lock/subsys/<subsystem>, and runs stop on it if it is.
  • It then goes through every /etc/rc<runlevel>.d/S<num><subsystem> script, checks whether the subsystem is stopped by checking that /var/lock/subsys/<subsystem> is absent, and runs start on it if it is.

(There's probably some convenience function around dealing with /var/lock/subsys)
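The two passes described above can be modelled in a few lines of Python. This is a simplified sketch of the description in this answer, not a transcription of /etc/rc.d/rc: the function name and the exists parameter are hypothetical, entries are assumed to use a two-digit number, and the example entry names are illustrative.

```python
import os


def plan_runlevel_change(rc_entries, lock_dir="/var/lock/subsys", exists=os.path.exists):
    """Return the (entry, action) pairs the described logic would run, in order.

    rc_entries: names as found in /etc/rc<runlevel>.d/, e.g. "K01rmCluster".
    exists: callable used to test for lock files (replaceable for dry runs).
    """
    actions = []
    ordered = sorted(rc_entries)

    # Pass 1: K scripts get "stop", but only if the subsystem's lock file exists.
    for entry in ordered:
        if entry.startswith("K"):
            subsystem = entry[3:]  # strip "K" plus the two-digit number
            if exists(os.path.join(lock_dir, subsystem)):
                actions.append((entry, "stop"))

    # Pass 2: S scripts get "start", but only if no lock file exists yet.
    for entry in ordered:
        if entry.startswith("S"):
            subsystem = entry[3:]
            if not exists(os.path.join(lock_dir, subsystem)):
                actions.append((entry, "start"))

    return actions


# Dry run with a pretend set of lock files instead of the real filesystem.
locks = set(os.path.join("/var/lock/subsys", name) for name in ("rmCluster", "network"))
entries = ["S01halt", "K90network", "K01rmCluster", "S00killall", "K75netfs"]
for entry, action in plan_runlevel_change(entries, exists=lambda path: path in locks):
    print("%s %s" % (entry, action))
```

In this dry run K75netfs is skipped because it has no lock file, which is the same reason a K script for rmCluster would be skipped if its lock file were never created.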

If everything before this holds true, what you'll want to do is probably:

  • Ensure there's a /var/lock/subsys/<yourscriptname> present
  • Runlevel 0 seems appropriate (unless you also want to include reboot, which is runlevel 6). Run it as /etc/rc0.d/K<num><yourscriptname> with <num> below 90, because networking is killed off at 90, and move your implementation from start to stop. You could also "start" your script as part of the relevant runlevels (3 and 5; 1 is single-user with no network, and 2 and 4 are unused) simply by leaving the appropriate file behind in /var/lock/subsys.
  • You definitely want to get rid of the ampersand. With it, your initscript returns before the Python script is done; depending on how fast rc chews through the rest of the scripts, it will reach 90 and kill off networking, then at some point later reach killall and eventually halt. To avoid shutdown hanging indefinitely, do the appropriate error handling and timeout handling in your script rather than firing it off and leaving the rest to chance.
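As an illustration of the timeout handling mentioned in the last point, here is a minimal sketch that runs a call in a worker thread and gives up after a fixed time. The helper name run_with_timeout is hypothetical and is not part of boto; the approach is just one option that works without the ampersand while still bounding how long shutdown can be held up.

```python
import threading


def run_with_timeout(func, timeout_seconds):
    """Run func() in a daemon thread and wait at most timeout_seconds.

    Returns True only if func() finished in time without raising.
    """
    outcome = {"ok": False}

    def target():
        try:
            func()
            outcome["ok"] = True
        except Exception:
            # A real script would log the error here; the sketch only records failure.
            outcome["ok"] = False

    worker = threading.Thread(target=target)
    worker.daemon = True  # a hung call must not keep the interpreter alive
    worker.start()
    worker.join(timeout_seconds)
    return outcome["ok"] and not worker.is_alive()
```

In the question's script, the last line could become something like run_with_timeout(lambda: conn.terminate_instances(cluster), 60), where 60 seconds is an arbitrary example value, and the script could exit with a non-zero status when the result is False.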