Daemon crashes but upstart thinks it is still alive


I have the following problem: we have a Java application that is started by a bash script.
This application should run as a daemon, so we have an upstart-job to start it.

start on runlevel [2345]                    
stop on runlevel [!2345]                    

#tell upstart we will fork later, so it will manage the pids.
expect fork

#If the daemon stops unexpectedly, restart it!
respawn

script
  #The framework will only work if we start it from this directory.
  cd /usr/lib/app-dir
  nohup ./appStartScript.sh &> /dev/null &

  #send an upstart event, in case we will chain this job later
  emit app_running                         
end script

Sometimes the application stops working. There is neither a .hprof file nor an hs_err file, which is usually created if the VM crashes.
Upstart reports the application as running:

appDeamon start/running, process 1131

But the PID is not listed in ps -aux. (Also, upstart is not able to stop the process with stop appDeamon.)

I'd like to know:
a) Why doesn't upstart recognize that the application has crashed?
b) Is there a way to force upstart to restart the application, even if the process with the given pid is no longer present? (Up to now, we have had to restart the whole server.)

Our system is Ubuntu Linux 10.04.1 LTS.

Best Answer

This is what usually happens with daemon programs:

  1. Upstart runs the executable in the foreground
  2. The program loads its configuration file, checks it, and performs various setup operations (like opening a listening port).
  3. If the previous step fails, the program exits and upstart gets a non-zero exit code, thereby knowing that it failed
  4. If step 2 didn't fail, the program now forks, essentially creating two copies of it
  5. The process which Upstart initially executed now exits with a zero exit code, indicating that it was successful
  6. The forked process continues running and does the actual work of the application
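
The sequence above can be imitated roughly in shell. This is only a sketch: real daemons do this with fork() in C, and start_daemon, DAEMON_PID, and the config path are hypothetical names invented for the illustration.

```shell
#!/bin/bash
# Rough imitation of the start-up handshake; all names are hypothetical.
start_daemon() {
    local config=$1

    # Steps 2-3: check the setup while still in the foreground. A non-zero
    # status is how the caller learns that start-up failed.
    if [ ! -r "$config" ]; then
        echo "cannot read config: $config" >&2
        return 1
    fi

    # Step 4: "fork" by pushing the long-running work into a background subshell.
    ( while true; do sleep 1; done ) >/dev/null 2>&1 &
    DAEMON_PID=$!

    # Step 5: the foreground part reports success; step 6 continues in the child.
    return 0
}
```

The point of the pattern is that the caller only sees a zero status after the setup checks have passed, which is the signal a backgrounded Java process cannot give.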

The problem is that Java doesn't provide a mechanism to fork, and so this tried and tested pattern cannot be implemented properly. When executing Java daemons you are forced to background the process immediately (i.e. the & symbol in the script). Upstart essentially starts the process and then immediately forgets about it -- the process has no way of indicating to Upstart whether it successfully started up or not.

The only way around this is to start the process, background it, and then check whether it's still running in order to determine whether it was successful or not. The catch of course is determining when to check whether it's still running. The simple solution is something like this:

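java MyClass >/dev/null 2>&1 &
PID=$!    # PID of the process we just put in the background
sleep 3
# kill -0 sends no signal; it only checks that the process still exists
if kill -0 $PID 2>/dev/null; then
    exit 0
else
    exit 1
fi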

There are more elaborate schemes to determine when to check the process, like making the program close stdout and stderr or create its PID file when it has finished its startup routine, and waiting for these events in the startup script.
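
As a sketch of the PID file variant, assuming the application writes its PID to a file once start-up has finished (the function name, the file path, and the timeout are all hypothetical), the startup script could poll for that file instead of sleeping for a fixed time:

```shell
#!/bin/bash
# Hypothetical helper: wait until the PID file exists and the process named
# in it is alive, or give up after a timeout given in seconds.
wait_for_pidfile() {
    local pidfile=$1 timeout=${2:-30} waited=0 pid
    while [ "$waited" -lt "$timeout" ]; do
        if [ -s "$pidfile" ]; then
            pid=$(cat "$pidfile")
            # kill -0 sends no signal; it only tests that the process exists.
            if kill -0 "$pid" 2>/dev/null; then
                return 0
            fi
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

After backgrounding the application, the script would call something like `wait_for_pidfile /var/run/app.pid 30 || exit 1`, where the path is again an assumption. Compared with a fixed sleep, this returns as soon as the application is ready and still fails if it never comes up.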

The simplest solution for you is to modify your Upstart script to something like this:

script
    cd /usr/lib/app-dir
    # Upstart runs this with /bin/sh, so use POSIX redirection
    nohup ./appStartScript.sh >/dev/null 2>&1 &
    PID=$!
    sleep 3
    if kill -0 $PID 2>/dev/null; then
        emit app_running
        exit 0
    else
        exit 1
    fi
end script