The best method is to start the process in a terminal multiplexer. Alternatively, you can make the process not receive the HUP signal.
A terminal multiplexer provides "virtual" terminals which run independently of the "real" terminal (actually, all terminals today are "virtual", but that is another topic for another day). The virtual terminal will keep running even when your real terminal is closed along with your ssh session.

All processes started from the virtual terminal keep running inside it. When you reconnect to the server you can reattach to the virtual terminal, and everything will be as if nothing had happened, apart from the time that has passed.
Two popular terminal multiplexers are screen and tmux.
Screen has a steep learning curve. Here is a good tutorial with diagrams explaining the concept: http://www.ibm.com/developerworks/aix/library/au-gnu_screen/
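For example, a minimal tmux workflow (tmux is used here purely for illustration; screen has equivalent commands, and the task name is made up) looks like this:

    # on the server, start a named session and launch the job inside it
    tmux new -s myjob
    ./long_running_task      # hypothetical long-running command

    # detach with Ctrl-b d; the session and the task keep running.
    # later, after reconnecting over ssh, reattach to the session:
    tmux attach -t myjob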
The HUP signal (or SIGHUP) is sent by the terminal to all of its child processes when the terminal is closed. The common action upon receiving SIGHUP is to terminate, so when your ssh session gets disconnected all of your processes will terminate. To avoid this you can make your processes not receive SIGHUP.
Two easy methods to do so are nohup and disown.
For more information about how nohup and disown work, read this question and answer: https://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and
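For example, in bash (the command name here is just a placeholder):

    # start the job immune to SIGHUP; output goes to nohup.out by default
    nohup ./long_running_task &

    # or, for a job already started in the background:
    ./long_running_task &
    disown -h %1    # tell the shell not to forward SIGHUP to job 1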
Note: although the processes will keep running, you can no longer interact with them because they are no longer attached to any terminal. This method is mainly useful for long-running batch processes which, once started, need no further user input.
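One common workaround, sketched here as an assumption rather than something from the original answer, is to redirect the process's output to a file, which you can then follow after reconnecting:

    nohup ./batch_job > batch.log 2>&1 &
    # later, from a new ssh session:
    tail -f batch.log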
Best Answer
AFAICS this is related neither to free RAM nor to swap. We have the same problem here; it sometimes hits production machines that have plenty of RAM free, quite often more than 700 MB, with no dirty buffers to sync and 0 bytes of swap used. It definitely looks like a severe kernel bug caused by some unknown race condition.
Currently we run CentOS kernel 2.6.18-194.el5 and will try to replace it with a newer kernel, because we think this might help.
In the meantime we have a script which detects the 100% CPU situation quite reliably. It is called by our monitoring every minute to inform us about the situation. If the situation persists for too long, the affected machine locks up completely as more and more unkillable processes consume 100% CPU, until it becomes completely unmanageable.
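The script itself is not reproduced in this answer; a minimal sketch of that kind of check (the threshold is my assumption, and note that ps reports CPU usage averaged over a process's lifetime, not an instantaneous value) might look like:

    #!/bin/sh
    # Exit 0 (and print a message) if any kswapd thread shows very high CPU.
    ps -eo pcpu,comm | awk '
        $2 ~ /^kswapd/ && $1 > 90 { print "kswapd busy: " $1 "%"; found = 1 }
        END { exit !found }'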
Currently the only known way to resolve the problem is to hard-reboot the affected machine manually. /sbin/reboot fails, because the machine all too often hangs on shutdown. To hard-reboot a machine from any root shell command line, without direct access to the console, do the following:
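The actual commands are missing from this answer as preserved; the standard Linux mechanism for such a forced reboot is the magic SysRq trigger (assuming SysRq support is compiled into the kernel):

    echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions
    echo b > /proc/sysrq-trigger       # reboot immediately, without syncing or unmounting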
Keep in mind: do this only after quiescing the machine, so that no process is still writing to the disks. This prevents fsck from running into severe trouble after the reboot (see the sketch below). Sorry, no real solution, but HTH. And keep in mind that things other than those described here might cause a 100% CPU situation on kswapd, so automating a reboot in this case is perhaps a bad idea.
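On the quiescing point above: if the machine still responds at all, an emergency sync and read-only remount via SysRq before triggering the reboot reduces the risk of fsck trouble (again, this sequence is an assumption, not part of the original answer):

    echo s > /proc/sysrq-trigger    # emergency sync of all mounted filesystems
    echo u > /proc/sysrq-trigger    # remount all filesystems read-only
    echo b > /proc/sysrq-trigger    # then force the reboot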