Debian – How to automatically resume php-fpm


I am running nginx + php-fpm on Debian Squeeze on a busy server and have had great difficulty dealing with the maximum number of connections being reached. The problem is that under high load the PHP processes sometimes die at random, leaving the server with no PHP process at all. I then have to restart the php5-fpm service manually to bring the server back to life.

I am wondering how to prevent this from happening, or at least how to treat the symptoms by restarting php5-fpm automatically whenever there is no PHP process left to handle incoming requests. My relevant settings are:

pm = dynamic
pm.max_children = 1400

pm.start_servers = 10
pm.max_spare_servers = 20
pm.process_idle_timeout = 1s; #not sure it will be useful when pm=dynamic
pm.max_requests = 100000
request_terminate_timeout = 30

I appreciate your suggestions to cope with this nasty problem.

Best Answer

The old watchdog script idea, huh? It is not the most elegant way to solve your problem, but it can remedy the situation temporarily until you figure out why it's happening in the first place.

The actual problem still needs to be addressed: either the server needs to be tuned more finely, or it is not powerful enough to handle the load in the first place.

You have determined that the process actually dies. In that case the check is as simple as testing whether the process still exists, and ps aux will do that for you.

For example:

ps aux|grep php-fpm|grep -v grep|awk '{print $2}'

should output the process ID(s) of php-fpm. If the output is empty, no php-fpm process exists and the service needs to be restarted.
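As an alternative to parsing ps output, the PID file written by the php-fpm master can be checked directly. This is only a sketch under assumptions: `pidfile_alive` is a made-up helper name, and the PID file location has to be taken from the `pid` directive in your own php-fpm.conf. It also only tells you whether the master process is alive, not whether any workers are left.

```shell
#!/bin/sh
# pidfile_alive FILE
# Succeeds if FILE holds the PID of a process that currently exists.
# Hypothetical helper; the PID file path you pass in is site-specific.
pidfile_alive() {
    [ -r "$1" ] || return 1            # no readable PID file
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a plain number
    esac
    # Signal 0 delivers nothing; it only checks that the PID can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

Usage would look something like `pidfile_alive /var/run/php5-fpm.pid || service php5-fpm restart` (the path is an assumption). Run it as root, because `kill -0` also fails when the process belongs to another user.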

Something along these lines should do the trick (short and simple):

#!/bin/sh
pid=`ps aux|grep php-fpm|grep -v grep|awk '{print $2}'`
# An empty result means no php-fpm process is left, so restart the service.
if [ -z "$pid" ]; then
    service php5-fpm restart
fi

That script would be run from cron every minute. It has NOT been tested, so experiment with it and make sure it works before relying on it.
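To make the cron run leave a trace, the restart step can be wrapped so that it prints a timestamped line whenever it fires. This is a minimal sketch: `restart_if_empty` is a hypothetical helper, and the file names in the usage note are assumptions, not anything php-fpm ships with.

```shell
#!/bin/sh
# restart_if_empty PIDS RESTART_CMD...
# PIDS is the (possibly empty) output of the ps pipeline shown earlier.
# If it is empty, print a timestamped message and run RESTART_CMD.
restart_if_empty() {
    pids=$1
    shift
    if [ -n "$pids" ]; then
        return 0                       # at least one process found, do nothing
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') no php-fpm process found, running: $*"
    "$@"
}
```

In a watchdog script the call could be `restart_if_empty "$(ps aux|grep php-fpm|grep -v grep|awk '{print $2}')" service php5-fpm restart`. A matching entry in a file under /etc/cron.d might then read `* * * * * root /usr/local/bin/fpm-watchdog.sh >> /var/log/fpm-watchdog.log 2>&1` (script and log paths are made up here). Avoid putting "php-fpm" in the script's own file name, or the grep will match the watchdog itself. Cron also runs with a minimal PATH, so you may need to set PATH in the cron file or call the init script by its full path.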

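The problem with this approach is zombie processes: they still show up in the process list as if they existed, but they are not actually doing anything, so the check above can be fooled. A zombie is already dead and cannot be killed itself; it only goes away once its parent reaps it or exits, so in that case the stuck parent has to be killed first and the service restarted afterwards.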
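One way to keep zombies from fooling the check is to look at the process state column and count only entries that are not in state Z. The following is a sketch, assuming a procps-style ps that accepts `-eo stat=,args=`; `count_live_fpm` is a hypothetical helper name, and the pattern accepts both "php-fpm" and "php5-fpm" because a zombie is listed under its binary name rather than its process title.

```shell
#!/bin/sh
# count_live_fpm
# Reads "STAT ARGS" lines (as printed by: ps -eo stat=,args=) on stdin and
# prints how many php-fpm entries are not zombies (STAT not starting with Z).
count_live_fpm() {
    awk '$1 !~ /^Z/ && $0 ~ /php5?-fpm/ { n++ } END { print n + 0 }'
}
```

A watchdog could then use `live=$(ps -eo stat=,args= | count_live_fpm)` and restart the service only when `$live` is 0. Reading from stdin keeps the counting logic easy to try out on saved ps output before wiring it into cron.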

Again, the correct thing to do is to determine what is actually causing the service to crash. The watchdog script is only there to buy you time.

Hope it helps. Good luck