PHP – High CPU usage resulting in server crash

apache-2.2 cpu-usage lamp PHP

I'm running out of ideas to explore. First off, let me warn you: I'm a programmer, not a systech 🙂

Here is the situation.

Dedicated server (LAMP) running a fair number of sites. The MySQL server is on a separate box.

Over the last couple of weeks, performance has been steadily degrading to the point where I can no longer even remote into the box.

Looking into mod_status, there are a fair amount of processes taking up CPU resources. However, the URLs are all different… there is not a common pattern – so I can't narrow anything down to a particular script that might be getting stuck.

PHP is run as CGI.

The majority of the sites that are taking a while to run use the CakePHP framework.

When we restart the server, we are down again within a few minutes…

I came across an error that said /var/tmp/ was full and sessions couldn't be written. However, there was still room? Lack of inodes perhaps? I'm currently in the process of having someone walk down to the box and clear tmp.

Could the lack of ability to write sessions be causing the php processes to hang forever, and eventually clog everything up?

Any other ideas that I might want to explore? I have been monitoring the SQL server to see if it is returning huge datasets in any of the queries, and there is nothing notable in there…

It's only 11:21am here and I already need a drink 🙂

Best Answer

I assume it's a memory problem.

  1. Apache is eating a lot of RAM.

  2. PHP also has a lot of memory leaks. You should configure it to restart its worker processes after handling a fairly low number of requests (100 is a good number). Look in /etc/init.d/php-cgi (or similar) for a line such as "PHP_FCGI_MAX_REQUESTS=20"; that is the limit. Also set a reasonable limit for the number of children, for example "PHP_FCGI_CHILDREN=15". I would also suggest using php-fpm if possible; it is much more stable and has fewer memory leaks.
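As a rough sketch of what those two settings might look like in a launcher script: the variable names are the ones mentioned above, while the layout of the script and the chosen values are assumptions for illustration, so the file on your distribution may look different.

```shell
# Sketch of the environment a php-cgi FastCGI launcher could set.
# The values are illustrative; 15 and 100 are the numbers suggested above.

# Number of PHP worker processes to keep running.
PHP_FCGI_CHILDREN=15

# Requests each worker handles before it exits and is replaced,
# which limits how much leaked memory a single worker can accumulate.
PHP_FCGI_MAX_REQUESTS=100

export PHP_FCGI_CHILDREN PHP_FCGI_MAX_REQUESTS

# The real script would start php-cgi after this point; that line is
# left out because the binary path and options differ per distribution.
```

Both variables have to be exported so that the php-cgi process started by the script actually sees them.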

TODO:

  1. Look for killed processes in your syslog (/var/log/syslog or /var/log/messages, depending on the distribution). There might be a hint there.
  2. To track the problem down, try "atop" (a process monitor like top, but with some more features) and press "p", which accumulates all statistics by process name. Have a look at what's eating up the RSIZE.
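If atop is not installed, the two checks above can be approximated with standard tools. This is only a sketch: the function names `find_oom_kills` and `sum_rss_by_name` are made up for this example, and the search patterns assume the usual wording of the Linux kernel's out-of-memory messages, which can vary between kernel versions.

```shell
# Print log lines that look like the kernel's out-of-memory killer at work.
# Takes one or more log file paths; unreadable files are skipped.
find_oom_kills() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # -H prefixes each match with the file name, -i ignores case.
        # "|| true" keeps a log without matches from counting as a failure.
        grep -HiE 'out of memory|oom-killer|killed process' "$log" || true
    done
}

# Read "RSS_KIB COMMAND" lines on stdin and print the total resident
# memory per command name, largest first. Only the first word of the
# command name is used as the grouping key.
sum_rss_by_name() {
    awk '{ total[$2] += $1 }
         END { for (name in total) printf "%d %s\n", total[name], name }' |
        sort -rn
}

# Typical use on the affected server (not run here):
#   find_oom_kills /var/log/syslog /var/log/messages
#   ps -eo rss=,comm= | sum_rss_by_name | head
```

The second function groups by process name in the same spirit as atop's "p" view, so a large total next to php-cgi or the Apache process name would support the memory theory.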