Nginx + php-fpm – each php-fpm process at 70-100% CPU when running

cache, cpu, nginx, php-fpm, vps

I have a situation in which the following is taking place:

  • We are on Linode with an 8-core, 2.6 GHz VPS and 8 GB of RAM, running nginx + php-fpm, and our CPU usage graphs are extremely high (we don't want to be such a bad VPS neighbor)…

  • We have fewer than 100 users on the site at a time – so this situation is also incredibly embarrassing: our CPU usage should not be this high.

  • We are using a little-known, possibly CPU-intensive, questionably written framework instead of well-known, well-documented, well-crafted alternatives like WordPress or Drupal, for which there is LOTS of documentation (as well as plugins) about caching PHP on an nginx + php-fpm platform.

  • Thus, we have about 6 open php-fpm processes that, when RUNNING, individually consume LARGE amounts of CPU (30%+, often near 99%) – and I haven't the slightest idea how to stop them from using so much. I can't tell which PHP scripts are causing these spikes because they happen all the time… usually only 1 or 2 are running, but when all 6 run we max out all 8 CPUs.

  • My pool.d/www.conf file has the following settings:

    pm = dynamic
    pm.max_children = 10
    pm.start_servers = 4
    pm.min_spare_servers = 2
    pm.max_spare_servers = 6
    
  • We did this ^ setup because, as I interpret it, our memory is actually great (htop shows 472 MB of 7000+ MB used, no swapping, etc.), so we could handle many more processes and work down the queue of waiting requests – BUT unfortunately, since each process is so CPU-intensive when running, adding processes drives our CPU through the roof – so we can't run enough of them.

  • The question – what on earth can we do to reduce per-process php-fpm CPU usage so that we can raise the settings in that pool conf file? And yes, /var/log/php5-fpm.log is yelling at us to increase max_children and adjust/increase our min/max/start servers – but doing so makes our load average crazy, as stated above. How can we do this without necessarily using a cache, or what are our options?

  • My ideas? I've read about using cpulimit to ensure no process takes more than an allotted share of CPU – but would that slow things down to the point of being unusable? Or would it let us run more than a few processes? I've also considered running two pools: one for our customer-facing website (what customers experience) and another for the backend, which currently affects the customer-facing site when time-consuming reports are being run.

  • I have spent a few days researching and googling this topic – and it is difficult because everyone's situation is so unique to their system. The trouble is that being on such a specific, little-known, possibly poorly written framework makes it hard to find a solution. We can't just scrap this framework yet either – I have to find a solution of some sort.
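  • One thing I plan to try, to figure out which scripts are actually responsible: php-fpm's slow log, which dumps a PHP backtrace for any request that runs longer than a threshold. My understanding is the pool settings look roughly like this (the log path and the 5s threshold are guesses on my part, not values we've tested):

    ; pool.d/www.conf – log a backtrace for any request that
    ; executes longer than the timeout below
    request_slowlog_timeout = 5s
    slowlog = /var/log/php5-fpm.slow.log

    After a php-fpm reload, the slow log should name the exact script and call stack for each offending request.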


UPDATE: I have implemented memcache to store PHP sessions – because the framework relies heavily on user sessions, and the nature of our system is that employees often use several tabs at a time, each checking back to the session to confirm abilities / user data / etc… so I am hoping to see some performance increase from this – feel free to comment on that if you'd like. I'll see how it goes tomorrow when we get through our higher-volume peak times.
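Concretely, the change was along these lines in php.ini (the address shown is the standard local memcached default, and the exact handler name depends on whether the memcache or memcached extension is installed – with the memcached extension it would be session.save_handler = memcached and a save_path without the tcp:// prefix):

    ; php.ini – store sessions in memcached via the "memcache" extension
    session.save_handler = memcache
    session.save_path = "tcp://127.0.0.1:11211"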

Best Answer

A couple of things to consider (apologies in advance if you have already considered these): First of all, make sure to optimize your nginx config so that it invokes php-fpm only when absolutely necessary. The last thing you want is PHP handling things like static HTML pages (which it will happily do).
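For example, something roughly like this in your server block – the extension list, cache lifetime, and socket path are placeholders, so adapt them to your setup:

    # Serve static assets straight from disk; php-fpm never sees these.
    location ~* \.(css|js|png|jpg|gif|ico|html)$ {
        expires 7d;
        try_files $uri =404;
    }

    # Only genuinely dynamic requests reach the PHP backend.
    location ~ \.php$ {
        try_files $uri =404;    # don't pass nonexistent scripts to PHP
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
    }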

Secondly, since you're using php-fpm, I suggest being more aggressive about how long php-fpm's children are allowed to live. You need to find the sweet spot between short-lived children and stability. The php-fpm defaults are way too generous for any production system, IMHO. The longer a worker is allowed to serve requests, the more unstable it gets: there's a higher risk of memory leaks, and if the framework you refer to has bugs like infinite loops, which may be causing your CPU load, recycling workers sooner shouldn't hurt.

I'd reduce pm.max_requests for your production pools. I think the default is 200; I'd start at 50 and see where that takes you.
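In the pool file that's a single line – the 50 here is a starting point to experiment with, not a magic number:

    ; pool.d/www.conf – recycle each worker after 50 requests to
    ; contain memory leaks and runaway state in the framework
    pm.max_requests = 50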

Failing/complementary to that, you could also try these global options (AFAIK they are all disabled by default):

emergency_restart_threshold = 3
emergency_restart_interval = 1m
process_control_timeout = 5s

What does this mean? If 3 php-fpm child processes exit with SIGSEGV or SIGBUS (i.e. crash) within 1 minute, then the php-fpm master restarts itself automatically. The third option gives child processes 5 seconds to react to signals from the master.

Here's a nice overview of all the config options I mentioned here, as well as others: http://myjeeva.com/php-fpm-configuration-101.html

Hope these tips help you! Remember to tweak and observe; unfortunately there doesn't seem to be a single rule of thumb for all this. As you observed, there are too many variables that affect PHP's behaviour and stability.

Finally, the CPU limiting facility you inquired about is documented here, but I'd only resort to it after exhausting every other option. If you do choose this path, definitely watch out for possible interactions between php-fpm tweaks and your limits.conf configuration. At that point etckeeper may be a lifesaver! :)
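For reference, a per-user cap in /etc/security/limits.conf looks roughly like this – the www-data user and the minute values are just placeholders for illustration, and note that the cpu limit is total CPU minutes per process, not a percentage:

    # /etc/security/limits.conf – cap CPU time for the php-fpm pool user
    # (hitting the soft limit sends SIGXCPU; the hard limit kills the process)
    www-data  soft  cpu  5
    www-data  hard  cpu  10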

Good luck!

Rouben
