Nginx – Intermittent 404 errors under higher load (nginx/php/apc/mysql)

alternative-php-cache, nginx, php, php-fpm, php5

I have a VPS with 4 CPU cores and 4 GB of RAM. It runs Ubuntu 10.04 with nginx/0.7.65, PHP 5.3.2, and MySQL. PHP is served to nginx through php5-fpm, and php5-apc provides opcode caching for PHP.

This system runs four sites and serves approximately 2.5 million page views per day.

Introduction to problem – The sites intermittently return a 404 error when loading a page. It can happen on any page: if I refresh the same page over and over, it occurs on average once in every seven to ten requests.
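The manual refresh test described above can be automated to put a number on the failure rate. The following is a minimal sketch, not part of the original setup: the helper names (`fetch_status`, `count_statuses`, `error_rate`) are made up for illustration, and the URL in the usage note is a placeholder.

```python
from collections import Counter
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a single GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is what we want.
        return err.code


def count_statuses(fetch, attempts):
    """Call fetch() `attempts` times and tally the returned status codes."""
    return Counter(fetch() for _ in range(attempts))


def error_rate(counts, status=404):
    """Fraction of tallied requests that returned `status`."""
    total = sum(counts.values())
    return counts[status] / total if total else 0.0
```

For example, `count_statuses(lambda: fetch_status("http://example.com/some-page"), 100)` followed by `error_rate(...)` on the result gives the share of 404 responses; a rate near 0.10 to 0.14 would match the "once in seven to ten" observation.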

What I've tried – Updating Ubuntu and nginx, and adjusting the PHP and nginx configurations. The problem has been replicated on Ubuntu 11.10 with nginx/1.0.5.

Notable logs – The nginx error.log and access.log show nothing out of the ordinary, although they do record each 404 that is issued. php5-fpm.log displays:
Nov 20 21:47:45.640003 [ERROR] [pool www] unable to retrieve process activity of one or more child(ren). Will try again later.
but I am not convinced that this is related to the issue, as I've seen it in previous setups.

Configurations (links to images to save space) –
* APC settings, note Fragmentation: 0% and Cache full count is 0. http://i.imgur.com/fV8hU.png
* nginx.conf

user www-data;
worker_processes  4;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
    use epoll;
    # multi_accept on;
}

http {
    include       /etc/nginx/mime.types;
    server_names_hash_bucket_size 64;
    access_log off;  #/var/log/nginx/access.log;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  0;
    #keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  on;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
* Typical site configuration, as included from nginx.conf: http://i.imgur.com/9Vmhj.png
* php5-fpm.conf is default, except that pm.max_children was raised from 10 to 30.
* The php5 configuration is default; the only edits are sendmail/postfix related.
* ulimit -n shows 1024

What helps – The only relief I have had came from moving one of the two busier sites to its own VPS. Obviously I would love to keep all the sites on one VPS, as the system has more than enough resources. Am I hitting an open file limit somewhere? I suspect it may be something PHP related.
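One way to check the open file question: `ulimit -n` reports the limit of the shell it is run in, which is not necessarily the limit the running nginx workers or php5-fpm children have. On Linux, the per-process values can be read from `/proc/<pid>/limits` and `/proc/<pid>/fd`. This is a rough sketch under that assumption; the function names are made up for illustration, and reading another user's process usually requires root.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as "unlimited".
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def read_fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a running process."""
    with open(f"/proc/{pid}/limits") as fh:
        soft, hard = parse_max_open_files(fh.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard
```

Calling `read_fd_usage(pid)` with the PID of an nginx worker or a php5-fpm child while the sites are under load shows how close the process is to its soft limit. If the count stays well below the limit, the open file limit is probably not the cause.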

I'm fairly confident that this is a config issue somewhere, but I don't know what it could be, and I spent 48 hours researching before posting this question.

Best Answer

Upgrading to PHP >=5.3.4 should help, as it seems you've been bitten by this bug: https://bugs.php.net/bug.php?id=53028
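To check whether an installed PHP predates the version named above, the version string (for example from `php -v`) can be compared against 5.3.4. The sketch below assumes, following this answer, that 5.3.4 is the first release that helps; the function names and the sample string in the usage note are illustrative only.

```python
import re

# Assumption taken from the answer above: 5.3.4 is the first fixed release.
FIXED_IN = (5, 3, 4)


def parse_php_version(text):
    """Extract the first major.minor.patch triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def needs_upgrade(text):
    """True if the version in `text` is older than FIXED_IN."""
    return parse_php_version(text) < FIXED_IN
```

With the PHP 5.3.2 from the question, `needs_upgrade("PHP 5.3.2")` is true. Comparing integer tuples rather than strings keeps versions such as 5.3.10 ordered correctly after 5.3.4.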