Nginx php-fpm pool being blocked and not responding

blocking, nginx, php-fpm, pool, timeout

I'm having an issue where requests for pages get no response after other requests take a long time to process.

I have nginx set up to use PHP-FPM, with two pools defined: one for general web page requests, and one to serve images and other large files.

From my php-fpm config file:

[www]
listen = var/run/php54/php-fpm-www.sock
pm = dynamic
pm.max_children = 20
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 20
pm.max_requests = 200


[www-images]
listen = var/run/php54/php-fpm-images.sock
pm = dynamic
pm.max_children = 5
pm.start_servers = 1
pm.min_spare_servers = 1
pm.max_spare_servers = 2
pm.max_requests = 40

Nginx is configured to use these two separate pools; requests for images stored in Amazon S3 go through the 'www-images' pool, where the images are resized to the requested size. From my nginx config file:

location ~* ^/proxy {
    try_files $uri @404;
    fastcgi_pass  unix:/opt/local/var/run/php54/php-fpm-images.sock;
    include       /opt/local/etc/nginx/fastcgi.conf;
}

location / {
    try_files $uri /routing.php?$args;
    fastcgi_pass  unix:/opt/local/var/run/php54/php-fpm-www.sock;
    include       /opt/local/etc/nginx/fastcgi.conf;
}

Because I'm testing on a terrible internet connection, these requests are timing out in PHP, which is expected:

2013/01/20 15:47:34 [error] 77#0: *531 upstream timed out (60:
Operation timed out) while reading response header from upstream,
client: 127.0.0.1, server: example.com, request: "GET
/proxy/hugeimage.png HTTP/1.1", upstream:
"fastcgi://unix:/opt/local/var/run/php54/php-fpm-images.sock:", host:
"example.com", referrer: "http://example.com/pictures"

What's not expected, and what I'd like to resolve, is that requests that should be going to the 'www' pool are also timing out, with nginx not getting a response from PHP-FPM:

2013/01/20 15:50:06 [error] 77#0: *532 upstream timed out (60:
Operation timed out) while reading response header from upstream,
client: 127.0.0.1, server: example.com, request: "GET /pictures
HTTP/1.1", upstream:
"fastcgi://unix:/opt/local/var/run/php54/php-fpm-www.sock:", host:
"example.com"

After a couple of minutes, requests to the 'www' pool start working again without any action on my part.

I thought that using separate pools would mean that even if one pool has trouble with long-running requests, the other pool would remain unaffected.

So my question is: how do I isolate the two pools, so that one pool being overwhelmed by requests that time out doesn't affect the other pool?

To clarify, it is by design that I limit the number of concurrent requests through the 'www-images' pool. In practice this limit will hardly ever be reached (thanks to caching of the files downloaded from S3 to the server), but if an unusual situation pushes that pool to its limit, I want the 'www' pool to keep functioning, as that is where the site's functionality actually sits.

Best Answer

I found two things:

  1. Add session_write_close(); to any long-running PHP scripts. PHP locks session data to prevent concurrent writes, so only one script may operate on a given session at a time; every other request that uses the same session blocks until the lock is released. A sketch of the change follows this list.
  2. Load any images that may be slow to serve from a different domain than the one serving web pages and Ajax calls, as web browsers queue requests to a single domain once more than a small number are active; see the nginx sketch at the end of this answer.
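
For the first point, here is a minimal sketch of the change, using the image proxy as the example; fetch_from_s3() and resize_image() are hypothetical placeholders for whatever the script actually does:

<?php
// Long-running image proxy script (hypothetical example).
session_start();

// Read any session data the script needs before releasing the lock.
$userId = isset($_SESSION['user_id']) ? $_SESSION['user_id'] : null;

// Release the session lock before the slow work starts. PHP's default
// file-based session handler holds an exclusive lock from session_start()
// until the script ends, so without this call every other request sharing
// the same session (including 'www' pool page requests) waits here.
session_write_close();

// The slow part: fetch the original from S3 and resize it.
// fetch_from_s3() and resize_image() are placeholders, not real functions.
$original = fetch_from_s3($_GET['key']);
$resized  = resize_image($original, $_GET['width'], $_GET['height']);

header('Content-Type: image/png');
echo $resized;

Note that anything written to $_SESSION after session_write_close() is not saved, so any session updates have to happen before the slow section.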

Although these are two separate issues, they had the same effect: requests to the 'www' pool were blocked by requests to the 'www-images' pool.
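
For the second point, here is a hedged sketch of what the separate image host might look like in nginx; images.example.com is an assumed DNS name pointing at the same server, and the location block mirrors the original /proxy one. Browsers of this era open only a small number of concurrent connections per host name (typically around six), so a second host name gives slow images their own connection queue instead of sharing the page's:

server {
    listen 80;
    server_name images.example.com;  # assumed hostname, images only

    location ~* ^/proxy {
        try_files $uri @404;
        fastcgi_pass  unix:/opt/local/var/run/php54/php-fpm-images.sock;
        include       /opt/local/etc/nginx/fastcgi.conf;
    }

    # Assumed definition; the original config defines @404 elsewhere.
    location @404 {
        return 404;
    }
}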