Linux – nginx recv() failed (104: Connection reset by peer) while reading response header from upstream

linuxmagentonginxPHP

I have recently upgrade my magento from 1.5 to 1.9 and when ever I add a certain product to basket, I started to receive this error: 502 Bad Gateway

There were no log entries in the var/log/ folder:
enter image description here

So, I had a look at my nginx errors and I found the following entries in nginx-errors.log:

2015/04/09 10:58:03 [error] 15208#0: *3 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 46.xxx.xxx.xxx, server: dev.my-domain.co.uk, request: "POST /checkout/cart/add/uenc/aHR0cDovL2Rldi5zYWx2ZW8uY28udWsvdGludGktYmF0aC1wYWludGluZy1zb2FwLTcwbWwuaHRtbD9fX19TSUQ9VQ,,/product/15066/form_key/eYLc3lQ35BSrk6Pa/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fcgi-www-data.sock:", host: "dev.my-domain.co.uk", referrer: "http://dev.my-domain.co.uk/tinti-bath-painting-soap-70ml.html"
2015/04/09 11:04:42 [error] 15208#0: *13 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 46.xxx.xxx.xxx, server: dev.my-domain.co.uk, request: "POST /checkout/cart/add/uenc/aHR0cDovL2Rldi5zYWx2ZW8uY28udWsvdGludGktYmF0aC1wYWludGluZy1zb2FwLTcwbWwuaHRtbD9fX19TSUQ9VQ,,/product/15066/form_key/eYLc3lQ35BSrk6Pa/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fcgi-www-data.sock:", host: "dev.my-domain.co.uk", referrer: "http://dev.my-domain.co.uk/tinti-bath-painting-soap-70ml.html"
2015/04/09 11:05:03 [error] 15208#0: *16 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 46.xxx.xxx.xxx, server: dev.my-domain.co.uk, request: "POST /checkout/cart/add/uenc/aHR0cDovL2Rldi5zYWx2ZW8uY28udWsvdGludGktYmF0aC1wYWludGluZy1zb2FwLTcwbWwuaHRtbD9fX19TSUQ9VQ,,/product/15066/form_key/eYLc3lQ35BSrk6Pa/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fcgi-www-data.sock:", host: "dev.my-domain.co.uk", referrer: "http://dev.my-domain.co.uk/tinti-bath-painting-soap-70ml.html"
2015/04/09 11:12:07 [error] 15273#0: *1 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 46.xxx.xxx.xxx, server: dev.my-domain.co.uk, request: "POST /checkout/cart/add/uenc/aHR0cDovL2Rldi5zYWx2ZW8uY28udWsvdGludGktYmF0aC1wYWludGluZy1zb2FwLTcwbWwuaHRtbD9fX19TSUQ9VQ,,/product/15066/form_key/eYLc3lQ35BSrk6Pa/ HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fcgi-www-data.sock:", host: "dev.my-domain.co.uk", referrer: "http://dev.my-domain.co.uk/tinti-bath-painting-soap-70ml.html"

I have installed magento on a custom LEMP stack, here are the configurations:

This error only appears to be happening when I add a specific product to basket in my upgraded magento and every time the error occurs, I can see a core.XXXXX file (which is approx 350mb) in the public_html folder.

Any idea why my php-fpm is crashing like this? How can I find the cause and fix it?

Here's the last entries on my Linux (CentOS) server, when I run dmesg command:

php-fpm[14862]: segfault at 7fff38236ff8 ip 00000000005c02ba sp 00007fff38237000 error 6 in php-fpm[400000+325000]
php-fpm[15022]: segfault at 7fff38351ff0 ip 00000000005bf6e5 sp 00007fff38351fb0 error 6 in php-fpm[400000+325000]
php-fpm[15021]: segfault at 7fff38351ff0 ip 00000000005bf6e5 sp 00007fff38351fb0 error 6 in php-fpm[400000+325000]
php-fpm[15156]: segfault at 7fff38351ff0 ip 00000000005bf6e5 sp 00007fff38351fb0 error 6 in php-fpm[400000+325000]
php-fpm[15024]: segfault at 7fff38351ff0 ip 00000000005bf6e5 sp 00007fff38351fb0 error 6 in php-fpm[400000+325000]
php-fpm[15223]: segfault at 7fff8d1d5fd8 ip 00000000005c02ba sp 00007fff8d1d5fe0 error 6 in php-fpm[400000+325000]
php-fpm[15222]: segfault at 7fff8d1d5fd8 ip 00000000005c02ba sp 00007fff8d1d5fe0 error 6 in php-fpm[400000+325000]
php-fpm[15225]: segfault at 7fff8d1d5fd8 ip 00000000005c02ba sp 00007fff8d1d5fe0 error 6 in php-fpm[400000+325000]
php-fpm[15227]: segfault at 7fff8d1d5fd8 ip 00000000005c02ba sp 00007fff8d1d5fe0 error 6 in php-fpm[400000+325000]
php-fpm[15362]: segfault at 7fff3118afd0 ip 00000000005c0ace sp 00007fff3118afa0 error 6 in php-fpm[400000+325000]

I analysed the core dump with gdb and this is what I see for the first two frames: http://pastebin.com/raw.php?i=aPvB1sWv (doesn't make much sense to me)…

Best Answer

Such errors usually occurs when server is running out of resources, assuming that you're running most recent stable versions of php5-fpm:

  1. Check that php5-fpm has enough memory (there's no oom-killer killing the process)

  2. There's enough space on the disk

  3. Make sure to check the open file limits on the server. You're interested especially in the hard limit (-Hn):

    $ ulimit -Hn
    4096
    $ ulimit -Sn
    1024
    

Check number of currently opened file descriptors on the server:

    sysctl fs.file-nr
    fs.file-nr = 1440       0       790328

Modern servers are capable of handling many files, usually ulimits are set to unnecessary low values.

Then check nginx.conf, at the beginning there's something like:

    worker_processes 4;
    events {
      worker_connections 1024;
    }

If you're proxying request for each connection you'd need 2 file handles. Which means that in case of many connections you'd reach the limit quite quickly.

nginx has a worker_rlimit_nofile directive for restricting opened files per worker process (top level directive like worker_processes 4;):

    worker_rlimit_nofile    1024;

Just do the math and calculate how many opened file descriptors you'd need when all connections are used (a bit extreme case). Also consider all other services running on that server.