Nginx – PHP7 unexpected FastCGI record while reading response header from upstream

nginx php-fpm php7

I upgraded to PHP7 (PHP 7.0.14) on a production server (CentOS 6.8) two days ago. Now I am getting the following error in the nginx (1.10.2-1) logs.

2017/01/20 08:20:04 [error] 7654#7654: *153301 upstream sent unexpected FastCGI record: 3 while reading response header from upstream, client: XXX.XXX.XXX.XXX, server: example.com, request: "GET / HTTP/1.0", upstream: "fastcgi://unix:/var/run/php-fpm/example.fpm.sock:", host: "www.example.com"

  1. We have multiple websites, each running its own php-fpm pool, and this error appears on all websites at the same time.
  2. Browsers show "502 Bad Gateway" on all websites while the error occurs.
  3. The error lasts for 1-2 minutes, and then everything returns to normal on its own.
  4. It happened three times in one day, at different times.
  5. There was no problem with PHP5.
  6. I have tried blacklisting all application cache folders in opcache.

We have another server with a similar setup that was also upgraded to PHP7, and it has no such issues.

How should I troubleshoot and find a solution to this problem?

UPDATE 1
Server Details
CPU: 2x Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
RAM: 256GB
OS: CentOS release 6.8
Kernel: 2.6.32-504.8.1.el6.x86_64
PHP: Using 7.0.14-3 from IUS repo
Nginx: 1.10.2-1

The server is used as a web server for multiple sites running a popular open-source PHP application. We use Nginx with php-fpm as the backend. Each website has a separate php-fpm pool and its own socket. The PHP application is already compatible with PHP7, and the only change was the upgrade to PHP7.

UPDATE 2

Nginx main config

user  apache;
worker_processes  auto;

error_log  /var/log/nginx/error.log alert;
pid        /var/run/nginx.pid;


events {
    use epoll;
    worker_connections  4024;
    multi_accept on;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    server_names_hash_bucket_size 256;
    server_names_hash_max_size 1024;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;
    client_max_body_size 512M;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;

gzip  on;
gzip_http_version 1.1;
gzip_vary on;
gzip_comp_level 6;
gzip_proxied any;
gzip_min_length 1000;
gzip_types text/plain text/css application/json application/javascript application/x-javascript text/javascript text/xml application/xml application/rss+xml application/atom+xml application/rdf+xml font/ttf font/otf image/svg+xml;
gzip_buffers 16 24k;
gzip_disable msie6;

fastcgi_connect_timeout 120;
fastcgi_send_timeout 1200;
fastcgi_read_timeout 1200;
fastcgi_buffer_size 256k;
fastcgi_buffers 16 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
fastcgi_intercept_errors on;
keepalive_requests 10000;

    include /etc/nginx/conf.d/*.conf;
     # Load all vhosts !
    include /etc/nginx/sites-enabled/*.conf;
}

Individual nginx site template

server {
  server_name @@HOSTNAME@@ www.@@HOSTNAME@@;
  root "@@PATH@@";
  index  index.php index.html index.htm;
  add_header    Cache-Control  public;

  client_max_body_size 512m;

    access_log @@LOG_PATH@@/access.log;
    error_log @@LOG_PATH@@/error.log;


        location / {
                # This is cool because no php is touched for static content
                try_files $uri $uri/ $uri/index.php @rewrite /index.php$uri?$args;
        }
        location @rewrite {
                rewrite ^ /index.php;
        }

    location ~ \.php$ {
        send_timeout 1200;
        proxy_read_timeout 1200;
        proxy_connect_timeout 120;
        fastcgi_read_timeout 1200;
        fastcgi_pass    unix:@@SOCKET@@;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
        include        fastcgi_params;
    }



    location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
            expires max;
            log_not_found off;
            access_log off;
    }

    location ~* \.(html|htm)$ {
        expires 30m;
    }

    location ~* /\.(ht|git|svn|bak) {
        deny  all;
    }

        location ~ ^/sites/.*/files/styles/ {
                try_files $uri @rewrite;
        }

}

PHP FPM pool template

[@@USER@@]
listen = /var/run/php-fpm/@@USER@@.fpm.sock
listen.owner = nobody
listen.group = nobody
listen.mode = 0666
user = @@USER@@
group = @@USER@@
pm = ondemand
pm.max_children = 50
pm.process_idle_timeout = 300s
pm.max_requests = 5000
rlimit_files = 1024
request_terminate_timeout = 1200s
security.limit_extensions = .php
php_admin_value[session.save_path] = "@@HOME_DIR@@/_sessions"
php_admin_value[error_log] = "@@HOME_DIR@@/logs/www-error.log"

UPDATE 3
When the problem occurs

Request 1

GET /moodle/ HTTP/1.0
User-Agent: Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
Host: www.example.com

Received header

502 Bad Gateway
Server: nginx/1.10.2
Date: Wed, 25 Jan 2017 12:32:00 GMT
Content-Type: text/html
Content-Length: 173
Connection: close

Received content

<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.10.2</center>
</body>
</html>

Best Answer

OK, so all php-fpm pools are dead for several minutes at the same time, and the only thing that changed is the move from PHP 5.6 to PHP 7. What changed with PHP 7? What is global across all php-fpm pools? The opcache. Would you be kind enough to provide your php.ini? If not, please check your opcache configuration and verify at least these parameters:

zend_extension=opcache.so
opcache.enable=1                    ; on or off in your config?
opcache.memory_consumption=64       ; too small for you?
opcache.max_accelerated_files=2000  ; maybe too small for you?
opcache.force_restart_timeout=180   ; Oh! This is the time of your outage!

Change opcache.force_restart_timeout from 180 to 120, set opcache.log_verbosity_level to 3 or higher, and check whether the outage is shorter than usual. Then I suggest you review the opcache runtime configuration and tune it for your sites.
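
As a rough sketch, the test described above could look like this in php.ini. The two values come from the suggestion itself; the opcache.error_log path is only an assumption for illustration, so point it wherever your logs normally live. This is a diagnostic experiment, not a tuned configuration.

```ini
; Diagnostic sketch only, not a tuned configuration.

; Shortened from 180 seconds. If the outages shrink to roughly
; two minutes, the forced opcache restart is the likely cause.
opcache.force_restart_timeout=120

; 3 = info messages, 4 = debug messages.
opcache.log_verbosity_level=3

; Hypothetical path (an assumption). An empty value sends
; opcache messages to stderr instead.
opcache.error_log=/var/log/php-fpm/opcache.log
```

Restart or reload php-fpm after the change so the pools pick up the new values, then compare the timestamps in the opcache log with the 502 errors in the nginx error log.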