Nginx – Why does the server running nginx/php-fpm keep losing session capability without generating any errors?

logging, nginx, php-fpm, session

I am managing a server that hosts a couple dozen websites, all of which had been working fine until last week, when it was noticed that one site had seemingly lost the ability to maintain session data. Then another. (I am guessing it is affecting all sites on this server; the rest just have not been reported yet.)

I changed absolutely nothing in either site's configs recently. I have added no software to the server recently. I have not changed the general nginx or php-fpm configs. There are no errors in the nginx or php-fpm error logs that correspond to this failure. Restarting php-fpm appears to clear up the problem, at least temporarily, but inevitably the problem recurs.

How is it possible for php-fpm to fail like this without producing an error message somewhere? I have been googling extensively and have not found anyone else with this problem.

The server is running RHEL 6 with nginx and php-fpm (remi repo). I can't remember if this server is running APC but I don't think it is. All patches are up to date.

I am guessing I have just hit some sort of threshold where the current php-fpm configs are insufficient, though I don't understand why I get no errors when that limit is reached. Here are what I suspect are the relevant php-fpm settings…

pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
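For context, the status output quoted further down comes from php-fpm's built-in status page. A minimal sketch of how it is typically enabled, assuming the remi default pool file at /etc/php-fpm.d/www.conf (the path and the /status name are assumptions, not taken from this server's actual config):

```ini
; /etc/php-fpm.d/www.conf -- expose the pool status page
; (a matching nginx fastcgi location, restricted to localhost,
; is still needed to pass /status requests through to the pool)
pm.status_path = /status
```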

Is there an error log somewhere I'm missing where this would be reported? As I mentioned, there is nothing in /var/log/php-fpm/www-error.log, in the general nginx error log, or in the site-specific nginx error logs.

P.S.: I do get other kinds of error messages in all of the logs mentioned above, so the lack of error messages here is not a permissions issue.

Here are df outputs (edited to remove identifying physical paths)…

# df -h
Filesystem            Size  Used Avail Use% Mounted on
xxx
                      8.4G  3.8G  4.2G  48% /
xxx                   7.8G     0  7.8G   0% /dev/shm
xxx                   477M   79M  373M  18% /boot
xxx
                      976M  713M  213M  78% /home
xxx
                      976M   30M  896M   4% /tmp
xxx
                      9.8G  4.6G  4.7G  50% /var


# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
xxx
                      547584   87083  460501   16% /
xxx                  2041821       1 2041820    1% /dev/shm
xxx                   128016      50  127966    1% /boot
xxx
                       65536   19285   46251   30% /home
xxx
                       65536     173   65363    1% /tmp
xxx
                      655360   19441  635919    3% /var

And here is the php-fpm status page while the site is not allowing sessions to be saved…

pool:                 www
process manager:      dynamic
start time:           06/Aug/2015:10:53:06 -0400
start since:          332263
accepted conn:        2899
listen queue:         0
max listen queue:     0
listen queue len:     128
idle processes:       9
active processes:     1
total processes:      10
max active processes: 9
max children reached: 0
slow requests:        0

Best Answer

How is it possible that php-fpm can fail like this without producing an error message somewhere?

Because whoever wrote the failing code didn't check for the failure and have the program write an error message. Programs aren't magic; they're written by humans, who don't always anticipate every possible problem.

My intuition is that you've hit a disk storage limit somewhere: disk space, inodes, whatever. The solution is either to run something like tmpreaper over your session store regularly to keep the number of old sessions to a minimum, or to switch to another (auto-expiring) session store such as memcached.
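A sketch of the reaping approach using find(1) (tmpreaper, or its RHEL counterpart tmpwatch, does the same job); the save path and the 24-hour cutoff here are assumptions, and the cutoff must stay above session.gc_maxlifetime so live sessions are never deleted:

```shell
#!/bin/sh
# Hypothetical /etc/cron.hourly/php-session-reap -- a sketch, not a
# drop-in script. Deletes session files not modified for 24 hours
# (1440 minutes) from the assumed default RHEL save path.
find /var/lib/php/session -maxdepth 1 -type f -name 'sess_*' \
     -mmin +1440 -delete
```

Dropping something like this into cron keeps the file and inode count bounded; switching session.save_handler to memcached sidesteps the problem entirely, since entries there expire on their own.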