I am managing a server that hosts a couple dozen websites, and they had all been working fine until last week, when it was noticed that one site had seemingly lost the ability to maintain session data. Then another. (I am guessing it is affecting all sites on this server and just has not been reported yet.) I changed absolutely nothing in either site's config recently, I have added no software to the server, and I have not touched the general nginx or php-fpm configs. There are no errors in the nginx or php-fpm error logs that correspond to this failure. Restarting php-fpm appears to clear up the problem, at least temporarily, but inevitably it recurs.
How is it possible for php-fpm to fail like this without producing an error message somewhere? I have been googling extensively and have not found anyone else with this problem.
The server is running RHEL 6 with nginx and php-fpm (from the Remi repo). I can't remember whether this server is running APC, but I don't think it is. All patches are up to date.
I am guessing I have just hit some sort of threshold where the current php-fpm settings are insufficient, though I don't understand why I get no errors when that limit is reached. Here are what I suspect are the relevant php-fpm settings…
pm = dynamic
pm.max_children = 50
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 35
php_admin_value[error_log] = /var/log/php-fpm/www-error.log
php_admin_flag[log_errors] = on
Is there an error log somewhere I'm missing where this would be reported? As I mentioned, there is nothing in /var/log/php-fpm/www-error.log, nor in the general or site-specific nginx error logs.
P.S.: I do get other kinds of error messages in all of the logs I mentioned, so the lack of error messages here is not a permissions issue.
Here are df outputs (edited to remove identifying physical paths)…
# df -h
Filesystem            Size  Used Avail Use% Mounted on
xxx                   8.4G  3.8G  4.2G  48% /
xxx                   7.8G     0  7.8G   0% /dev/shm
xxx                   477M   79M  373M  18% /boot
xxx                   976M  713M  213M  78% /home
xxx                   976M   30M  896M   4% /tmp
xxx                   9.8G  4.6G  4.7G  50% /var
# df -i
Filesystem            Inodes  IUsed   IFree IUse% Mounted on
xxx                   547584  87083  460501   16% /
xxx                  2041821      1 2041820    1% /dev/shm
xxx                   128016     50  127966    1% /boot
xxx                    65536  19285   46251   30% /home
xxx                    65536    173   65363    1% /tmp
xxx                   655360  19441  635919    3% /var
And here is the php-fpm status page while the site is not allowing sessions to be saved…
pool: www
process manager: dynamic
start time: 06/Aug/2015:10:53:06 -0400
start since: 332263
accepted conn: 2899
listen queue: 0
max listen queue: 0
listen queue len: 128
idle processes: 9
active processes: 1
total processes: 10
max active processes: 9
max children reached: 0
slow requests: 0
Best Answer
Because whoever wrote the failing code didn't check for the failure and have the program write an error message. Programs aren't magic; they're written by humans who don't always anticipate every possible problem.
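By way of illustration only (a hypothetical sketch, not code from any of the affected sites): session_start() does report failure through its return value, but nothing lands in a log unless the application, or the error_reporting/log_errors settings, actually capture it.
<?php
// Hypothetical sketch -- not code from the affected sites.
// session_start() returns false when the save handler fails (for example,
// because it cannot write a file under session.save_path), but it is up to
// the application to notice and log that.
if (!session_start()) {
    error_log('session_start() failed; check session.save_path, disk space and inodes');
}
Code that never checks the return value fails exactly as described in the question: sessions silently stop persisting and nothing shows up in any log.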
My intuition is that you've hit a disk storage limit somewhere: disk space, inodes, whatever. The solution is to either run something like tmpreaper over your session store regularly to keep the number of old sessions to a minimum, or else switch to using another (auto-expiring) session store like memcached.
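As a rough sketch of those two options (the ini values and the memcached address below are assumptions to adjust for your setup; normally they would live in php.ini or the pool config rather than in ini_set() calls):
<?php
// Sketch only: shown as ini_set() calls for brevity; in practice these
// settings would normally go in php.ini or the php-fpm pool config.

// Option 1: keep the default "files" handler but let PHP's own garbage
// collector prune stale session files. Many distro packages set
// gc_probability to 0 and rely on a cron job instead, which plays the
// same role as tmpreaper.
ini_set('session.gc_maxlifetime', 1440);  // seconds before a session is considered stale
ini_set('session.gc_probability', 1);     // with gc_divisor: run GC on roughly 1% of requests
ini_set('session.gc_divisor', 100);

// Option 2: store sessions in memcached, which expires entries on its own
// (requires the memcached PECL extension; host and port are assumptions).
// ini_set('session.save_handler', 'memcached');
// ini_set('session.save_path', '127.0.0.1:11211');

session_start();
Either way, the first thing to check is how many files are sitting in session.save_path and whether the filesystem holding it has run out of space or inodes.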