It could be a dozen different things, and without a full overview of your entire server at that point in time, it isn't going to be possible to give a definitive answer.
It could be:
- The redis-server process at 100% CPU
  - Due to a flush to disk of in-memory data for persistence
  - Due to key expiration during an eviction
  - Due to a heavy process being executed
- Server-side bottleneck
  - If it's a VPS/Cloud, another guest on the hypervisor could be consuming resources, forcing yours to hang (tip: don't use a VPS/Cloud if you want reliability and predictability)
  - OOM condition due to swapping etc., or exchange of memory from buffers/cache to userspace
  - High CPU load starving the Redis process
  - No available TCP states (i.e. table full)
  - No available sockets
  - No available file descriptors
- Multi-purposing Redis
  - Using the same Redis instance (not database) for both cache and sessions will cause this type of behaviour. If you are using Redis for cache too, then edit your init script to instantiate two separate daemons on two different ports/sockets.
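A few of the causes above leave traces in Redis's own INFO output, so a rough first pass can be scripted. The sketch below is only an illustration: the function names `parse_info` and `stall_hints` are made up for this example, and it only looks at the `rdb_bgsave_in_progress`, `aof_rewrite_in_progress`, `evicted_keys` and `rejected_connections` fields. It says nothing about hypervisor, swap, TCP or file descriptor problems, which need OS-level monitoring.

```python
def parse_info(text):
    """Parse the key:value lines of `redis-cli info` output into a dict of strings."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# Section" headers.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def stall_hints(fields):
    """Return rough hints about why Redis might be stalling (hypothetical helper)."""
    hints = []
    # A background save or AOF rewrite means data is being flushed to disk.
    if (fields.get("rdb_bgsave_in_progress") == "1"
            or fields.get("aof_rewrite_in_progress") == "1"):
        hints.append("persistence flush in progress")
    # evicted_keys counts keys removed because the maxmemory limit was reached.
    if int(fields.get("evicted_keys", "0")) > 0:
        hints.append("keys evicted (memory limit reached)")
    # rejected_connections counts clients refused because of the maxclients limit.
    if int(fields.get("rejected_connections", "0")) > 0:
        hints.append("connections rejected (maxclients reached)")
    return hints
```

To try it, save the output of `redis-cli info` to a file, read it in, and pass the text through `parse_info` and then `stall_hints`. The counters are cumulative since the server started, so a non-zero value only tells you the condition has happened at some point, not that it is happening now.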
The list could go on and on. What you need is full and proper monitoring of your entire server, so that you can see exactly what is happening to make a diagnosis.
That includes graphing every application with Munin, historical logging with the Atop daemon, and centralising log data into a single dashboard.
New Relic is a nice tool, but whilst it is useful for identifying some issues, it's not necessarily the best tool for getting full visibility of their cause.
Easiest place to start would be to check if Redis needs more memory,
redis-cli
> info
Compare the peak memory (used_memory_peak) with the limit you have defined (maxmemory).
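If you want to script that comparison, something like the following could work. It is a minimal sketch, not a monitoring tool: `peak_memory_ratio` is a name invented for this example, it reads `used_memory_peak` from text in the INFO format, and it assumes you pass in the limit yourself (for instance the `maxmemory` value from your redis.conf, in bytes).

```python
def peak_memory_ratio(info_text, maxmemory_bytes):
    """Return used_memory_peak / maxmemory, or None when no limit is set.

    info_text is the raw output of `redis-cli info`; maxmemory_bytes is the
    configured limit in bytes (0 means Redis has no memory limit).
    """
    peak = None
    for raw in info_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        # Match the exact field name, not used_memory_peak_human and friends.
        if sep and key == "used_memory_peak":
            peak = int(value)
            break
    if peak is None:
        raise ValueError("used_memory_peak not found in INFO output")
    if maxmemory_bytes <= 0:
        return None
    return peak / maxmemory_bytes
```

A ratio close to (or above) 1.0 suggests Redis has been running at its limit and may need more memory.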
NB. Your hosting provider should be able to answer this question in an instant; I would definitely suggest asking for their input.
Best Answer
You're likely hitting max_concurrency. Colin Mollenhour has made significant updates to the Redis integrations (and his Redis Session integration) in his GitHub repo modules. I suggest you upgrade to the most recent version, which batches SUNIONs to avoid massive wait times.
Aside from this, I've seen max_concurrency hit while bots are crawling. Because of this we've, on occasion, made updates to the bot settings in Colin's module. See here for more information:
https://github.com/colinmollenhour/Cm_RedisSession#bot-detection
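To make the max_concurrency idea concrete, here is a toy model in Python. It is not the module's code (the module is PHP), and it assumes, as I understand the module's README, that max_concurrency caps how many requests may wait for the lock on a single session. The class `SessionLockGate` and its methods are hypothetical names used only for illustration.

```python
from collections import defaultdict


class SessionLockGate:
    """Toy model: at most max_concurrency requests may queue on one session's lock."""

    def __init__(self, max_concurrency):
        self.max_concurrency = max_concurrency
        self.waiting = defaultdict(int)  # session id -> requests currently waiting

    def try_enter(self, session_id):
        """Return True if the request may wait for the lock, False if the cap is hit."""
        if self.waiting[session_id] >= self.max_concurrency:
            return False
        self.waiting[session_id] += 1
        return True

    def leave(self, session_id):
        """Call when a request finishes with the session and stops waiting."""
        if self.waiting[session_id] > 0:
            self.waiting[session_id] -= 1
```

In this model, once the cap is reached for a session, further requests for that same session are turned away until an earlier one leaves, while other sessions are unaffected.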