It could be a dozen different things, and without a full overview of your entire server at that point in time, it isn't going to be possible to give a definitive answer.
It could be:
- redis-server process at 100% CPU
- Due to a flush to disk of in-memory data for persistence
- Due to key expiration during an eviction
- Due to a heavy process being executed
- Server side bottleneck
- If it's a VPS/Cloud then it could be that another guest on the hypervisor is consuming resources, forcing yours to hang (tip: don't use a VPS/Cloud if you want reliability and predictability)
- OOM condition due to swapping etc. or exchange of memory from buffers/cache to userspace
- High CPU load starving the Redis process
- No available TCP states (i.e. table full)
- No available sockets
- No available file descriptors
- Multi purposing Redis
- Using the same Redis instance (not database) for both cache and sessions will cause this type of behaviour. If you are using Redis for cache too, then edit your init script to start two separate daemons on two different ports/sockets.
The list could go on and on. What you need is full and proper monitoring of your entire server, so that you can see exactly what is happening to make a diagnosis.
That includes graphing every application with Munin, historical logging with the Atop daemon, logging and centralisation of log data into a single dashboard.
New Relic is a nice tool and useful for identifying some issues, but it's not necessarily the best tool for getting full visibility into their cause.
Easiest place to start would be to check if Redis needs more memory,
redis-cli
> info
Compare the peak memory (used_memory_peak) with the limit you have defined (maxmemory).
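As a rough sketch of that comparison, the snippet below parses the text printed by info and reports how close the peak is to the limit. The sample text and the 90% threshold are assumptions made purely for illustration; used_memory_peak and maxmemory are the field names Redis reports, and a maxmemory of 0 means no limit is set. On versions that do not print maxmemory in info, CONFIG GET maxmemory gives the same value.

```python
def parse_info(text):
    """Turn the 'key:value' lines printed by INFO into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as '# Memory'
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def peak_ratio(fields):
    """Return used_memory_peak / maxmemory, or None when no limit is set."""
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:
        return None  # maxmemory 0 means Redis has no configured limit
    return int(fields["used_memory_peak"]) / limit


# Hypothetical sample, not taken from a real server.
SAMPLE_INFO = """\
# Memory
used_memory:805306368
used_memory_peak:1020054733
maxmemory:1073741824
maxmemory_policy:volatile-lru
"""

if __name__ == "__main__":
    ratio = peak_ratio(parse_info(SAMPLE_INFO))
    if ratio is None:
        print("no maxmemory limit configured")
    elif ratio >= 0.9:  # 0.9 is an arbitrary threshold chosen for this sketch
        print(f"peak is {ratio:.0%} of maxmemory: Redis is likely short of memory")
    else:
        print(f"peak is {ratio:.0%} of maxmemory")
```

If the peak sits at or near the limit, Redis has probably been evicting keys (or refusing writes, depending on the eviction policy) at the times you saw the problem.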
NB. Your hosting provider should be able to answer this question in an instant; I would definitely suggest asking for their input.
I've seen this quite a lot on New Relic as well.
From what I've seen there are a few different causes. I don't have a complete understanding of this issue, but it is something I've been looking into recently. Here are my findings.
Sessions in Magento, Locking, and New Relic
Every controller action in Magento uses the session, whether it needs to or not. The session is eagerly instantiated in Mage_Core_Controller_Varien_Action::preDispatch.
If you have session locking enabled, this means that for the duration of the request your session is locked down until the request completes. I haven't found the bit of code that releases the session lock yet, but I'm pretty sure it's in there somewhere.
Ultimately this means that if you fire off multiple concurrent requests to Magento controller actions from one location using the same session, some of those requests have to wait for the others to complete and unlock the session before they can proceed. I usually see this as a slow transaction on New Relic stuck at Mage_Core_Model_Session_Abstract_Varien::start for ~30 seconds (my session lock wait timeout, I think).
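To make the effect concrete, here is a small simulation of concurrent requests sharing one session lock. It is only a model of the behaviour described above, not Magento's code: the timings are made up (the ~30 second wait is shortened so the demo finishes quickly), and a request that times out simply skips its work, which is a simplification.

```python
import threading
import time

LOCK_WAIT_TIMEOUT = 0.5   # stand-in for the ~30 second lock wait; shortened for the demo
REQUEST_DURATION = 0.1    # hypothetical time each controller action takes


def handle_request(session_lock, results, name):
    """Simulate one controller action that needs the shared session."""
    started = time.monotonic()
    # Equivalent of waiting in session start: block until the lock is free or we time out.
    acquired = session_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
    waited = time.monotonic() - started
    try:
        if acquired:
            time.sleep(REQUEST_DURATION)  # the actual work, done while holding the lock
    finally:
        if acquired:
            session_lock.release()  # the lock is released when the request completes
    results[name] = (acquired, waited)


def run_concurrent(n_requests):
    """Fire n_requests at once against one session; return per-request wait info."""
    session_lock = threading.Lock()  # one lock per session ID
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(session_lock, results, f"req{i}"))
        for i in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for name, (acquired, waited) in sorted(run_concurrent(4).items()):
        status = "ran" if acquired else "timed out"
        print(f"{name}: waited {waited:.2f}s for the session lock, {status}")
```

Each request only takes a fraction of a second of real work, but the later ones spend most of their time queued behind the lock, which is the pattern that shows up as a slow session start in the transaction trace.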
This report on New Relic has multiple downsides as I see it:
- It inflates the total average response time, because these requests are slower than they otherwise should have been.
- New Relic records a sample of the slowest transactions. If I have performance bottlenecks that take, for example, 20 seconds, New Relic will not report them automatically if the same URL is plagued by session locking timeouts. The timeouts are hiding the useful data.
Causes
I've seen a few common causes for this. This is not a definitive list by any means.
Bots
Crawlers like Baidu and Yandex being a bit rude and battering the website. They're run from one location, firing off numerous requests using the same session and tripping up the session locking mechanism, hence the slow transactions in New Relic.
Ajax calls to Magento controller actions
With varnished websites, customer specific data must be loaded with care. Some websites manage this by using ajax calls to the Magento backend to get the required data. I have also seen some websites using ajax calls to the backend to get product specific information, such as the amount left in stock when an item is on sale.
If a single page triggers multiple ajax calls to the backend on page load, it can potentially trigger the session locking mechanism. The more ajax calls back to the Magento backend the more likely you are to experience locking.
Varnish ESI
The same as above really, except that instead of ajax calls it uses Edge Side Includes, which seem to result in new calls to the backend.
My plan
I have not actioned this yet so it's still purely theoretical, but it's something I'm looking into doing over the next few months.
I brought this problem up during the Mage Titans UK 2016 conference and Fabrizio Branca pointed me towards the following module: https://github.com/AOEpeople/Aoe_BlackHoleSession.
Based on a regular expression, the module will prevent bots from creating real sessions. This should have the benefit that no session lock will be hit, and that your session resources won't be battered by rude bots. Bots should no longer pollute your New Relic readings.
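The idea can be sketched roughly as follows. This is not the module's code: the pattern, class names and function names are all hypothetical, and the real module's configuration may look quite different. It only shows the shape of "match the user agent against a regular expression, then hand bots a throwaway session".

```python
import re

# Hypothetical pattern for this sketch; the module's real configuration may differ.
BOT_PATTERN = re.compile(r"bot|crawler|spider|baidu|yandex", re.IGNORECASE)


class BlackHoleSession(dict):
    """Session stand-in that accepts writes but is never persisted or locked."""
    persistent = False


class RealSession(dict):
    """Stand-in for a normal session that would be stored and locked."""
    persistent = True


def is_bot(user_agent):
    """Return True when the user agent matches the bot pattern."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


def open_session(user_agent):
    """Give bots a throwaway session and everyone else a real one."""
    return BlackHoleSession() if is_bot(user_agent) else RealSession()


if __name__ == "__main__":
    for agent in (
        "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0",
    ):
        kind = type(open_session(agent)).__name__
        print(f"{kind}: {agent}")
```

Because the throwaway session is never written to the session store, there is nothing to lock, so a crawler hammering the site with one session ID cannot queue behind itself.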
For ajax/ESI calls that fetch customer data on cached pages, there's nothing you can do that I can see. You need access to the session in order to retrieve customer specific data.
However, for ajax/ESI calls to get catalog specific data (such as limited stock) I don't see any need for a session to exist on that request at all. My plan for the future is to trial an extension to the Aoe_BlackHoleSession module so that I can silo off requests to a specific URL as being sessionless.
I'm less familiar with the internals of ESI, so sadly I don't have too much to comment there.
An alternative
During the conference Fabrizio Branca said he was able to disable session locking completely without any ill effects. Test at your own risk.
Best Answer
In my view, Redis is the better choice:
Memcached is a free and open-source, high-performance, distributed, in-memory key-value store used as an object caching system.
Redis is an open-source, networked, in-memory, key-value data store with optional durability.
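Two caveats are often raised against Redis:
- Redis doesn't support LRU or any similar policy for handling overload. This no longer holds: Redis can evict keys using an LRU policy through its maxmemory-policy setting.
- Redis doesn't support CAS (check and set), which is useful for maintaining cache consistency; see "What are the most common sources of Memcached cache inconsistency?". There is a SETNX operation, and WATCH with MULTI/EXEC, that make this unnecessary.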
More details: Stack Overflow, "Memcached vs. Redis?"
Some details on Redis's faster data support: Redis.io