We rely heavily on memcache and serve a few billion requests per month. We have 5 memcache servers. Last night we saw a 25% increase in our traffic. The graphs show that the requests and data transferred by each memcache server increased until the servers crashed. This started a chain reaction: the servers crashed one after another as the load on each remaining server increased.
We found nothing in syslog, messages, or the memcache log file (verbose logging was off).
I have two questions:
- How can I find out exactly why this happened? If load is an issue for memcache, is there any documentation on how much a normal memcache instance (running on a decent config) can handle? How can I increase this limit?
- How can I ensure they never go down again? The outage eventually impacted our MySQL servers and replication, along with a lot of other related services. Do I need more memcache servers?
I started my memcache using this init.d script: http://pastebin.com/wfMnB4ta where ENABLE_MEMCACHE is YES in /etc/default/memcached
/usr/share/memcached/scripts/start-memcached: http://pastebin.com/LaUugXye
Thanks
Best Answer
I'm going to guess that you run version 1.4.5 or older.
Since you mention an increase in traffic, then a sudden exit:
If you ever experience a crash, the first thing to do is make sure you're on the latest stable release. If you still experience crashes after upgrading, the best thing to do is to contact the project's mailing list or file a bug report with details, rather than rely on a maintainer happening to see this via a Twitter search.
Doing periodic upgrades to match the latest stable can help you avoid having your whole cluster crash in the future.
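To check which release each server is actually running, you can ask it directly: the memcached text protocol has a `version` command that replies with a line of the form `VERSION <number>`. Below is a minimal sketch in Python, not a definitive tool. The helper names (`parse_version`, `version_key`, `needs_upgrade`, `fetch_version`, `check_cluster`) are hypothetical, the default port 11211 is assumed, and the `1.4.5` threshold is only the guess made above, not a confirmed cutoff.

```python
import re
import socket


def parse_version(reply: bytes) -> str:
    """Extract '1.4.5' from a protocol reply such as b'VERSION 1.4.5\\r\\n'."""
    text = reply.decode("ascii", errors="replace").strip()
    if not text.startswith("VERSION "):
        raise ValueError(f"unexpected reply: {text!r}")
    return text[len("VERSION "):]


def version_key(version: str) -> tuple:
    """Turn '1.4.15' into (1, 4, 15) so versions compare numerically."""
    matches = (re.match(r"\d+", part) for part in version.split("."))
    return tuple(int(m.group()) for m in matches if m)


def needs_upgrade(version: str, threshold: str = "1.4.5") -> bool:
    """True if version is at or below the threshold (assumed: the 1.4.5 guess)."""
    return version_key(version) <= version_key(threshold)


def fetch_version(host: str, port: int = 11211, timeout: float = 2.0) -> str:
    """Send the text-protocol 'version' command and return the version string."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"version\r\n")
        reply = b""
        # The reply is a single line terminated by \r\n.
        while not reply.endswith(b"\r\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            reply += chunk
    return parse_version(reply)


def check_cluster(hosts):
    """Print the version of every host and flag the ones worth upgrading."""
    for host in hosts:
        try:
            version = fetch_version(host)
        except (OSError, ValueError) as exc:
            print(f"{host}: could not read version ({exc})")
            continue
        note = "upgrade recommended" if needs_upgrade(version) else "ok"
        print(f"{host}: {version} ({note})")
```

Calling `check_cluster` with your five hostnames (for example from a cron job) gives a quick view of whether any server has fallen behind the others; which release counts as "latest stable" still has to be checked against the project's release notes.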