PHP – Memcache – Issues in a distributed environment with many nodes

Tags: cache, memcache, memcached, PHP

I've had a quick look through the other similarly titled questions and none are particularly similar to the issues I'm currently having.

Basically, we've had a multi-node memcached ring running for over two years, and for the most part it's been problem free. The memcache installation was recently moved onto dedicated servers and the capacity was tripled (from 2 × 1 GB to 2 × 3 GB). At first we had trouble with what I believe were issues in how the PHP libraries were talking to the servers: either problems with the ordering of the server list, or the servers being started incorrectly.

The servers 'appeared' to be working correctly, but keys seemed to be stored on multiple servers, and expiring a key wouldn't expire all instances of the value.

We then changed the hashing mechanism from standard to consistent, the problems with key lookups (and expires/gets) went away, and everything seems to have returned to normal.
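For reference, here is one way that switch can be made with the pecl/memcache extension — a minimal sketch, assuming pecl/memcache (not the newer pecl/memcached extension); these are real ini settings for that extension:

```php
<?php
// "standard" hashing maps a key to a node by taking the key hash modulo the
// server count, so adding or removing a node remaps almost every key.
// "consistent" (ketama-style) hashing only remaps the keys that lived on the
// changed node, which is why lookups stabilised after the switch.
ini_set('memcache.hash_strategy', 'consistent');
ini_set('memcache.hash_function', 'crc32');
```

These can equally be set in php.ini; the important thing is that every web node uses identical values, otherwise different clients will map the same key to different servers.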

However, I've been monitoring things over the last few weeks and have noticed that the first server is being hit many, many more times than the second (the PHP memcache monitor tool reports the first averaging 1,200 hits per second, while the second sits at only 500).

Can anyone explain:

  • Firstly, what is happening above — why would one server get so many more hits in a 'distributed' environment?
  • Secondly, what are the recommended settings for memcache clients in a distributed setup?
    • Am I doing the right thing by using consistent hashing?
    • Should I use failover?
    • Binary storage?
    • Compression?
  • What is the correct procedure for resetting/moving a live memcache ring?
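On the client-settings question, a hedged sketch of a typical pecl/memcache setup follows; the hostnames are placeholders, and the choices shown (failover on, per-value compression) are common defaults rather than a definitive recommendation:

```php
<?php
// Sketch, assuming the pecl/memcache extension. Hostnames are illustrative.
ini_set('memcache.hash_strategy', 'consistent');
ini_set('memcache.allow_failover', '1'); // retry another node if one is down

$mc = new Memcache();
// Every client must register the same servers, in the same order, with the
// same weights — otherwise keys hash to different nodes on different
// web servers, which matches the "multiple copies of a key" symptom above.
$mc->addServer('cache1.example.com', 11211, true, 1);
$mc->addServer('cache2.example.com', 11211, true, 1);

// MEMCACHE_COMPRESSED applies zlib compression per value: worthwhile for
// large values, mostly CPU overhead for small ones.
$mc->set('some:key', $value, MEMCACHE_COMPRESSED, 300);
```

Note that pecl/memcache has no binary protocol support; that option belongs to the separate pecl/memcached (libmemcached-based) extension.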

I've found memcached to be a fantastic tool, perfect for its purpose, but best-practice guides and useful documentation are few and far between (very few describe it in any detail at all). If I can get some measure of insight into what's happening, I'll definitely post it as a tech article for all to see (to help in future), but I'm having trouble right now!

Thanks in advance

Best Answer

If your keys have unequal access patterns, you will see unequal traffic to each memcached node. For example, if you have two keys, one (a) that is get/set 500 times per second and another (b) that is get/set 250 times per second, then the node holding a will receive twice as much traffic as the node holding b.

In my case, we had 8 memcached nodes with a few thousand keys. One of those keys was doing about 800 gets/sec at peak traffic and almost every other key was doing less than 1 get/sec. The memcached node which had the busy key exhibited significantly higher traffic than the others.

If you want to balance the traffic equally across your memcached nodes, you need to either:

  • Play games with your keying to make sure that your busy keys are spread out properly.
  • Switch to using repcached or Membase to replicate the keys across multiple nodes
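The first option — spreading a hot key — can be sketched as follows; this is an illustrative pattern, not code from the original post, and the function names and shard count are made up for the example:

```php
<?php
// Sketch: fan a single hot key out across N sub-keys ("key:0" … "key:N-1")
// so that, under consistent hashing, reads land on several nodes instead of
// hammering the one node that owns the bare key.
function set_hot(Memcache $mc, $key, $value, $ttl, $shards = 8)
{
    // Write the same value to every shard so any shard a reader picks is valid.
    for ($i = 0; $i < $shards; $i++) {
        $mc->set("$key:$i", $value, 0, $ttl);
    }
}

function get_hot(Memcache $mc, $key, $shards = 8)
{
    // Each read picks a random shard, spreading the load across nodes.
    $shard = mt_rand(0, $shards - 1);
    return $mc->get("$key:$shard");
}
```

The trade-off is N× the write traffic and storage for that key, in exchange for spreading the read load; it only pays off for keys that are read far more often than they are written.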