Memcached scaling strategy

Tags: memcached, memory, scaling

I am currently running a production environment with 4 dedicated memcached servers, each with 48 GB of RAM (42 GB dedicated to memcached). Right now they are doing fine, but traffic and content are growing and will surely keep growing next year.

What are your thoughts on strategies for scaling memcached further? How have you handled it so far?

Do you add more RAM to the boxes until they reach full capacity, effectively doubling the cache pool on the same number of boxes? Or do you scale horizontally by adding more identical boxes with the same amount of RAM?

The current boxes can certainly handle more RAM, as their CPU load is quite low and the only bottleneck is memory. Still, I wonder whether it wouldn't be a better strategy to distribute the cache across more machines, making things more redundant and minimizing the impact of losing one box (losing 48 GB of cache versus losing 96 GB). How would you (or how have you) handled this decision?
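For what it's worth, how much cache a dead box actually costs you depends on how the client spreads keys, since memcached servers know nothing about each other. A minimal Python sketch (server names and vnode count invented, not your topology) comparing naive modulo hashing with a consistent-hash ring when one of four boxes disappears:

```python
import bisect
import hashlib

def h(s: str) -> int:
    """Stable hash of a string as a big integer."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def modulo_bucket(key: str, servers: list[str]) -> str:
    """Naive scheme: hash(key) mod number-of-servers."""
    return servers[h(key) % len(servers)]

class Ring:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, servers: list[str], vnodes: int = 100):
        self.ring = sorted((h(f"{s}#{v}"), s)
                           for s in servers for v in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def bucket(self, key: str) -> str:
        # First ring point clockwise of the key's hash owns the key.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

def moved(before, after, n: int = 20_000) -> float:
    """Fraction of keys that map to a different server after a change."""
    return sum(before(f"key:{i}") != after(f"key:{i}") for i in range(n)) / n

four = [f"mc{i}" for i in range(4)]
three = four[:-1]  # one box dies

mod_moved = moved(lambda k: modulo_bucket(k, four),
                  lambda k: modulo_bucket(k, three))
ring_moved = moved(Ring(four).bucket, Ring(three).bucket)
print(f"modulo: {mod_moved:.0%} of keys remapped; ring: {ring_moved:.0%}")
```

With modulo hashing, losing a box remaps most keys (a near-total cache flush); with a ring, only roughly the dead box's share moves, which is why the "losing 1/N of the cache" framing only holds if your client hashes consistently.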

Best Answer

I so want to know what it is you're moving that consumes over 100 GB of memory while not maxing out your NICs.

Memcached scales fairly linearly across machines, so the questions you have to ask are:

  • Is my system bus currently saturated?
    • This may not show up as CPU usage -- DMA transfers won't register there
  • How expensive is high-density memory compared to a new box with the same increase in memory?
    • Full cost of rack space, power consumption, etc.
  • Do you see a fundamental difference between losing 25% of your cache 1% of the time and losing 12.5% of your cache 2% of the time? (Failure rates chosen at random for illustration.)
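That last bullet can be made concrete. Using the illustrative rates above, and assuming at most one box is down at a time, the expected fraction of cache unavailable at any instant works out the same either way; what changes is the size of each individual hit:

```python
# Expected unavailable cache fraction, using the illustrative rates above.
four_boxes  = 0.25  * 0.01   # lose 25% of the cache, 1% of the time
eight_boxes = 0.125 * 0.02   # lose 12.5% of the cache, 2% of the time
print(four_boxes, eight_boxes)  # same expected loss; the larger pool
                                # just takes smaller, more frequent hits
```

So the real question is whether your backend can absorb a 25% cache miss spike, not which option loses less cache on average.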

Scaling is 10% intuition, 70% measuring and adapting, and 20% going back and trying something else.

Load 'em up until they max out the weakest link or stop being cost-effective. They may or may not already be there.
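Knowing whether you're "already there" mostly comes down to watching a few counters from memcached's `stats` command (hit rate, memory fill, evictions). A small sketch of pulling those out; the stat names are standard memcached protocol fields, but the sample values here are invented:

```python
def parse_stats(raw: str) -> dict[str, int]:
    """Parse 'STAT name value' lines from memcached's `stats` output."""
    stats = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            try:
                stats[parts[1]] = int(parts[2])
            except ValueError:
                pass  # skip non-integer stats such as version strings
    return stats

def memory_pressure(stats: dict[str, int]) -> dict[str, float]:
    """Summarize the counters that signal a memory-bound cache."""
    gets = stats["get_hits"] + stats["get_misses"]
    return {
        "hit_rate": stats["get_hits"] / gets if gets else 0.0,
        "mem_used": stats["bytes"] / stats["limit_maxbytes"],
        "evictions": stats["evictions"],
    }

sample = """STAT get_hits 9500
STAT get_misses 500
STAT bytes 42000000000
STAT limit_maxbytes 45097156608
STAT evictions 1234
END"""
print(memory_pressure(parse_stats(sample)))
```

A steadily climbing `evictions` counter with `bytes` pinned near `limit_maxbytes` is the clearest sign that memory, not CPU or network, is still your weakest link.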