How is distributed memcached supposed to increase performance if it’s making network calls?


Greetings all, I love memcached but have thus far only used it in single-machine setups as a local cache. I have read extensively about memcached's distributed nature and how the clients determine which server in a list of memcached servers to write to and read from. From my understanding, by picking a deterministic hashing algorithm, we can ensure that data is always written to and read from the correct server, regardless of where the request came from.
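To make that concrete, here is a minimal sketch of client-side server selection. It uses a plain hash-modulo scheme for illustration; real memcached clients may use other schemes (consistent hashing, for example), and the server addresses and the `pick_server` helper are hypothetical.

```python
import hashlib
from typing import Sequence

def pick_server(key: str, servers: Sequence[str]) -> str:
    """Map a key to one server with a stable hash (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

# Hypothetical addresses; every client must hold the same list in the same order.
SERVERS = ["server-a.example.com:11211", "server-b.example.com:11211"]

# A web process on Server A and one on Server B both compute the same target,
# so one of them will always have to go over the network for a given key.
from_a = pick_server("blog_post:42", SERVERS)
from_b = pick_server("blog_post:42", SERVERS)
```

Because the result depends only on the key and the server list, any client with the same list resolves a key to the same memcached instance, wherever the request originates.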

So, my question is, consider the following situation:

Server A in New York, Server B in Los Angeles. Both are mirrors of each other. Both are running MySQL databases with replication. It doesn't really matter in the case of reads only, but let's say A is the master and B is the slave. Both are running memcached and their clients have a list of the memcached servers (Servers A and B in this case).

A certain piece of data, say the body of a blog post, is read from the database on Server A and is consequently stored in A's memcached. A different user in a different part of the country hits Server B and requests that same blog post. Server B's memcached client checks and sees that indeed, this data has been cached, so it reaches OVER THE NETWORK to grab that data from Server A's memcached.

Now, first of all, is my understanding so far correct? Please point out any errors or incorrect assumptions I have made :).

So, my question is, how is this supposed to improve performance? It seems that a better plan in this case would be just to have Server A and Server B both running their own separate memcached instances as a local cache (the top figure in), but this goes against the whole idea of the distributed design. So what is the advantage of it being distributed? A network operation from Server B to Server A is way slower than Server B reading from its own local database.

Please help me understand. I feel like there is something I am fundamentally missing here about how memcached works.

Thanks! K

Best Answer

The short answer to your question is: It's not.

Distributed memcached makes sense where your system is able to retrieve valid answers from cache, rather than having to do expensive lookups/computation to get the correct answer.

When memcached has to talk across the internet, with latencies of maybe 60-100ms or more, there is really nothing to be gained. Chances are your system will be able to look up or compute correct answers much faster than it can find the correct answer in a cache halfway across the internet.

You need a gigabit (or faster) network between memcached nodes to gain any performance benefits. Your setup is designed for fail-over and geographic-based performance. If your servers A and B were really A1, A2 and B1, B2 (several machines at each location), distributed memcached within each location might be for you.
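As a rough back-of-the-envelope sketch of that argument: the 80 ms figure sits inside the 60-100ms range mentioned above, while the local database cost, the fast-network round trip, and the 90% hit rate are assumptions chosen only for illustration.

```python
def expected_read_ms(hit_rate: float, cache_rtt_ms: float, source_ms: float) -> float:
    """Average read cost: always pay the cache round trip, pay the source only on a miss."""
    return cache_rtt_ms + (1.0 - hit_rate) * source_ms

LOCAL_DB_MS = 5.0    # assumed cost of reading from the local database
WAN_CACHE_MS = 80.0  # cross-country round trip, within the 60-100ms range above
LAN_CACHE_MS = 0.5   # assumed round trip on a fast local network
HIT_RATE = 0.9       # assumed cache hit rate

wan = expected_read_ms(HIT_RATE, WAN_CACHE_MS, LOCAL_DB_MS)  # about 80.5 ms
lan = expected_read_ms(HIT_RATE, LAN_CACHE_MS, LOCAL_DB_MS)  # about 1.0 ms

print(f"cache across the internet: {wan:.1f} ms per read")
print(f"cache on the local network: {lan:.1f} ms per read")
print(f"no cache, local database:   {LOCAL_DB_MS:.1f} ms per read")
```

Under these assumed numbers, a remote cache is slower than skipping the cache entirely, while a cache on the local network beats the database comfortably.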
