Memcached failover


We have two memcached servers configured and use the Enyim client. When one of the servers is down, it appears that the server is added to the deadServers list (ServerPool.cs), and the client tries to resurrect it every 10 seconds (we have configured deadTimeOut to be 10 seconds). Each attempt to connect to the failed server ends in a TCP timeout, so pages take a long time to load, which results in a bad user experience.
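A simplified model of the behavior described above may help show why the pages stall. This is a hypothetical sketch, not the actual Enyim ServerPool.cs code. The class `DeadServerPool`, its method names, and the injectable `clock` are assumptions made for illustration. A failed server is parked in a dead list and becomes due for another connection attempt once `dead_timeout` seconds have elapsed.

```python
import time


class DeadServerPool:
    """Hypothetical, simplified model of a client-side pool that parks
    failed servers in a dead list and retries them after dead_timeout."""

    def __init__(self, servers, dead_timeout=10.0, clock=time.monotonic):
        self.alive = list(servers)
        self.dead = {}  # server -> time at which it was marked dead
        self.dead_timeout = dead_timeout
        self.clock = clock  # injectable so the timing can be tested

    def mark_dead(self, server):
        """Move a server that failed a request into the dead list."""
        if server in self.alive:
            self.alive.remove(server)
        self.dead[server] = self.clock()

    def due_for_retry(self):
        """Return dead servers whose dead_timeout has expired.

        Each of these will be probed again, and a probe of a server that
        is still down can block for the full TCP connect timeout."""
        now = self.clock()
        return [s for s, t in self.dead.items() if now - t >= self.dead_timeout]

    def resurrect(self, server):
        """Return a server that answered a probe to the alive list."""
        del self.dead[server]
        self.alive.append(server)
```

With deadTimeOut at 10 seconds, a server that stays down becomes due for a probe up to 360 times an hour (3600 / 10), and any request that triggers one of those probes can wait for the full connect timeout.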

1) What is the standard way of resolving this issue? There are some posts about removing the server from the deadServers list. Is it okay to do this?

2) What is the recommended deadTimeOut setting? (I understand the default is 2 minutes; we've changed it to 10 seconds in our implementation.)

3) Am I correct in my understanding that the cached data is not replicated across Server 1 and Server 2? If Server 1 is down, the application goes to the database to fetch these values (and doesn't really check Server 2)?

Any help is really appreciated.

Best Answer

  1. As a general rule, you are expected to accept that the cache may or may not have what you want, and to fall back to the database when it doesn't.
  2. It depends on the scenario, but it sounds like you would benefit from a higher value. There's little downside to setting it higher (2 to 5 minutes).
  3. Yes. The data is not replicated. After the value is fetched from the database (because Server 1's cache is unavailable), it will usually be cached again on Server 2.
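To illustrate point 3, here is a hypothetical sketch of how a client maps each key to exactly one server. It uses simple CRC32-modulo hashing and, when the chosen server is dead, walks to the next live one. The Enyim client's real node locator uses its own hashing scheme, so treat `pick_server` and `get_or_load` as illustrative names, not its API. The point is that a value lives on one server only. When that server is down, the lookup falls back to the database and the value is then stored on the remaining server.

```python
import zlib


def pick_server(key, servers, dead):
    """Map a key to exactly one server; if it is dead, walk to the next live one.

    Simplified modulo hashing for illustration, not the Enyim locator's algorithm."""
    start = zlib.crc32(key.encode("utf-8")) % len(servers)
    for i in range(len(servers)):
        candidate = servers[(start + i) % len(servers)]
        if candidate not in dead:
            return candidate
    return None  # every server is dead


def get_or_load(key, caches, servers, dead, load_from_db):
    """Cache-aside read: try the key's one server, else load from the database."""
    server = pick_server(key, servers, dead)
    if server is not None and key in caches[server]:
        return caches[server][key]
    value = load_from_db(key)
    if server is not None:
        # Stored on this one server only; nothing copies it to the other server.
        caches[server][key] = value
    return value
```

Because nothing is replicated, marking the key's home server dead turns every read of that key into a database load until the value has been re-cached on the surviving server.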

You should probably also lower the TCP timeout used when reconnecting to the possibly dead server.
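A minimal sketch of a reconnect probe with a short connect timeout, using Python's standard `socket` module. `server_reachable` and the 0.5-second default are assumptions for illustration. In the Enyim client the equivalent setting lives in its socket-pool configuration, so check its documentation for the exact attribute name.

```python
import socket


def server_reachable(host, port, timeout=0.5):
    """Probe a cache server with a short connect timeout.

    A dead host then costs at most about `timeout` seconds per attempt
    instead of the operating system's default connect timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and socket timeouts
        return False
```

With a bounded timeout, a request that happens to trigger a probe of a dead server waits only briefly, rather than for the OS default connect timeout, which can be much longer.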