How does Redis (or any typical distributed cache) handle replication conflicts?

caching, distributed-system, redis

Suppose you set up a Redis cluster with one master and two slaves, with one client connected to each slave. Both clients make conflicting changes at the same time:

Client 1 sends HSET Contact:123 Phone "867-5309"; Client 2 sends DEL Contact:123

What happens if these changes reach the master at around the same time? Are they simply applied to the master in the order they are received and then replicated back down?
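For concreteness, if the master did just apply the commands in arrival order, the two possible interleavings would leave different final states. A minimal sketch using redis-py (the client library and connection details are my assumption, not part of the actual setup):

```python
import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379

# Ordering A: HSET arrives first, then DEL -- the key ends up gone.
r.hset("Contact:123", "Phone", "867-5309")
r.delete("Contact:123")
print(r.exists("Contact:123"))  # 0

# Ordering B: DEL arrives first, then HSET -- the key survives with one field.
r.delete("Contact:123")
r.hset("Contact:123", "Phone", "867-5309")
print(r.hgetall("Contact:123"))  # {b'Phone': b'867-5309'}
```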

What if transactions are used? Is the result eventually consistent, i.e. does the master resolve the conflict by applying the transactions in some order and then replicating the resolution down?

I don't expect perfect consistency from a distributed cache, but I do want to understand the fine points so that I use caching well. The application I'm working on uses the distributed cache for coordination among worker threads/processes. For example, when one worker processes an item, it puts a key in the cache with an expiration of 1 minute telling other workers not to process the same item. It's acceptable if two or three workers end up processing the same item, but this mechanism prevents infinite reprocessing.
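For reference, that coordination pattern boils down to a single SET with the NX and EX options. A minimal sketch with redis-py; the key name, worker id, and helper function are illustrative, not my actual code:

```python
import redis

r = redis.Redis()  # assumes a Redis instance on localhost:6379

def try_claim(item_id: str, worker_id: str, ttl_seconds: int = 60) -> bool:
    """Attempt to claim an item for processing.

    SET ... NX EX creates the key only if it does not already exist and
    gives it a TTL, so a crashed worker's claim expires on its own.
    Returns True if this worker won the claim, False if another worker
    already holds it.
    """
    # "processing:<item_id>" is an illustrative key name.
    return bool(r.set(f"processing:{item_id}", worker_id, nx=True, ex=ttl_seconds))

if try_claim("item-42", "worker-1"):
    pass  # process item-42; other workers will skip it for the next minute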

Best Answer

Apologies for not knowing the official or real names. My hope is that someone comes along and edits my answer with the appropriate names.

There are a few options:

  1. Latest is Bestest - assume the latest write is the correct one; everything else is outdated or wrong. This works as long as all systems have their clocks in sync. Also note that there is still a criterion being applied here (being the latest), and every other approach does the same thing with a different criterion. (A small sketch of this appears after the list.)
  2. Master Rules - There's an elected master (there is a plethora of algorithms just for leader election that I'm skipping to keep things brief). Every client checks in with the master and simply copies over the master's cache if its own is out of sync. Some systems don't even bother checking and always blindly overwrite their own cache with the master's version. When the master fails, things get complicated.
  3. Consensus - Each computer knows all the other computers in the network. Periodically, one of the systems calls for a consensus check and everybody broadcasts its cache status. Each computer decides on its own which cache is the latest based on a pre-decided algorithm. Things get fun when some nodes lose connections with a few of the others (but not all of them).
  4. Partial Consensus - Same as consensus, but tends to be lax in its calculations. This is useful when nodes keep coming up and going down randomly (AWS Spot instances, for example) and you're fine with eventually consistent caches.
  5. No Conflict - The computer assumes there can be no conflict: whatever it has is correct, and everybody else can go to hell. This is a bad idea as soon as you have more than one node.
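As an illustration of option 1, a last-write-wins merge can be as small as a timestamp comparison. This is only a sketch of the idea, not how Redis itself resolves anything; the type and field names are made up:

```python
from dataclasses import dataclass

@dataclass
class VersionedEntry:
    value: str
    written_at: float  # wall-clock timestamp of the write; assumes synced clocks

def merge(local: VersionedEntry, remote: VersionedEntry) -> VersionedEntry:
    """'Latest is Bestest': keep whichever write carries the newer timestamp.

    Ties go to the local copy here; a real system would need a
    deterministic tie-breaker (e.g. a node id) so all replicas agree.
    """
    return remote if remote.written_at > local.written_at else local
```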