Conflict Resolution for Two-Way Sync – Best Practices

Tags: algorithms, database, database-development, synchronization

How do you manage two-way synchronization between a 'main' database server and many 'secondary' servers, in particular conflict resolution, assuming a connection is not always available?

For example, I have a mobile app that uses CoreData as the 'database' on iOS, and I'd like to allow users to edit the contents without an Internet connection. At the same time, this information is available on a website that the devices will connect to. What do I do if/when the data on the two DB servers is in conflict?
(I refer to CoreData as a DB server, though I am aware it is something slightly different.)

Are there any general strategies for dealing with this sort of issue?
These are the options I can think of:
1. Always treat the client-side data as higher priority
2. Always treat the server-side data as higher priority
3. Try to resolve conflicts by recording each field's edit timestamp and taking the latest edit

Though I'm certain the 3rd option will open room for some devastating data corruption.

I'm aware that the CAP theorem concerns this, but I only want eventual consistency, so it doesn't rule it out completely, right?

Related question: Best practice patterns for two-way data synchronization. The second answer to this question says it probably can't be done.

Best Answer

The usual solution for knowing "which change is correct" is a vector clock. You essentially keep a counter for each repository that holds the data, and reject a change if the client's view of everyone else's state differs from that of the peer it is connecting to, that is, if the change was made without having seen all of that peer's updates.
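As a rough illustration of the idea, here is a minimal Python sketch of a vector clock comparison. The names (`increment`, `compare`, `accept_push`) and the replica ids `"server"` and `"phone"` are made up for this example and are not part of CoreData or any particular sync library. A push is accepted only when the incoming clock has seen everything the receiving side has.

```python
from typing import Dict

# Hypothetical representation: replica id -> number of edits seen from that replica.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, replica: str) -> VectorClock:
    """Return a copy of `clock` with `replica`'s counter bumped for one local edit."""
    updated = dict(clock)
    updated[replica] = updated.get(replica, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    replicas = set(a) | set(b)
    a_behind = any(a.get(r, 0) < b.get(r, 0) for r in replicas)
    b_behind = any(b.get(r, 0) < a.get(r, 0) for r in replicas)
    if a_behind and b_behind:
        return "concurrent"  # neither side has seen all of the other's edits
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


def accept_push(receiver_clock: VectorClock, incoming_clock: VectorClock) -> bool:
    """Accept only if the incoming change was made with full knowledge of the receiver's state."""
    return compare(incoming_clock, receiver_clock) in ("after", "equal")


# The phone syncs, then edits offline; meanwhile the server also takes an edit.
server = {"server": 1}
phone = increment(server, "phone")      # {"server": 1, "phone": 1}
server_later = increment(server, "server")  # {"server": 2}

print(accept_push(server, phone))        # phone saw everything the server had
print(accept_push(server_later, phone))  # concurrent edits, so the push is rejected
```

The second push is the interesting case: both sides changed the data since their last common state, so neither clock dominates and the save has to go through some conflict handling instead of being applied blindly.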

The big question that you have to answer is how you'll resolve rejected saves. This generally means some sort of merge operation.
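One possible shape for such a merge, sketched in Python under some assumptions: each side still has the last commonly synced version of the record (`base`), records are flat dictionaries, and a missing field is treated as `None`. The function names and the "keep the remote value and flag it" fallback are illustrative choices, not something the answer prescribes. Fields changed on only one side merge cleanly; fields changed differently on both sides are reported so the app or the user can decide.

```python
from typing import Any, Dict, Set, Tuple

Record = Dict[str, Any]


def three_way_merge(base: Record, local: Record, remote: Record) -> Tuple[Record, Set[str]]:
    """Merge field by field against the common ancestor `base`.

    Returns the merged record and the set of fields that truly conflict.
    """
    merged: Record = {}
    conflicts: Set[str] = set()
    for field in set(base) | set(local) | set(remote):
        b, l, r = base.get(field), local.get(field), remote.get(field)
        if l == r:
            merged[field] = l      # both sides agree (or neither changed it)
        elif l == b:
            merged[field] = r      # only the remote side changed this field
        elif r == b:
            merged[field] = l      # only the local side changed this field
        else:
            merged[field] = r      # assumption: keep remote for now, but flag it
            conflicts.add(field)
    return merged, conflicts


def merge_clocks(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    """After a successful merge, the new clock is the element-wise maximum of both."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


base = {"title": "Milk", "qty": 1}
local = {"title": "Milk", "qty": 2}        # edited offline on the device
remote = {"title": "Oat milk", "qty": 1}   # edited on the website

merged, conflicts = three_way_merge(base, local, remote)
print(merged, conflicts)
```

Here the two edits touch different fields, so the merge succeeds without user input. If both sides had changed `qty` to different values, `qty` would show up in `conflicts`, and that is where an application-specific policy or a prompt to the user would come in.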

Note that vector clocks do not use real-time timestamps. The problems involved in synchronizing real-time clocks are at least as difficult as those of synchronizing data.
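To make the timestamp problem concrete, here is a small Python sketch with invented numbers: a tablet whose clock runs five minutes slow makes the later edit, yet a "latest timestamp wins" rule keeps the phone's older edit. The device names, values, and the `last_write_wins` helper are hypothetical.

```python
def last_write_wins(edit_a: dict, edit_b: dict) -> dict:
    """Pick the edit with the larger wall-clock timestamp (ties go to edit_a)."""
    return edit_a if edit_a["timestamp"] >= edit_b["timestamp"] else edit_b


# True times, in seconds, as an outside observer would see them.
PHONE_EDIT_TRUE_TIME = 1000.0
TABLET_EDIT_TRUE_TIME = 1060.0   # one minute after the phone's edit
TABLET_CLOCK_SKEW = -300.0       # assumption: the tablet's clock is five minutes slow

phone_edit = {"device": "phone", "value": "old", "timestamp": PHONE_EDIT_TRUE_TIME}
tablet_edit = {
    "device": "tablet",
    "value": "new",
    # The tablet stamps the edit with its own (skewed) clock.
    "timestamp": TABLET_EDIT_TRUE_TIME + TABLET_CLOCK_SKEW,
}

winner = last_write_wins(phone_edit, tablet_edit)
print(winner["device"], winner["value"])
```

The tablet's edit really happened last, but its recorded timestamp (760.0) is lower than the phone's (1000.0), so the newer value is silently discarded. A vector clock would instead report the two edits as concurrent and leave the decision to a merge step.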
