Architecture – Synchronization between two systems using MongoDB as changelog

We are developing two related systems. One of them (A) will be installed on our customers' machines. The other (B) will be used by my organization.

Each system has its own relational database, and their schemas differ. However, both systems have to be kept synchronized. In addition, some changes in B have to be exported to all class A systems and others only to a specific one.

Some customers don't have an Internet connection so the synchronization, in some cases, has to be done via exchanging files.

So, we are planning to solve this problem as follows:

  1. Each system maintains a changelog of its database. We are planning to implement it with MongoDB.
  2. When a system starts a synchronization process, it retrieves all recorded changes from its log. If the system is B, the changes retrieved depend on the destination. Then the system serializes them as XML and sends them (via a file or the network).
  3. When the other endpoint receives the changeset, it deserializes it. Then the system applies any necessary transformations to the data and records the changes. In this step the system also has to resolve any conflicts that exist.
  4. Finally, the receiving system sends back its own changes (and any other results of the conflict resolution).
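
To make steps 1 and 2 concrete, here is a minimal Python sketch of a changelog entry, the per-destination selection and an XML round trip. It is only an illustration under assumptions: the `Change` fields, the `target` convention (`None` meaning "every A system") and the XML element names are all hypothetical, and a plain list stands in for the MongoDB collection.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class Change:
    seq: int                      # position in the sender's ordered changelog
    entity: str                   # hypothetical: table or aggregate name
    key: str                      # hypothetical: primary key of the affected row
    op: str                       # "insert", "update" or "delete"
    payload: dict                 # changed column values, kept as strings here
    target: Optional[str] = None  # None = every A system, otherwise one customer id


def changes_for(log, destination, since_seq):
    """Step 2: pick the not-yet-sent changes that concern one destination."""
    return [c for c in log
            if c.seq > since_seq and c.target in (None, destination)]


def to_xml(changes):
    """Serialize a changeset to an XML string (element names are made up)."""
    root = ET.Element("changeset")
    for c in changes:
        el = ET.SubElement(root, "change", seq=str(c.seq),
                           entity=c.entity, key=c.key, op=c.op)
        if c.target is not None:
            el.set("target", c.target)
        for name, value in c.payload.items():
            ET.SubElement(el, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Step 3, first half: rebuild Change objects from the XML string."""
    changes = []
    for el in ET.fromstring(text):
        payload = {f.get("name"): f.text or "" for f in el.findall("field")}
        changes.append(Change(int(el.get("seq")), el.get("entity"),
                              el.get("key"), el.get("op"),
                              payload, el.get("target")))
    return changes
```

Because the same string can be written to a file or sent over a socket, the offline and online cases would share the serialization code and differ only in transport.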

Is this approach feasible, scalable and elegant? What changes or additions would you make?

Best Answer

If you have not already done so, you may find it interesting to read up on event-driven systems, event sourcing and eventual consistency. The system you are describing has many parallels with these patterns, which is a good thing.

Your approach sounds good, in particular:

  • The use of an ordered changelog means that the synchronization process can retrieve only the changes made since the last change it has seen. This keeps the processing time down, which helps scalability, and will allow you to build near-real-time synchronization in the cases where internet connectivity is available.
  • Having customers without an internet connection forces you to think about delayed and out-of-order synchronization now, rather than relying on fast synchronization and inadvertently ending up with scalability issues.
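
As a rough illustration of both points, the sketch below shows a receiver that remembers the last change it applied, ignores replays and holds back changes that arrive ahead of a gap (for example when files are imported in the wrong order). It assumes that the sender numbers the changes for each destination contiguously starting at 1; the `Receiver` class and its attributes are hypothetical, and a list stands in for writes to the real database.

```python
class Receiver:
    """Applies changesets from one sender, tolerating replays and gaps."""

    def __init__(self):
        self.last_applied = 0   # highest contiguous sequence number applied
        self.pending = {}       # seq -> change, held back until the gap is filled
        self.applied = []       # stands in for writes to the relational database

    def receive(self, changes):
        """Take (seq, change) pairs in any order and apply what can be applied."""
        for seq, change in changes:
            if seq > self.last_applied:   # anything older was already applied
                self.pending[seq] = change
        # Apply in order for as long as the next expected number is available.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            self.applied.append(self.pending.pop(self.last_applied))
        return self.last_applied          # checkpoint to report back to the sender
```

The returned checkpoint is what the sender would use as its "last seen change" for this destination on the next run, so importing the same file twice is harmless.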

Without knowing more about the domain model, my guess is that resolving conflicts is the part that will cause you the most trouble. I would spend some time thinking through how each sort of conflict would be resolved. In particular:

  • Will some conflicts require user resolution?
  • Is the customer's system always going to be the correct place to resolve conflicts?
  • Is it possible for there to be conflicts in system B after step 4, when the customer's system sends its changes?
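
One way to start exploring these questions is to separate detecting a conflict from deciding it. The sketch below is only an assumption about what a conflict could be in your domain: it treats two changes as conflicting when they set the same field of the same row to different values. The `FieldChange` type, the `needs_user` set and the "incoming value wins" default are hypothetical placeholders, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldChange:
    entity: str   # hypothetical: table name
    key: str      # hypothetical: primary key of the row
    field: str    # column that was changed
    value: str    # new value


def find_conflicts(local, remote):
    """Split incoming changes into clean ones and (local, remote) conflict pairs."""
    local_by_target = {(c.entity, c.key, c.field): c for c in local}
    clean, conflicts = [], []
    for r in remote:
        mine = local_by_target.get((r.entity, r.key, r.field))
        if mine is not None and mine.value != r.value:
            conflicts.append((mine, r))
        else:
            clean.append(r)
    return clean, conflicts


def resolve(conflict, needs_user):
    """Return the winning change, or None if a person has to decide."""
    local, remote = conflict
    if (local.entity, local.field) in needs_user:
        return None    # park it for manual resolution
    return remote      # placeholder policy: the incoming value wins
```

Whatever `resolve` returns is itself a new change that travels back in step 4, which is why the third question matters: the other side may have moved on in the meantime.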