Using ESB for database synchronisation / replication

data-replication | enterprise-architecture | etl

We're starting to look at implementing an ESB / microservices architecture. I (think) I know the concepts, but there's one thing I can't seem to get a clear picture of: data replication / synchronisation.

Creating an event for each and every table (maybe even multiple per table: create, update, delete) seems like overkill to me if it's just to synchronise data. Wouldn't an ETL / SQL replication solution be much easier in cases where no business logic will be executed, as it's just to update the local cache/db of the server?
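To make the overhead concrete: a per-table, per-operation event scheme quickly multiplies into a lot of message contracts. A minimal sketch (the topic names and payload shape below are my own assumptions, not tied to any particular ESB product):

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # Hypothetical sketch: one event type per table per operation means
    # N tables x 3 operations worth of contracts to define and maintain.
    TABLES = ["product", "price", "category", "stock"]
    OPERATIONS = ["created", "updated", "deleted"]

    @dataclass
    class TableChangeEvent:
        topic: str     # e.g. "product.updated"
        payload: dict  # the changed row, serialised
        occurred_at: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def publish(event: TableChangeEvent) -> None:
        # Stand-in for a real broker call (RabbitMQ, Kafka, ...).
        print(f"publish {event.topic}: {event.payload}")

    # Synchronising just four tables already implies twelve event types:
    for table in TABLES:
        for op in OPERATIONS:
            publish(TableChangeEvent(topic=f"{table}.{op}", payload={"id": 42}))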

What strategies would you advise?

Simple example: we have an application that manages all product data, and we want to build an API (web service) to serve a mobile app that will display that data.

There are several options:

  1. API directly accesses the database
  2. API has a local database that's being kept up to date using ESB messages
  3. API has a local database that's being kept up to date using some replication tool
  4. API has a local database that's updated once a day using a batch operation.

In my opinion, the reason to use messages would be to further decouple the systems: the database structure behind the source application can then change without affecting its consumers. For all other intents and purposes, options 3/4 seem much less complex, which in my opinion better meets the KISS principle.
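As a rough sketch of what option 2 might look like on the consuming side (the event contract, table schema, and all names here are invented for illustration): the consumer maps the agreed message format onto its own local schema, so the publisher's internal database structure stays hidden behind the contract.

    import json
    import sqlite3

    # Local read model owned by the API; its schema can differ from the
    # source system's, which is the decoupling that messages buy you.
    db = sqlite3.connect("api_cache.db")
    db.execute("""CREATE TABLE IF NOT EXISTS products (
                      id INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      price_cents INTEGER NOT NULL)""")

    def handle_product_event(raw_message: str) -> None:
        """Upsert the local copy from an ESB message.

        Only the agreed event contract matters here; the publisher's
        internal table structure can change freely behind it.
        """
        event = json.loads(raw_message)
        if event["type"] == "product.deleted":
            db.execute("DELETE FROM products WHERE id = ?",
                       (event["data"]["id"],))
        else:  # product.created / product.updated
            db.execute(
                """INSERT INTO products (id, name, price_cents)
                   VALUES (:id, :name, :price_cents)
                   ON CONFLICT(id) DO UPDATE SET
                       name = excluded.name,
                       price_cents = excluded.price_cents""",
                event["data"])
        db.commit()

    # Example message as it might arrive from the bus:
    handle_product_event(json.dumps({
        "type": "product.updated",
        "data": {"id": 1, "name": "Blue widget", "price_cents": 1999},
    }))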

What would you advise? And where can I find some sort of flowchart / decision tree example of which alternative to use when?

Best Answer

You need to look at what the root problem is.

  • Are you seeking data redundancy?

  • Are you seeking minimal data access times?

  • Are you seeking to share data across separate environments?

  • Are you seeking to minimize security vulnerabilities with access to the data?

Once you decide what the highest priority is, then you can work on finding the best solution.

For data redundancy and access times, the most likely solution is SQL replication. The goal of replication (which those products are very good at) is data redundancy and minimizing access times through replica database servers. This option gives all points access to nearly the same data; you just need to monitor replication lag to ensure it stays within business requirements.
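For instance, assuming PostgreSQL streaming replication and the psycopg2 driver, a lag check against a replica could look roughly like this (the connection details and 30-second threshold are placeholders to adjust to your own staleness requirement):

    import psycopg2

    MAX_LAG_SECONDS = 30

    def replication_lag_seconds(replica_dsn: str) -> float:
        """Return how far (in seconds) the replica trails the primary."""
        with psycopg2.connect(replica_dsn) as conn:
            with conn.cursor() as cur:
                # pg_last_xact_replay_timestamp() is NULL until the
                # standby has replayed a transaction, hence COALESCE.
                cur.execute(
                    "SELECT COALESCE(EXTRACT(EPOCH FROM "
                    "(now() - pg_last_xact_replay_timestamp())), 0)")
                return float(cur.fetchone()[0])

    lag = replication_lag_seconds("host=replica1 dbname=products user=monitor")
    if lag > MAX_LAG_SECONDS:
        print(f"ALERT: replication lag {lag:.1f}s exceeds {MAX_LAG_SECONDS}s")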

For separate-environment concerns, I believe either an ESB or an automated batch operation is preferable. Both allow additional manipulation to occur before/during the operation, so any variances between servers can be resolved at data import time.
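As a sketch of what "manipulation at import time" can mean in practice (the field names and mappings are invented for illustration): the batch step reconciles differences between environments while loading, rather than replicating rows verbatim.

    # Hypothetical transform applied at import time: the source and
    # target environments disagree on currency units and status codes,
    # so the batch step translates as it loads.
    STATUS_MAP = {"A": "active", "D": "discontinued"}

    def transform_row(source_row: dict) -> dict:
        return {
            "id": source_row["product_id"],
            "name": source_row["name"].strip(),
            "status": STATUS_MAP.get(source_row["status_code"], "unknown"),
            # Source stores euros as floats; target wants integer cents.
            "price_cents": round(source_row["price_eur"] * 100),
        }

    rows = [{"product_id": 1, "name": " Blue widget ",
             "status_code": "A", "price_eur": 19.99}]
    for row in rows:
        print(transform_row(row))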

To minimize security vulnerabilities, I would recommend a non-automated batch operation with an appropriate level of security checks to ensure data validity. Not automating the operation lets a human confirm that there are no outstanding issues that might cause data corruption.
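A minimal sketch of such a gated batch, with example validity checks and an explicit operator sign-off (the checks themselves are examples; real ones depend on your data contract):

    import sys

    def validate(rows: list[dict]) -> list[str]:
        """Collect human-readable problems instead of failing fast."""
        problems = []
        ids = [r["id"] for r in rows]
        if len(ids) != len(set(ids)):
            problems.append("duplicate primary keys in batch")
        problems += [f"row {r['id']}: negative price"
                     for r in rows if r["price_cents"] < 0]
        return problems

    rows = [{"id": 1, "price_cents": 1999}, {"id": 2, "price_cents": -5}]
    issues = validate(rows)
    for issue in issues:
        print("WARNING:", issue)

    # The human gate: nothing is imported until an operator signs off.
    if input(f"{len(rows)} rows, {len(issues)} warnings. Import? [y/N] ") != "y":
        sys.exit("Import aborted by operator.")
    print("Importing...")  # the real load would go here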

For any decision, big or small, do a cost-benefit analysis before implementing changes. You need to take into consideration:

  1. Development time
  2. Development cost
  3. Future use of the solution
  4. Expected performance
  5. Complexity of the solution
  6. Maintainability of the solution