Using ESB for database synchronisation / replication

data-replication | enterprise-architecture | etl

We're starting to look at implementing an ESB / microservices architecture. I (think) I know the concepts, but there's one thing I can't seem to get a clear picture of: data replication / synchronisation.

Creating an event for each and every table (maybe even multiple per table: create, update, delete) seems like overkill to me if it's just to synchronise data. Wouldn't an ETL / SQL replication solution be much easier in cases where no business logic will be executed, as it's just to update the local cache/db of the server?
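To make the overhead concrete: a per-table, per-operation event scheme quickly multiplies into a lot of message contracts. A minimal sketch (the topic names and payload shape below are my own assumptions, not tied to any particular ESB product):

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    # Hypothetical sketch: one event type per table per operation means
    # N tables x 3 operations worth of contracts to define and maintain.
    TABLES = ["product", "price", "category", "stock"]
    OPERATIONS = ["created", "updated", "deleted"]

    @dataclass
    class TableChangeEvent:
        topic: str     # e.g. "product.updated"
        payload: dict  # the changed row, serialised
        occurred_at: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def publish(event: TableChangeEvent) -> None:
        # Stand-in for a real broker call (RabbitMQ, Kafka, ...).
        print(f"publish {event.topic}: {event.payload}")

    # Synchronising just four tables already implies twelve event types:
    for table in TABLES:
        for op in OPERATIONS:
            publish(TableChangeEvent(topic=f"{table}.{op}", payload={"id": 42}))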

What strategies would you advise?

Simple example: we have an application that manages all product data, and we want to build an API (web service) to serve a mobile app that will display that data.

There are several options:

  1. API directly accesses the database
  2. API has a local database that's being kept up to date using ESB messages
  3. API has a local database that's being kept up to date using some replication tool
  4. API has a local database that's updated once a day using a batch operation.

In my opinion, the reason to use messages would be to further decouple the systems: the database structure behind the source application can then change without affecting its consumers. For all other intents and purposes, options 3/4 seem much less complex, which in my opinion better meets the KISS principle.
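As a rough sketch of what option 2 might look like on the consuming side (the event contract, table schema, and all names here are invented for illustration): the consumer maps the agreed message format onto its own local schema, so the publisher's internal database structure stays hidden behind the contract.

    import json
    import sqlite3

    # Local read model owned by the API; its schema can differ from the
    # source system's, which is the decoupling that messages buy you.
    db = sqlite3.connect("api_cache.db")
    db.execute("""CREATE TABLE IF NOT EXISTS products (
                      id INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      price_cents INTEGER NOT NULL)""")

    def handle_product_event(raw_message: str) -> None:
        """Upsert the local copy from an ESB message.

        Only the agreed event contract matters here; the publisher's
        internal table structure can change freely behind it.
        """
        event = json.loads(raw_message)
        if event["type"] == "product.deleted":
            db.execute("DELETE FROM products WHERE id = ?",
                       (event["data"]["id"],))
        else:  # product.created / product.updated
            db.execute(
                """INSERT INTO products (id, name, price_cents)
                   VALUES (:id, :name, :price_cents)
                   ON CONFLICT(id) DO UPDATE SET
                       name = excluded.name,
                       price_cents = excluded.price_cents""",
                event["data"])
        db.commit()

    # Example message as it might arrive from the bus:
    handle_product_event(json.dumps({
        "type": "product.updated",
        "data": {"id": 1, "name": "Blue widget", "price_cents": 1999},
    }))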

What would you advise? And where can I find some sort of flowchart / decision tree example of which alternative to use when?

Best Answer

You need to look at what the root problem is.

  • Are you seeking data redundancy?

  • Are you seeking minimal data access times?

  • Are you seeking to share data across separate environments?

  • Are you seeking to minimize security vulnerabilities with access to the data?

Once you decide what the highest priority is, then you can work on finding the best solution.

For data redundancy and access times, the most likely solution is SQL replication. The goal of replication (which those products are very good at) is data redundancy and minimizing access times through replica database servers. This option gives all points access to nearly the same data; you just need to monitor replication lag to ensure it stays within business requirements.
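For instance, assuming PostgreSQL streaming replication and the psycopg2 driver, a lag check against a replica could look roughly like this (the connection details and 30-second threshold are placeholders to adjust to your own staleness requirement):

    import psycopg2

    MAX_LAG_SECONDS = 30

    def replication_lag_seconds(replica_dsn: str) -> float:
        """Return how far (in seconds) the replica trails the primary."""
        with psycopg2.connect(replica_dsn) as conn:
            with conn.cursor() as cur:
                # pg_last_xact_replay_timestamp() is NULL until the
                # standby has replayed a transaction, hence COALESCE.
                cur.execute(
                    "SELECT COALESCE(EXTRACT(EPOCH FROM "
                    "(now() - pg_last_xact_replay_timestamp())), 0)")
                return float(cur.fetchone()[0])

    lag = replication_lag_seconds("host=replica1 dbname=products user=monitor")
    if lag > MAX_LAG_SECONDS:
        print(f"ALERT: replication lag {lag:.1f}s exceeds {MAX_LAG_SECONDS}s")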

For separate-environment concerns, I believe either an ESB or an automated batch operation is preferable. Both allow additional manipulation to occur before/during the operation, so any variances between servers can be resolved at data import time.
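As a sketch of what "manipulation at import time" can mean in practice (the field names and mappings are invented for illustration): the batch step reconciles differences between environments while loading, rather than replicating rows verbatim.

    # Hypothetical transform applied at import time: the source and
    # target environments disagree on currency units and status codes,
    # so the batch step translates as it loads.
    STATUS_MAP = {"A": "active", "D": "discontinued"}

    def transform_row(source_row: dict) -> dict:
        return {
            "id": source_row["product_id"],
            "name": source_row["name"].strip(),
            "status": STATUS_MAP.get(source_row["status_code"], "unknown"),
            # Source stores euros as floats; target wants integer cents.
            "price_cents": round(source_row["price_eur"] * 100),
        }

    rows = [{"product_id": 1, "name": " Blue widget ",
             "status_code": "A", "price_eur": 19.99}]
    for row in rows:
        print(transform_row(row))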

To minimize security vulnerabilities, I would recommend a non-automated batch operation with an appropriate level of security checks to ensure data validity. Not automating the operation lets a human confirm that there are no outstanding issues that might cause data corruption.
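A minimal sketch of such a gated batch, with example validity checks and an explicit operator sign-off (the checks themselves are examples; real ones depend on your data contract):

    import sys

    def validate(rows: list[dict]) -> list[str]:
        """Collect human-readable problems instead of failing fast."""
        problems = []
        ids = [r["id"] for r in rows]
        if len(ids) != len(set(ids)):
            problems.append("duplicate primary keys in batch")
        problems += [f"row {r['id']}: negative price"
                     for r in rows if r["price_cents"] < 0]
        return problems

    rows = [{"id": 1, "price_cents": 1999}, {"id": 2, "price_cents": -5}]
    issues = validate(rows)
    for issue in issues:
        print("WARNING:", issue)

    # The human gate: nothing is imported until an operator signs off.
    if input(f"{len(rows)} rows, {len(issues)} warnings. Import? [y/N] ") != "y":
        sys.exit("Import aborted by operator.")
    print("Importing...")  # the real load would go here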

For any decision, big or small, do a cost-benefit analysis before implementing changes. You need to take into consideration:

  1. Development time
  2. Development cost
  3. Future use of the solution
  4. Expected performance
  5. Complexity of the solution
  6. Maintainability of the solution