When should an ESB be used versus an ETL tool? I worked on ESB projects using Tibco BusinessWorks quite a few years ago. The message bus we built would consume messages from a source system, apply validation, perform transformations, and send the result to the appropriate destination. The source could be a database, a service, a file, etc., and the same was true of the destination. I was under the impression that ETL is primarily for extracting, transforming, and loading large volumes of data, and that the source and destination are typically different databases. I recently looked at a Pentaho project, and it supported pretty much all the functions I have seen in Tibco BusinessWorks: the project was extracting data from a DB, but it was performing the same activities, such as filtering, transformation, and routing. I also believe it can route messages and consume messages from queues. When should we choose one over the other?
Architecture – Difference Between ESB and ETL
architecture, etl, service-bus
Related Solutions
Well... just focusing on the practical aspects of the general ETL problem: without too much additional effort you could separate the backing logic into one or more referenceable assemblies, and create a simple Windows service that calls into your ETL process in an identical way.
It's trivial to set up a Windows service application so it can also be run from the console (via a command-line argument), so you don't lose the ability to run it as a console app if you want to.
In return you gain the ability to run the ETL process as a Windows service; using a System.Threading.Timer, or a scheduling mechanism of your choice, within the Windows service object gives you an automated, scheduled solution that should rarely require manual intervention.
If your ETL process already works as a console app, it should only take you somewhere between a few hours and a couple of days (depending on how familiar you are with Windows services, and whether you want to embellish the solution with extra bells and whistles) to evolve it into a Windows service and start testing it.
Because the actual ETL logic is available independently of how it is hosted, you can also drive the logic from other applications, such as a workflow engine or one or more GUIs.
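The hosting split described above is language-agnostic; here is a minimal Python sketch of the same idea (the original context is .NET/Windows services, so treat `run_etl`, the `--service` flag, and the interval value as illustrative assumptions, not part of the original post). The core ETL function knows nothing about how it is invoked, and thin entry points adapt it to a one-shot console run or a scheduled, service-style loop:

```python
import sys
import time


def run_etl() -> int:
    """Core ETL logic, independent of how it is hosted.

    Stubbed here; a real version would extract, transform, and load.
    Returns the number of records processed.
    """
    return 0


def run_once() -> None:
    """Console entry point: run the ETL a single time and exit."""
    processed = run_etl()
    print(f"ETL run complete, {processed} records processed")


def run_scheduled(interval_seconds: float, iterations: int) -> None:
    """Service-style entry point: run the ETL on a fixed interval.

    A real service host (Windows service, systemd unit, etc.) would
    loop indefinitely; 'iterations' keeps this sketch finite.
    """
    for _ in range(iterations):
        run_etl()
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # A single command-line switch picks the hosting mode, mirroring
    # the console-vs-service trick described above.
    if "--service" in sys.argv:
        run_scheduled(interval_seconds=60.0, iterations=1)
    else:
        run_once()
```

The point is that neither entry point contains ETL logic, so adding a third host later (a GUI, a workflow engine) means writing another small adapter, not touching the pipeline itself.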
Console apps are handy as convenient "utility" dev tools, for one-off batch processes, for command-line-style UIs (a rare thing these days), for getting something off the ground, or for playing around with some logic without worrying about wiring up a GUI, and so forth.
But mission-critical, repetitive, and/or operational processes deserve a more robust mechanism of execution and interaction, in my opinion.
From a patterns perspective, more logic and context would be necessary to make a real recommendation, but from a general ETL perspective the Pipeline (a.k.a. Pipes and Filters) pattern is often a key consideration. You may already have a simple version of it set up even if you didn't deliberately set out to implement it, as it's very natural for this kind of problem.
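A Pipes and Filters arrangement for ETL can be sketched with Python generators; each filter consumes a stream of records and yields a (possibly transformed or reduced) stream. The stage names and the validation/transform rules below are illustrative assumptions, not from the original post:

```python
from typing import Iterable, Iterator


def extract(rows: Iterable[dict]) -> Iterator[dict]:
    # In practice: read lazily from a DB cursor, file, or queue.
    yield from rows


def validate(rows: Iterable[dict]) -> Iterator[dict]:
    # Filter stage: drop records missing an id (example rule).
    for row in rows:
        if row.get("id") is not None:
            yield row


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    # Transform stage: normalize the name field (example rule).
    for row in rows:
        yield {**row, "name": str(row.get("name", "")).strip().upper()}


def load(rows: Iterable[dict]) -> list[dict]:
    # In practice: write to the target system; here, collect results.
    return list(rows)


source = [{"id": 1, "name": " alice "}, {"id": None, "name": "bob"}]
result = load(transform(validate(extract(source))))
# result == [{"id": 1, "name": "ALICE"}]
```

Because every stage shares the same stream-in/stream-out shape, stages can be reordered, inserted, or removed independently, which is exactly the property that makes the pattern feel "natural" for ETL.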
If you are ETL'ing a lot of data, you probably want to consider splitting it into pieces that you can process in parallel... but that leads into a bigger discussion.
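The split-and-parallelize idea can be sketched with the standard library's `concurrent.futures`. Everything here (chunk size, worker count, the doubling "transform") is an illustrative assumption; a thread pool is used so the sketch is self-contained, but CPU-bound transforms would typically use a `ProcessPoolExecutor` instead:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def transform_chunk(chunk: list[int]) -> list[int]:
    # Stand-in for the per-chunk transform. I/O-bound work suits
    # threads; CPU-bound work would favor a ProcessPoolExecutor.
    return [value * 2 for value in chunk]


def chunked(items: list[int], size: int) -> Iterator[list[int]]:
    # Split the input into fixed-size pieces for independent processing.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parallel_transform(records: list[int],
                       chunk_size: int = 2,
                       workers: int = 4) -> list[int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so results line up
        # with the original chunks even though they ran concurrently.
        chunks = pool.map(transform_chunk, chunked(records, chunk_size))
    return [row for chunk in chunks for row in chunk]
```

For example, `parallel_transform([1, 2, 3, 4, 5])` returns `[2, 4, 6, 8, 10]`. The bigger discussion the answer alludes to is real, though: chunk boundaries, ordering guarantees, and failure/retry semantics all get harder once chunks run in parallel.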
Hope something in there helps...
You need to look at what the root problem is.
Are you seeking data redundancy?
Are you seeking minimal data access times?
Are you seeking sharing data across separate environments?
Are you seeking to minimize security vulnerabilities with access to the data?
Once you decide what the highest priority is, then you can work on finding the best solution.
For data redundancy and access times, the most likely solution is SQL replication. The goal of replication (which replication products are very good at) is data redundancy and minimized access times through replica database servers. This option gives all points access to nearly the same data; you just need to monitor replication lag to ensure it stays within business requirements.
For separate-environment concerns, I believe either an ESB or an automated batch operation is preferable. This allows additional manipulation to occur before or during the operation, so that any variances between servers can be resolved at data-import time.
To minimize security vulnerabilities, I would recommend a non-automated batch operation with an appropriate level of security checks to ensure data validity. Not automating the operation lets a human confirm there are no outstanding issues that might cause data corruption.
For any decision, big or small, you need to do a cost-benefit analysis before implementing any changes. You need to take into consideration:
- Development time
- Development cost
- Future use of the solution
- Expected performance
- Complexity of the solution
- Maintainability of the solution
Best Answer
An Enterprise Service Bus (ESB) is a software architecture model used for designing and implementing communication between mutually interacting software applications in a service-oriented architecture (SOA).
Extract, Transform and Load (ETL) refers to a data warehousing process that extracts data from various data sources, transforms it into the proper format, and loads it into the final target, usually a database.
The two things they have in common: both connect heterogeneous sources and destinations, and both can validate, transform, and route data along the way.
But they have different purposes. An ESB serves ongoing operations, whereas ETL is intended for one-time, ad-hoc, or batch-style imports of data, much as OLTP serves continuous transaction processing while OLAP serves bulk analytical workloads.