SOA and shared databases

databasesharedsoa

I don't understand SOA (Service-oriented Architecture) and databases. While I'm attracted by the SOA concept (encapsulating reusable business logic into services) I can't figure out how it's supposed to work if data tables encapsulated in a service are required by other services/systems—or is SOA suitable at all in this scenario?

To be more concrete, suppose I have two services:

  • CustomerService: contains my Customers database table and associated business logic.
  • OrderService: contains my Orders table and logic.

Now what if I need to JOIN the Customers and Orders tables with an SQL statement? If the tables contain millions of entries, unacceptable performance would result if I have to send the data over the network using SOAP/XML. And how to perform the JOIN?

Doing a little research, I have found some proposed solutions:

  • Use replication to make a local copy of the required data where needed. But then there's no encapsulation and then what's the point of using SOA? This is discussed on StackOverflow but there's no clear consensus.
  • Set up a Master Data Service which encapsulates all database data. I guess it would get monster sized (with essentially one API call for each stored procedure) and require updates all the time. To me this seems related to the enterprise data bus concept.

If you have any input on this, please let me know.

Best Answer

One of the defining principals of a "service" in this context is that it owns, absolutely, that data in the area it is responsible for, as well as operations on that data.

Copying data, through replication or any other mechanism, ditches that responsibility. Either you replicate the business rules, too, or you will eventually wind up in a situation where you wind up needing the other service updated to change your internal rules.

Using a single data service is just "don't do SOA"; if you have one single place that manages all data, you don't have independent services, you just have one service.

I would suggest, instead, the third option: use composition to put that data together, avoiding the database level JOIN operation entirely.

Instead of thinking about needing to join those two values together in the database, think about how to compose them together at the edges:

When you render an HTML page for a customer, you can supply HTML from multiple services and compose them next to each other visually: the customer details come from the customer service, and the order details from the order service.

Likewise an invoice email: compose data supplied from multiple services visually, without needing the in-database join.

This has two advantages: one, you do away with the need to join in the database, and even the need to have the data stored in the same type of database. Now each service can use whatever data store is most appropriate for their need.

Two, you can more easily change the outside of your application. If you have small, composable parts you can easily add rearrange the parts in new ways.

Related Topic