Data Serialization – Processing Business Logic Efficiently

abstraction, data-structures, design-patterns

Going by the general principle of data abstraction, I normally abstract data into a serialized format (JSON) and pass it as a parameter to the Business Logic (BL) modules, so that the BL module always sees a consistent format of the data irrespective of the underlying data storage layer. Even if I use an ORM, I use the serialized format of that ORM. I feel this has the following advantages.

  • By serializing data, there is better control over the data and parameters: every BL module works against a known set of keys and values
  • Any developer can write BL without thinking too much about the underlying data storage
  • Wrappers can be written for new databases (there are a lot of ready-made wrappers to convert data into JSON)
  • Testing can be done ruthlessly and without a database, since the data format is serialized (see the sketch after this list)
  • Functionally easier to understand, and better OLAP/OLTP integration
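
To illustrate the testing point: when the BL accepts plain serialized data, it can be exercised without any database at all. Below is a minimal sketch in Python; the create_invoice function and its keys are hypothetical stand-ins for whatever the BL actually exposes.

    import json

    def create_invoice(data):
        """Compute invoice totals from a serialized product record."""
        subtotal = data["qty"] * data["rate"]
        total = subtotal * (1 + data["taxes"])
        return {"product": data["product"], "subtotal": subtotal, "total": total}

    # Unit-test style check with hand-written JSON -- no database involved.
    record = json.loads('{"product": "X", "qty": 5, "rate": 5, "taxes": 0.05}')
    invoice = create_invoice(record)
    assert round(invoice["total"], 2) == 26.25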

I also notice the following drawbacks

  • A new layer has to be added between the database and the BL module (most ORMs nowadays come with a predetermined JSON format)
  • Extra function calls slow down the application (but I think this trade-off is OK when compared to better testing and easier maintenance)
  • The increased level of abstraction may confuse developers

I make the above observations mostly in the context of business applications.
To follow this discussion, let's have an example so that things can be discussed in context.
Assume a product database having the columns Product, Rate, Taxes, and suppose we need to create an invoice.

Code without database abstraction

  GET rate, taxes for product X from database
  multiply qty with rate and add taxes
  display invoice

Code with database abstraction

  GET rate, taxes for product X from database
  Convert it into JSON
  call create_invoice function  // this does all the calculations
  display invoice
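
For concreteness, the first two steps of this pseudocode might look like the following in Python, using the standard library's sqlite3 as a stand-in database (the table and column names come from the example above):

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product TEXT, rate REAL, taxes REAL)")
    conn.execute("INSERT INTO products VALUES ('X', 5.0, 0.05)")

    cur = conn.execute("SELECT product, rate, taxes FROM products WHERE product = 'X'")
    row = cur.fetchone()
    columns = [d[0] for d in cur.description]

    # The BL never sees the cursor or the row tuple, only this JSON payload.
    payload = json.dumps(dict(zip(columns, row)))
    print(payload)  # {"product": "X", "rate": 5.0, "taxes": 0.05}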

In the second example, I would pass arguments in the form (Product=X, Qty=5, Rate=5, Taxes=0.05). If taxes need to be split into more than one category (State Tax=0.03, Central Tax=0.02), or a discount factor needs to be added, I would simply extend the parameters of the BL functions so that the database fields, the JSON keys, and the function parameters match (this happens automatically during serialization, and most ORMs do it). In my approach this makes functions easy to extend and keeps modules independent of the data source: a module always knows the shape of the data it can receive, and even when a new parameter appears it can adapt, provided the underlying code is written to tolerate extra keys.
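
A minimal sketch of how such an extensible BL function might look, building on the earlier create_invoice example. The convention of treating any key ending in _tax as a tax category, and the optional discount key, are my own illustrative assumptions, not part of the pattern itself:

    def create_invoice(data):
        """Compute an invoice from serialized data, tolerating new keys.

        Any key ending in '_tax' (or the plain 'taxes' key) is treated as
        a tax rate; a 'discount' key, if present, reduces the subtotal.
        """
        subtotal = data["qty"] * data["rate"]
        subtotal -= subtotal * data.get("discount", 0)
        tax_rate = data.get("taxes", 0) + sum(
            v for k, v in data.items() if k.endswith("_tax"))
        return {"product": data["product"], "total": subtotal * (1 + tax_rate)}

    # Original shape: 5 * 5 * 1.05 = 26.25
    print(create_invoice({"product": "X", "qty": 5, "rate": 5, "taxes": 0.05}))

    # Taxes split into two categories -- the function needs no change,
    # since 0.03 + 0.02 still sums to 0.05.
    print(create_invoice({"product": "X", "qty": 5, "rate": 5,
                          "state_tax": 0.03, "central_tax": 0.02}))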

My general questions are

  1. Is this a good pattern, and what is it called (data abstraction comes to mind)?
  2. Pros/cons of this pattern (apart from those mentioned above) in the context of business applications, apps for embedded devices, and big data
  3. Is there a difference between this pattern and an ORM? (I believe so, since an ORM is mostly a class wrapper to get data from a database, while this pattern is more oriented towards the data structure)
  4. If this is good, can it be easily understood by a new developer?

Best Answer

A lot depends on your intention. Serializing data in this manner just to pass it to the business logic of a single application seems wasteful; in that case you should be passing a native object from the ORM to the object encapsulating the BL, which will modify its state and return the object to the ORM for persistence.
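
For contrast, here is a minimal sketch of that native-object flow, using a plain Python class as a stand-in for an ORM-mapped model (the names are hypothetical; a real ORM session would load and later persist the object):

    from dataclasses import dataclass

    @dataclass
    class Product:
        """Stand-in for an ORM-mapped record."""
        name: str
        rate: float
        taxes: float
        invoiced_qty: int = 0

    def invoice_product(product, qty):
        """BL receives the native object, mutates its state, returns it."""
        product.invoiced_qty += qty
        total = qty * product.rate * (1 + product.taxes)
        return product, total

    p = Product(name="X", rate=5.0, taxes=0.05)
    p, total = invoice_product(p, qty=5)
    print(total)  # 26.25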

On the other hand, if you have multiple, distributed applications that handle different aspects of the domain, then wrapping your DB in an API to provide serialized (JSON or XML) data is a good idea.

For instance: I have to deal with a rather insane vendor-supplied legacy database in which a major constraint is the inability to modify the DB schema other than adding the occasional view. I have this DB (as well as a couple of our other 'enterprise' data stores) wrapped in a REST API. Most of our user-facing applications, as well as daemons that monitor the DB for certain events, communicate with the API. In this way I can have the following workflow (a rough code sketch follows the list):

  1. O/RM classes each wrap a single table in the DB. One object == one record.
  2. Decorator classes handle presentation, including composition of more complex objects from simple model objects.
  3. Controller classes respond to requests with a JSON representation of the object appropriate to the client application.
  4. The client processes the data and POSTs or PUTs the JSON object back to the API.
  5. The controller handles the request to save the object, which is decomposed back into model objects and persisted by the O/RM.
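
A rough sketch of steps 3 and 4, assuming Flask for the API layer (the framework choice and the in-memory "database" are my assumptions; the decorator layer is collapsed for brevity):

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Hypothetical stand-in for the O/RM and decorator layers above.
    FAKE_DB = {1: {"id": 1, "name": "X", "rate": 5.0, "taxes": 0.05}}

    @app.route("/products/<int:product_id>", methods=["GET"])
    def get_product(product_id):
        # Step 3: respond with a JSON representation of the model object.
        return jsonify(FAKE_DB[product_id])

    @app.route("/products/<int:product_id>", methods=["PUT"])
    def save_product(product_id):
        # Steps 4-5: the client PUTs JSON back; decompose and persist it.
        FAKE_DB[product_id] = request.get_json()
        return jsonify(FAKE_DB[product_id])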

This adds significant complexity.

If you have to deal with heterogeneous data stores, distributed applications, and so on, this is an excellent solution. But unless you have those requirements, the rule of thumb is You Ain't Gonna Need It (YAGNI).
