C# – Best way to keep a copy of data optimized for read

c# sql-server

I have a system that handles various transactions from hundreds of clients a minute. Each transaction is limited to a subset of functionality, and thus for each type of transaction (circa 25 of those) I have a separate database table (this avoids lock waits for all the clients connecting). These clients are connecting to several C# web apps running in IIS on Windows.

However, we have a requirement for read-only access to a subset of the data that is a product of most of these transactions. Currently, this read-only process is a 14-table LEFT OUTER JOIN, which is painfully slow. (We are using MS SQL Server.)

The current thinking is to create a separate table to store this data and update it whenever the underlying data changes, so that the read-only operation is as fast as possible.

What would be the most stable and scalable approach?

The way I see it there are three options, none of which I really like:

  1. DB triggers
  2. A background service (daemon) that updates the data (periodically or via a trigger)
  3. A background task fired from my web apps that updates the data.

EDIT: Ideally, the POs want the data to be no more than 3 seconds stale. We are talking just tens of thousands of rows (less than 100K) with about 90 columns.

Best Answer

I will try to give you a different perspective of this problem by separating write and read operations and their respective stores. A big red warning before diving into this in more depth: while you can optimize for performance, you are also introducing complexity, which you should understand, consider, and discuss with your business/operations managers.

Write operation

As you mention, your "transactions with a subset of functionality" are your use cases/models for write operations. This way, you can optimise the write models as your business drives you. Your write models could still be stored in your current RDBMS of choice.

The first important distinction and added value here is that you allow yourself to:

  • Separate the write concern (and its persistence store) from reads
  • Easily add cross-cutting concerns like testing, logging, auditing, deadlock management, authorising, scaling, etc.

Your writes are basically commands that originate from your client's requests. These commands contain all the necessary payload for storing your use-case optimised model to the write persistence store.
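
For illustration, a write-side command for one of those transaction types might look like the sketch below. Every name here (RegisterPaymentCommand, ICommandHandler<T>) is an assumption made up for this answer, not something taken from your system:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the command carries exactly the payload needed to store
// one use-case optimised write model. Names are illustrative.
public sealed class RegisterPaymentCommand
{
    public Guid ClientId { get; init; }
    public decimal Amount { get; init; }
    public DateTime OccurredAtUtc { get; init; }
}

// A generic handler contract so cross-cutting concerns can be layered on later.
public interface ICommandHandler<in TCommand>
{
    Task HandleAsync(TCommand command, CancellationToken ct = default);
}
```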

Read operation

Depending on your data structure and relations, read models are a way to store the data in a denormalised form. As you wrote, your current read operation consists of a 14-table join, which can be a nightmare to deal with. This is where use-case-optimised read models step in.

Your read models are the queries that serve the denormalised data (possibly from a separate data store). While using separate read models you optimise for performance, you also add necessary complexity. Again, this allows you to:

  • Separate the read concern (and its persistence store) from writes
  • Easily add cross-cutting concerns like testing, logging, auditing, caching, authorising, scaling, etc.

Read-side storage could range from an RDBMS to fast NoSQL solutions like document DBs and key-value stores.
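
As a rough sketch (property and interface names are invented), the read model could be one flat row per transaction that already contains the result of today's 14-table join, so the read query becomes a simple filtered SELECT against a single table:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Sketch: one denormalised row replaces the 14-table LEFT OUTER JOIN.
// Property names are made up; the real model would carry your ~90 columns.
public sealed class TransactionSummaryReadModel
{
    public Guid TransactionId { get; init; }
    public Guid ClientId { get; init; }
    public string Status { get; init; } = "";
    public decimal Amount { get; init; }
    public DateTime UpdatedAtUtc { get; init; }
}

// The read side only exposes queries; no writes happen through this interface.
public interface ITransactionSummaryQuery
{
    Task<IReadOnlyList<TransactionSummaryReadModel>> ForClientAsync(
        Guid clientId, CancellationToken ct = default);
}
```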

Synchronisation between the data stores

The way I do it is: whenever a command executes, it publishes an event containing the payload of IDs necessary to start reconstituting the read store (to be in sync with the write side).
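
A minimal sketch of such an event, reusing the invented names from the snippets above; it carries only the IDs needed to rebuild the affected read-model rows:

```csharp
using System;

// Sketch: published by the command handler after the write succeeds.
public sealed class TransactionRegisteredEvent
{
    public Guid TransactionId { get; init; }
    public Guid ClientId { get; init; }
}
```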

To answer the question you asked in a comment, syncing really depends on your actual architecture. I, for one, am really against pushing code out to external sources (e.g. triggers, batch jobs) if possible. I prefer the inverse: push all code as deep as possible into my own implementations (over which I have direct control). As stated, I'm using commands, events, and queries for data manipulation.

A concrete lightweight example (sketched in code after this list):

  1. A request is caught by an IIS-hosted WebApi or MVC controller.
  2. The Controller populates the respective command object with values from the request.
  3. A CommandHandler gets fired up from the Controller.
    • The command handler does its job storing the given write model
    • The command handler raises an event to notify the subscribers about the change
  4. An EventHandler, responsible for handling the event raised by the previous command handler, fires up a new command that's responsible for reconstituting the read store side.
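
Wired together, the flow could look roughly like this. Everything below (the controller route, IWriteStore, IReadStore, IEventPublisher) is an assumption made for the sketch, reusing the command, handler interface, and event from the earlier snippets; a real system would implement these against SQL Server and compose them via a DI container:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Placeholder abstractions for this sketch; real implementations would wrap
// SQL Server access and your chosen event dispatch mechanism.
public interface IWriteStore { Task<Guid> SavePaymentAsync(RegisterPaymentCommand cmd, CancellationToken ct); }
public interface IReadStore { Task RebuildForTransactionAsync(Guid transactionId, CancellationToken ct); }
public interface IEventPublisher { Task PublishAsync<TEvent>(TEvent evt, CancellationToken ct); }

// 1-2. The controller translates the HTTP request into a command.
[ApiController]
[Route("api/payments")]
public sealed class PaymentsController : ControllerBase
{
    private readonly ICommandHandler<RegisterPaymentCommand> _handler;

    public PaymentsController(ICommandHandler<RegisterPaymentCommand> handler) => _handler = handler;

    [HttpPost]
    public async Task<IActionResult> Post(RegisterPaymentCommand command, CancellationToken ct)
    {
        await _handler.HandleAsync(command, ct);   // 3. fire the command handler
        return Accepted();
    }
}

// 3. The command handler stores the write model and raises an event.
public sealed class RegisterPaymentHandler : ICommandHandler<RegisterPaymentCommand>
{
    private readonly IWriteStore _writeStore;
    private readonly IEventPublisher _events;

    public RegisterPaymentHandler(IWriteStore writeStore, IEventPublisher events)
        => (_writeStore, _events) = (writeStore, events);

    public async Task HandleAsync(RegisterPaymentCommand command, CancellationToken ct = default)
    {
        var id = await _writeStore.SavePaymentAsync(command, ct);
        await _events.PublishAsync(
            new TransactionRegisteredEvent { TransactionId = id, ClientId = command.ClientId }, ct);
    }
}

// 4. The event handler reconstitutes the affected read-model rows.
public sealed class RebuildReadModelHandler
{
    private readonly IReadStore _readStore;

    public RebuildReadModelHandler(IReadStore readStore) => _readStore = readStore;

    public Task HandleAsync(TransactionRegisteredEvent evt, CancellationToken ct = default)
        => _readStore.RebuildForTransactionAsync(evt.TransactionId, ct);
}
```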

To summarize the points above: you have the flow of syncing in your own hands, and do not depend on third-party solutions like triggers, NT services, and so on.

Some things to note:

  • Since we adhere to Command Query Separation, we can easily decorate point 3 (the command handling) with authorisation, validation, logging, etc. without actually modifying the code in the Controller or the command handler itself (see the decorator sketch after this list).
  • As you may have noticed, we strive to keep the parts of this subsystem as decoupled and testable as possible, and the concerns are cleanly separated.
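
A sketch of what that decoration could look like (the decorator name and the use of Console logging are assumptions; in practice a DI container would compose decorators around the real handler):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: wraps any ICommandHandler<TCommand> with logging. Authorisation
// or validation decorators follow the same shape; neither the Controller
// nor the inner handler needs to change.
public sealed class LoggingCommandHandlerDecorator<TCommand> : ICommandHandler<TCommand>
{
    private readonly ICommandHandler<TCommand> _inner;

    public LoggingCommandHandlerDecorator(ICommandHandler<TCommand> inner) => _inner = inner;

    public async Task HandleAsync(TCommand command, CancellationToken ct = default)
    {
        Console.WriteLine($"Handling {typeof(TCommand).Name}");
        await _inner.HandleAsync(command, ct);
        Console.WriteLine($"Handled {typeof(TCommand).Name}");
    }
}
```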

Your question about "what if the event handler dies?" is going to make this a bit more complex, by introducing new concepts called Eventual Consistency and message/event queuing. Quoting Greg Young:

On most systems you can just use a queue as the read system will want pretty much every event from the write system anyways.

These concepts extend this answer with the following (sketched in code after the list):

  • a place to store all events that happened on the write side
  • a queue that goes through these stored events and applies them to the read side
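
A very rough sketch of that idea, again reusing the invented types from the snippets above. The in-memory Channel stands in for a durable queue; with your 3-second staleness budget, a real implementation would persist the events (e.g. in a SQL table or a message broker) so nothing is lost if the consumer dies:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Sketch: the write side enqueues events; a hosted background service drains
// the queue and applies each event to the read store (eventual consistency).
public sealed class EventQueue
{
    private readonly Channel<TransactionRegisteredEvent> _channel =
        Channel.CreateUnbounded<TransactionRegisteredEvent>();

    public ValueTask EnqueueAsync(TransactionRegisteredEvent evt, CancellationToken ct = default)
        => _channel.Writer.WriteAsync(evt, ct);

    public IAsyncEnumerable<TransactionRegisteredEvent> ReadAllAsync(CancellationToken ct)
        => _channel.Reader.ReadAllAsync(ct);
}

public sealed class ReadModelProjector : BackgroundService
{
    private readonly EventQueue _queue;
    private readonly RebuildReadModelHandler _handler;

    public ReadModelProjector(EventQueue queue, RebuildReadModelHandler handler)
        => (_queue, _handler) = (queue, handler);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Apply events one by one to keep the read store in sync with writes.
        await foreach (var evt in _queue.ReadAllAsync(stoppingToken))
            await _handler.HandleAsync(evt, stoppingToken);
    }
}
```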

Conclusion

All this above is nothing new. I like to call this CQRS Lite - a term I have borrowed from Dino Esposito. He has a very comprehensive tutorial course on this at PluralSight. I wholeheartedly recommend that you watch it there, as there is a lot to learn from him. [1]

[1] It must be noted that this course is behind a paywall, i.e. you need to pay to watch it, and I would like to make it clear that I am not affiliated with PluralSight in any way.

Another good source of information on this subject is CQRS Performance Engineering: Read vs Read/Write Models by Derick Bailey.