CQRS, Event Sourcing and (near) Real Time Reporting

cqrs, event-sourcing, reporting

I am working with a small team that is developing a CQRS/ES "semi-microservice" architecture. We are pretty far along, but we are running into some interesting challenges with our projections, and further challenges as we start to move our projections out into a reporting database to handle cross-domain concerns. I realize these are complex problems and there's no one-size-fits-all solution. This is my first time using a heavily event-based architecture, so please forgive me if I am using some of the wrong terms; perhaps this is why I am having a hard time finding further information on tackling these challenges. I am not expecting anyone to solve these problems for me, but I would be very grateful for help with terminology, and for pointers to any resources that may be relevant to the problems I outline below. Thanks in advance!

Alright, so my team and I are building software composed of several services. Each service uses a CQRS architecture with eventing, and some entities are event sourced; those entities are event sourced because other parts of the system depend on specific versions of them. Our domain-event architecture is heavily inspired by Vaughn Vernon's "Implementing Domain-Driven Design". Each service has its own relational database, or at least its own schema that is treated as a separate database. Each database has a domain-event table with a unique constraint on the aggregate id and version number, so a transaction fails if two events arrive at the same time for the same entity. This is a heavily collaborative application, so this guarantee is very important to us.
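For readers unfamiliar with the pattern, the constraint described above can be sketched in a few lines. This is a hypothetical minimal schema (table and column names are assumptions, not the asker's actual ones), using sqlite3 to show how a unique constraint on (aggregate_id, version) makes the second of two concurrent writers fail instead of silently interleaving their events:

```python
import sqlite3

# Hypothetical domain-event table: UNIQUE (aggregate_id, version) is the
# optimistic-concurrency guard described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE domain_event (
        aggregate_id TEXT    NOT NULL,
        version      INTEGER NOT NULL,
        event_type   TEXT    NOT NULL,
        payload      TEXT    NOT NULL,
        UNIQUE (aggregate_id, version)
    )
""")

def append_event(aggregate_id, expected_version, event_type, payload):
    """Append one event; raises sqlite3.IntegrityError on a version clash."""
    conn.execute(
        "INSERT INTO domain_event VALUES (?, ?, ?, ?)",
        (aggregate_id, expected_version, event_type, payload),
    )
    conn.commit()

append_event("order-1", 1, "OrderPlaced", "{}")
try:
    # A concurrent writer using the same expected version loses:
    append_event("order-1", 1, "OrderShipped", "{}")
except sqlite3.IntegrityError:
    print("conflict")  # prints "conflict"
```

The losing transaction rolls back, and that writer typically reloads the aggregate and retries against the new version.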

Problem #1:

We are currently publishing our domain events to subscribers. The subscribers are limited to the service itself and are usually projections. We tried publishing each event after its transaction committed, so the events could be processed asynchronously without holding up the user, but this led to events being processed out of order. We now process most events inside the transaction. This works for now because our projection-handling logic runs very quickly, but that may not remain true for long. The projections don't have to be updated in real time, only in near real time; we can probably tolerate delays of 2 or 3 seconds. How is event order typically guaranteed in this scenario?
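One common answer to this question is to stop pushing events at all: assign each event a monotonically increasing sequence number in the same transaction that stores it, and let each subscriber poll the event table in sequence order on a timer. The sketch below (all names hypothetical, with a Python list standing in for the event table) shows the shape of that idea; a handler can then never see event N+1 before event N:

```python
from collections import defaultdict

# Stand-in for the domain-event table, ordered by a global sequence number
# that is assigned inside the same transaction that stores the domain state.
event_log = []

def commit_event(aggregate_id, payload):
    sequence = len(event_log) + 1  # gap-free, monotonically increasing
    event_log.append({"seq": sequence, "aggregate": aggregate_id, "payload": payload})

class Projection:
    """A subscriber that pulls new events in order instead of being pushed."""

    def __init__(self):
        self.checkpoint = 0                # last sequence number applied
        self.state = defaultdict(list)

    def poll(self):
        # Runs on a timer (e.g. every second), well within a 2-3 s budget,
        # rather than inside the writer's transaction.
        for event in event_log:
            if event["seq"] > self.checkpoint:
                self.state[event["aggregate"]].append(event["payload"])
                self.checkpoint = event["seq"]

commit_event("a", "created")
commit_event("a", "renamed")
projection = Projection()
projection.poll()
assert projection.state["a"] == ["created", "renamed"]
```

Because the projection tracks its own checkpoint, falling behind is harmless: the next poll simply picks up where the last one left off.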

Problem #2:

We are beginning to require complex sorting and filtering on views that combine data from several services, which seems to necessitate moving our projection logic out into a separate reporting service. We've looked at a few different models, such as push-based mechanisms and pull-based mechanisms inspired by Kafka, but we're having a hard time determining how to get the events out of each service and then process them in order, per aggregate, in our reporting service (especially if we run multiple instances of the reporting service). We recognize that with our current setup we can only guarantee order within a service, not across services, but this is acceptable because we expect these operations to be commutative, in that (the aggregate of 4 events from service 1).aggregatedWith(the aggregate of 5 events from service 2) == (the aggregate of 5 events from service 2).aggregatedWith(the aggregate of 4 events from service 1). The same 2-3 second delay is also acceptable here. Any resources or search terms for this type of problem (or any alternative suggestions) would be much appreciated!

Best Answer

Have a listen to Greg Young's talk on Polyglot data; he may persuade you that you want a pull model, rather than a push model, for your subscriptions.

Essentially, when the subscription "wakes up", it refreshes its local copy of the event history/histories from the book of record, and then writes out the new projection from those histories.

If you store metadata with the projection, you can keep track of where you left off, which can reduce the amount of redundant information you fetch from the book of record (assuming it supports ranged queries).
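That metadata idea can be made concrete: keep the checkpoint in the same database as the projection, update both in one transaction, and on the next wake-up issue a ranged query ("everything after sequence N") against the book of record. A minimal sketch, with a hypothetical schema, using sqlite3:

```python
import sqlite3

# Hypothetical projection store: the projection row and its checkpoint live
# in the same database, so one transaction moves them together and the
# subscription can never double-apply or skip a batch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projection (name TEXT PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE checkpoint (name TEXT PRIMARY KEY, last_seq INTEGER)")
db.execute("INSERT INTO checkpoint VALUES ('orders-view', 0)")

def apply_batch(name, events):
    """events: list of (sequence, body) fetched from the book of record."""
    with db:  # single transaction: projection and checkpoint commit together
        for seq, body in events:
            db.execute(
                "INSERT OR REPLACE INTO projection VALUES (?, ?)", (name, body)
            )
            db.execute(
                "UPDATE checkpoint SET last_seq = ? WHERE name = ?", (seq, name)
            )

apply_batch("orders-view", [(1, "v1"), (2, "v2")])
last = db.execute(
    "SELECT last_seq FROM checkpoint WHERE name = 'orders-view'"
).fetchone()[0]
assert last == 2  # next poll asks the book of record for events after seq 2
```

If the process crashes mid-batch, the transaction rolls back and the next poll re-fetches from the last committed checkpoint, giving at-least-once delivery with idempotent application.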

Fundamentally, the book of record is a database, not a service (the "service" is responsible for publishing changes to the book of record). You get events out of it by sending a query. You probably won't send the queries directly to the book of record; the actual database being used is an implementation detail that you may want to change, and in any case it's probably domain-agnostic. Your database (which supports domain-specific queries) is a facade in front of the persistence appliance that you've chosen.