CQRS, Event Sourcing and (near) Real Time Reporting

cqrs, event-sourcing, reporting

I am working with a small team that is developing a CQRS/ES "semi-microservice" architecture. We are pretty far along, but we are running into some interesting challenges with our projections, and further challenges as we start to move our projections out into a reporting database to handle cross-domain concerns. I realize these are complex problems and there's no one-size-fits-all solution. This is my first time using a heavily event-based architecture, so please forgive me if I am using some of the wrong terms; perhaps this is why I am having a hard time finding further information on tackling these challenges. I am not expecting anyone to solve these problems for me, but I would be very grateful for help with terminology, and for pointers to any resources that may be relevant to the problems I outline below. Thanks in advance!

Alright, so my team and I are building software composed of several services. Each service uses a CQRS architecture with eventing, and some entities are event sourced; those entities are event sourced because other parts of the system depend on specific versions of them. Our domain-event architecture is heavily inspired by Vaughn Vernon's "Implementing Domain-Driven Design". Each service has its own relational database, or at least its own schema that is treated as a separate database. Each database has a domain-event table with a unique constraint on the aggregate id and version number, so a transaction fails if two events arrive at the same time for the same entity. This is a heavily collaborative application, so this guarantee is very important to us.
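For readers unfamiliar with the pattern, the constraint described above can be sketched in a few lines. This is a hypothetical minimal schema (table and column names are assumptions, not the asker's actual ones), using sqlite3 to show how a unique constraint on (aggregate_id, version) makes the second of two concurrent writers fail instead of silently interleaving their events:

```python
import sqlite3

# Hypothetical domain-event table: UNIQUE (aggregate_id, version) is the
# optimistic-concurrency guard described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE domain_event (
        aggregate_id TEXT    NOT NULL,
        version      INTEGER NOT NULL,
        event_type   TEXT    NOT NULL,
        payload      TEXT    NOT NULL,
        UNIQUE (aggregate_id, version)
    )
""")

def append_event(aggregate_id, expected_version, event_type, payload):
    """Append one event; raises sqlite3.IntegrityError on a version clash."""
    conn.execute(
        "INSERT INTO domain_event VALUES (?, ?, ?, ?)",
        (aggregate_id, expected_version, event_type, payload),
    )
    conn.commit()

append_event("order-1", 1, "OrderPlaced", "{}")
try:
    # A concurrent writer using the same expected version loses:
    append_event("order-1", 1, "OrderShipped", "{}")
except sqlite3.IntegrityError:
    print("conflict")  # prints "conflict"
```

The losing transaction rolls back, and that writer typically reloads the aggregate and retries against the new version.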

Problem #1:

We are currently publishing our domain events to subscribers. The subscribers are limited to the service itself and are usually projections. We tried publishing each event after its transaction committed, so the events could be processed asynchronously without holding up the user, but this led to events being processed out of order. We now process most events inside the transaction. This works for now because our projection-handling logic runs very quickly, but that may not remain true for long. The projections don't have to be updated in real time, only in near real time; we can probably tolerate delays of 2 or 3 seconds. How is event order typically guaranteed in this scenario?
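One common answer to this question is to stop pushing events at all: assign each event a monotonically increasing sequence number in the same transaction that stores it, and let each subscriber poll the event table in sequence order on a timer. The sketch below (all names hypothetical, with a Python list standing in for the event table) shows the shape of that idea; a handler can then never see event N+1 before event N:

```python
from collections import defaultdict

# Stand-in for the domain-event table, ordered by a global sequence number
# that is assigned inside the same transaction that stores the domain state.
event_log = []

def commit_event(aggregate_id, payload):
    sequence = len(event_log) + 1  # gap-free, monotonically increasing
    event_log.append({"seq": sequence, "aggregate": aggregate_id, "payload": payload})

class Projection:
    """A subscriber that pulls new events in order instead of being pushed."""

    def __init__(self):
        self.checkpoint = 0                # last sequence number applied
        self.state = defaultdict(list)

    def poll(self):
        # Runs on a timer (e.g. every second), well within a 2-3 s budget,
        # rather than inside the writer's transaction.
        for event in event_log:
            if event["seq"] > self.checkpoint:
                self.state[event["aggregate"]].append(event["payload"])
                self.checkpoint = event["seq"]

commit_event("a", "created")
commit_event("a", "renamed")
projection = Projection()
projection.poll()
assert projection.state["a"] == ["created", "renamed"]
```

Because the projection tracks its own checkpoint, falling behind is harmless: the next poll simply picks up where the last one left off.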

Problem #2:

We are beginning to require complex sorting and filtering on views that combine data from several services, which seems to necessitate moving our projection logic out into a separate reporting service. We've looked at a few different models, such as push-based mechanisms and pull-based mechanisms inspired by Kafka, but we're having a hard time determining how to get the events out of each service and then process them in order, per aggregate, in our reporting service (especially if we run multiple instances of the reporting service). We recognize that with our current setup we can only guarantee order within a service, not across services, but this is acceptable because we expect these operations to be commutative, in that (the aggregate of 4 events from service 1).aggregatedWith(the aggregate of 5 events from service 2) == (the aggregate of 5 events from service 2).aggregatedWith(the aggregate of 4 events from service 1). The same 2-3 second delay is also acceptable here. Any resources or search terms for this type of problem (or any alternative suggestions) would be much appreciated!

Best Answer

Have a listen to Greg Young's talk on Polyglot data; he may persuade you that you want a pull model, rather than a push model, for your subscriptions.

Essentially, when the subscription "wakes up", it refreshes its local copy of the event history/histories from the book of record, and then writes out the new projection from those histories.

If you store metadata with the projection, you can keep track of where you left off, which can reduce the amount of redundant information you fetch from the book of record (assuming it supports ranged queries).
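That metadata idea can be made concrete: keep the checkpoint in the same database as the projection, update both in one transaction, and on the next wake-up issue a ranged query ("everything after sequence N") against the book of record. A minimal sketch, with a hypothetical schema, using sqlite3:

```python
import sqlite3

# Hypothetical projection store: the projection row and its checkpoint live
# in the same database, so one transaction moves them together and the
# subscription can never double-apply or skip a batch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE projection (name TEXT PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE checkpoint (name TEXT PRIMARY KEY, last_seq INTEGER)")
db.execute("INSERT INTO checkpoint VALUES ('orders-view', 0)")

def apply_batch(name, events):
    """events: list of (sequence, body) fetched from the book of record."""
    with db:  # single transaction: projection and checkpoint commit together
        for seq, body in events:
            db.execute(
                "INSERT OR REPLACE INTO projection VALUES (?, ?)", (name, body)
            )
            db.execute(
                "UPDATE checkpoint SET last_seq = ? WHERE name = ?", (seq, name)
            )

apply_batch("orders-view", [(1, "v1"), (2, "v2")])
last = db.execute(
    "SELECT last_seq FROM checkpoint WHERE name = 'orders-view'"
).fetchone()[0]
assert last == 2  # next poll asks the book of record for events after seq 2
```

If the process crashes mid-batch, the transaction rolls back and the next poll re-fetches from the last committed checkpoint, giving at-least-once delivery with idempotent application.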

Fundamentally, the book of record is a database, not a service (the "service" is responsible for publishing changes to the book of record). You get events out of it by sending a query. You probably won't send the queries directly to the book of record; the actual database being used is an implementation detail that you may want to change, and in any case it's probably domain-agnostic. Your database (which supports domain-specific queries) is a facade in front of the persistence appliance that you've chosen.