First, it is important to understand and be able to leverage the difference between Commands and Events.
As this question succinctly points out, Commands are things we would like to happen, and Events are things that have already happened. A command does not necessarily result in a significant event in the system, but it usually does. For example, a send message command may be rejected, in which case no event happens (typically an error would not be considered an event in this sense, though we may still choose to log it in a diagnostic log). Now, if the send message command is accepted, the message sent event occurs, and event details could describe the sender, the receiver, and the content.
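As a rough sketch of the distinction, the following Python models a send message command and a message sent event. The class names, the `handle` function, and the rejection rules (empty content, a blocked sender) are assumptions made for illustration only; they are not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SendMessage:
    """Command: a request that may be accepted or rejected."""
    sender: str
    receiver: str
    content: str


@dataclass(frozen=True)
class MessageSent:
    """Event: a fact that has already happened."""
    sender: str
    receiver: str
    content: str


def handle(command: SendMessage, blocked: set) -> Optional[MessageSent]:
    """Return an event if the command is accepted, or None if it is rejected."""
    # Hypothetical rejection rules: empty content or a blocked sender.
    if not command.content or command.sender in blocked:
        return None  # rejected: nothing happened, so no event is recorded
    return MessageSent(command.sender, command.receiver, command.content)


# Only accepted commands contribute to the event stream.
events = []
for cmd in (SendMessage("alice", "bob", "hi"),
            SendMessage("mallory", "bob", "spam")):
    event = handle(cmd, blocked={"mallory"})
    if event is not None:
        events.append(event)
```

Here the second command is rejected, so the stream ends up holding a single message sent event.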
When we talk about the system state, we are actually discussing an accumulation not of commands, but of events. Only events reflect changes of state in the system. To draw from a real-life example, suppose I go to the local Publix supermarket and buy a Florida lottery ticket. The command was "Buy Ticket" and the event was "Ticket issued." My next command is to the lottery: draw my numbers for the PowerBall. The lottery is going to ignore my command (though I have no way of knowing that), and the event "PowerBall numbers chosen" takes place irrespective of my wishes. If my numbers match, the event "Jackpot won" happens to me (and I think my command was heard). If not, I realize my command was ignored.
From a historical perspective, the lottery is only interested in a subset of events. The lottery only cares that (a) a ticket was issued, (b) the numbers were chosen, and (c) the jackpot was won. Those are the items of interest. The act of purchasing the ticket, wanting to win, etc. are all irrelevant, as is what I do with my ticket after I lose. While the real world does change for mundane events, we only need to record those events which are significant to our system.
In theory, under an event-sourcing technique, a stream of events may be replayed from the beginning of time to arrive at the current state. This relies on the assumptions that the underlying system conditions are constant and deterministic. However, these assumptions are not valid in many systems. The data associated with an event, as well as the types of events we are interested in, may change as our software evolves. In addition, it can be computationally expensive to re-compute the current state in response to every query. For this reason, snapshots of the system state are often taken at known points in time, and only the events recorded after the most recent snapshot are applied on top of it.
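A minimal sketch of replaying from a snapshot might look like this. The `Event` and `Snapshot` shapes, the sequence numbers starting at 1, and the "tickets issued" counter used as the state are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int   # position in the stream, assumed to start at 1
    kind: str  # hypothetical event type name


@dataclass(frozen=True)
class Snapshot:
    last_seq: int  # last event already folded into `state`
    state: int     # here: number of tickets issued so far


def apply(state: int, event: Event) -> int:
    """Fold one event into the state; event types of no interest are skipped."""
    return state + 1 if event.kind == "TicketIssued" else state


def replay(events, snapshot: Optional[Snapshot] = None) -> int:
    """Start from the snapshot (if any) and apply only the later events."""
    state = snapshot.state if snapshot else 0
    last_seq = snapshot.last_seq if snapshot else 0
    recent = (e for e in events if e.seq > last_seq)
    return reduce(apply, recent, state)


stream = [
    Event(1, "TicketIssued"),
    Event(2, "NumbersChosen"),
    Event(3, "TicketIssued"),
    Event(4, "TicketIssued"),
]
# Take a snapshot after the first two events, then continue from it.
snapshot = Snapshot(last_seq=2, state=replay(stream[:2]))
```

Replaying from the snapshot and replaying the whole stream should agree, as long as `apply` has not changed between the two.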
While it is still possible to replay an event stream across multiple versions, the amount of human effort involved in doing so is likely to be cost-prohibitive. Unless there is a justifiable reason to design that capability into the system, you are better off building your system to utilize snapshots.
Example in Question
In the example given in the question, the architecture is not truly event-based; it is command-based. Replaying commands creates the system state. This is an anti-pattern and should be fixed. Instead, the primary events are:
- Buyer asks question
- Seller responds to question
Each of these events can be "replayed" to give the current state. For example, in the act of asking a question, the system behavior might be to email the seller and increment the unanswered question counter. This behavior can be changed; however, the fact that the question was asked does not. Similarly, the system might decrement the unanswered question counter when the seller responds. This behavior is changeable, but the fact that the seller responded is not.
Most event-sourcing systems would dynamically compute the count of unanswered questions by replaying the specific event stream in response to a query.
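One way to picture that query-time computation is the sketch below. The event classes `QuestionAsked` and `QuestionAnswered` and the `question_id` field are hypothetical names standing in for the two primary events listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionAsked:
    question_id: int


@dataclass(frozen=True)
class QuestionAnswered:
    question_id: int


def unanswered_count(events) -> int:
    """Derive the counter from recorded facts instead of storing it."""
    open_questions = set()
    for event in events:
        if isinstance(event, QuestionAsked):
            open_questions.add(event.question_id)
        elif isinstance(event, QuestionAnswered):
            open_questions.discard(event.question_id)
    return len(open_questions)


history = [
    QuestionAsked(1),
    QuestionAsked(2),
    QuestionAnswered(1),
    QuestionAsked(3),
]
```

Because the count is derived, the counting rule can change later without touching the recorded events.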
How do I deal with side effects in Event Sourcing?
Short version: the domain model doesn't perform side effects. It tracks them. Side effects are performed using a port that connects to the boundary; when the email is sent, you send the acknowledgement back to the domain model.
This means that the email is sent outside of the transaction that updates the event stream.
Precisely where, outside, is a matter of taste.
So conceptually, you have a stream of events like:
EmailPrepared(id:123)
EmailPrepared(id:456)
EmailPrepared(id:789)
EmailDelivered(id:456)
EmailDelivered(id:789)
And from this stream you can create a fold:
{
deliveredMail : [ 456, 789 ],
undeliveredMail : [123]
}
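For illustration, a fold like the one above could be computed as follows. Representing each event as a `(kind, mail_id)` tuple is an assumption made for brevity; the function name `fold` is likewise only illustrative.

```python
def fold(events):
    """Reduce the stream to the lists of delivered and undelivered mail ids."""
    prepared, delivered = [], set()
    for kind, mail_id in events:
        if kind == "EmailPrepared":
            prepared.append(mail_id)
        elif kind == "EmailDelivered":
            delivered.add(mail_id)
    return {
        "deliveredMail": [m for m in prepared if m in delivered],
        "undeliveredMail": [m for m in prepared if m not in delivered],
    }


stream = [
    ("EmailPrepared", 123),
    ("EmailPrepared", 456),
    ("EmailPrepared", 789),
    ("EmailDelivered", 456),
    ("EmailDelivered", 789),
]
state = fold(stream)
```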
The fold tells you which emails haven't been acknowledged, so you send them again:
undeliveredMail.each ( mail -> {
    send(mail);
    dispatch( EmailDelivered.from(mail) );
});
Effectively, this is a two-phase commit: you are modifying SMTP in the real world, and then you are updating the model.
The pattern above gives you an at-least-once delivery model. If you want at-most-once, you can turn it around:
undeliveredMail.each ( mail -> {
    commit( EmailDelivered.from(mail) );
    send(mail);
});
There's a transaction barrier between making EmailPrepared durable and actually sending the email. There's also a transaction barrier between sending the email and making EmailDelivered durable.
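To see why the ordering matters, here is a small simulation, offered as a sketch rather than a real mail pipeline. It assumes a crash happens exactly between the two steps of the first attempt, and that the process then retries for as long as the mail is not recorded as delivered. `Crash`, `simulate`, and the two strategy functions are hypothetical names.

```python
class Crash(Exception):
    """Simulates the process dying between the two steps."""


def at_least_once(mail_id, send, record_delivered):
    send(mail_id)              # side effect first
    record_delivered(mail_id)  # a crash before this line causes a resend


def at_most_once(mail_id, send, record_delivered):
    record_delivered(mail_id)  # acknowledgement first
    send(mail_id)              # a crash before this line loses the email


def simulate(strategy, mail_id=123):
    """Return the list of emails actually sent when the 2nd step crashes once."""
    sent, delivered = [], set()
    calls = {"n": 0}

    def step(action):
        def wrapped(m):
            calls["n"] += 1
            if calls["n"] == 2:  # second step of the first attempt fails
                raise Crash()
            action(m)
        return wrapped

    send = step(sent.append)
    record_delivered = step(delivered.add)

    # Retry while the model still considers the mail undelivered.
    for _ in range(3):
        if mail_id in delivered:
            break
        try:
            strategy(mail_id, send, record_delivered)
        except Crash:
            pass
    return sent
```

Under these assumptions, `simulate(at_least_once)` sends the email twice, while `simulate(at_most_once)` never sends it; neither ordering gives exactly-once delivery on its own.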
Udi Dahan's Reliable Messaging Without Distributed Transactions may be a good starting point.
Best Answer
Both methods are totally fine, and as usual, the answer is: "it depends".
The 2nd method you suggested is easy to implement and is used quite often - in particular, I'd consider it to be the standard method to bring new event handlers/projections online into an existing system, since new consumers need to be replayed the full event history at least once.
Regarding other consumers, please note that
Furthermore, have you actually measured the time that it takes to replay all events into a single projection (the corrupted one)? Usually you can easily handle tens of thousands of events and read model updates per second (use one big transaction), so replaying all events to repair a single read model should be a matter of minutes. We actually replay all events on every system startup, since our read model is only stored in memory, and it's blazing fast.
If your event store/messaging infrastructure does not support query by event type, the 1st method you suggested is a bit harder to implement, since you need to implement the query interface. This might be extremely hard or it might be trivial, depending on how your event store is designed. So if you don't want to use the second method you suggested, implement the 1st method, use selective queries to repair a read model and call it a day.
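The two repair strategies can be compared with a toy in-memory store. Everything here is a hypothetical stand-in: `InMemoryEventStore`, its `read(event_types=...)` filter, and the "questions per seller" read model are assumptions for illustration, and a real event store will expose a different query interface.

```python
from collections import Counter


class InMemoryEventStore:
    """Hypothetical event store; real stores expose different query APIs."""

    def __init__(self):
        self._events = []

    def append(self, event_type, payload):
        self._events.append((event_type, payload))

    def read(self, event_types=None):
        """Yield events in order, optionally filtered by event type."""
        for event_type, payload in self._events:
            if event_types is None or event_type in event_types:
                yield event_type, payload


def rebuild_question_counts(store, selective):
    """Rebuild a 'questions per seller' read model from scratch."""
    wanted = {"QuestionAsked"} if selective else None
    counts = Counter()
    for event_type, payload in store.read(wanted):
        if event_type == "QuestionAsked":  # a full replay skips other types
            counts[payload["seller"]] += 1
    return counts


store = InMemoryEventStore()
store.append("QuestionAsked", {"seller": "s1"})
store.append("EmailPrepared", {"id": 123})
store.append("QuestionAsked", {"seller": "s1"})
store.append("QuestionAsked", {"seller": "s2"})
```

Both calls, `rebuild_question_counts(store, selective=True)` and `rebuild_question_counts(store, selective=False)`, should produce the same read model; the selective query only reduces how many events have to be read.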