In short:
This is a hard problem. Don't reinvent the wheel.
Many technologies solve the message queue layer. They include ZeroMQ, RabbitMQ, Redis (PUBSUB), and Kafka, among others.
I think it's out of scope for me to discuss the drawbacks of each, not least because I don't claim the expertise to do so well (*cough* don't use Rabbit *cough*).
Even if you don't want to use any of these technologies, read their documentation.
This will educate you on the design patterns that are possible over such a system. Reading ZeroMQ's documentation will teach you many classic message queuing architectures they have graciously implemented. Even if you do not use ZeroMQ, knowing these patterns will help you evaluate other queuing technologies by asking whether you can implement them there.
Learn about RabbitMQ/AMQP's exchange-queue model. Routing may come up for you (it is supported by Redis PUBSUB, though I don't recall it being supported by ZeroMQ), and fanouts are something my shop has been using for quite some time, albeit poorly implemented over a Memcached poll (yuck!).
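To make the exchange-queue model concrete, here is a toy in-process sketch of the idea: publishers send to an exchange with a routing key, queues are bound to the exchange, and each bound queue gets a copy. This is not RabbitMQ's API (the class and method names are my own invention), just an illustration of "direct" routing with fanout-like behavior when several queues share a binding.

```python
from collections import defaultdict, deque

class Exchange:
    """Toy analogue of an AMQP exchange: messages are published to the
    exchange with a routing key, and every queue bound under that key
    receives its own copy of the message."""
    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> list of queues

    def bind(self, routing_key, queue):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        # "direct" exchange semantics: deliver on exact routing-key match
        for queue in self.bindings[routing_key]:
            queue.append(message)

exchange = Exchange()
orders, audit = deque(), deque()
exchange.bind("order.created", orders)
exchange.bind("order.created", audit)   # two bindings: both queues get a copy
exchange.publish("order.created", {"id": 42})
```

The useful property is that the publisher only knows the exchange and the routing key; which queues (and how many) receive the message is entirely a matter of bindings.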
How to choose one?
I work at a startup whose SLA is typical for a web app: some outages are okay, as long as we can quickly restore service with little data loss. We haven't had to deal with scaling problems like Twitter's or Tumblr's, so throughput volume hasn't been a primary concern. That said, if you are working under a similar SLA, these considerations will come to mind:
- do the client libraries work, and is it easy to maintain a connection with them? (ZeroMQ, Redis: yes. RabbitMQ: no.)
- is monitoring and management easy from a server console? (Redis: yes. RabbitMQ: yes. ZeroMQ: not that I recall, but we didn't use it for long.)
- do clients support internal queues, so that little data is lost during short outages? (ZeroMQ, Redis: yes. RabbitMQ: no.)
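That last point, client-side internal queues, can be sketched in a few lines. This is not any particular client library's implementation, just the general shape: a wrapper that buffers messages locally while the broker is unreachable and flushes them once sends succeed again (the class names and the fake transport are mine).

```python
import collections

class BufferingPublisher:
    """Wraps an unreliable transport; holds messages in a local deque
    during outages and flushes them in order once the transport recovers."""
    def __init__(self, transport, max_buffer=10_000):
        self.transport = transport
        self.buffer = collections.deque(maxlen=max_buffer)  # drop oldest past cap

    def publish(self, message):
        self.buffer.append(message)
        self.flush()

    def flush(self):
        while self.buffer:
            try:
                self.transport.send(self.buffer[0])
            except ConnectionError:
                return  # broker down: keep buffering, retry on next publish
            self.buffer.popleft()  # only drop a message once it was sent

# A fake transport standing in for a real broker connection:
class FlakyTransport:
    def __init__(self):
        self.up, self.delivered = False, []
    def send(self, message):
        if not self.up:
            raise ConnectionError("broker unavailable")
        self.delivered.append(message)

transport = FlakyTransport()
pub = BufferingPublisher(transport)
pub.publish("a")        # broker is down: message is buffered locally
transport.up = True
pub.publish("b")        # broker is back: both messages flush, in order
```

The bounded deque is the important design choice: during a long outage you eventually shed the oldest messages rather than exhaust memory, which matches the "some data loss is okay" SLA above.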
Of course, if you are working at, say, a high-frequency trading shop, these will be lesser concerns; you'll be more willing to invest development time in a client-side library in exchange for higher throughput. But I'm writing this to warn you that these technologies tend to market themselves on performance, not on out-of-the-box functionality. If you're a web startup, you are far more interested in the latter than the former, and accordingly something like Redis, which aims at ease of use with good performance rather than difficulty of use with great performance, is probably a better choice than RabbitMQ. (I don't like RabbitMQ.)
Events aren't about what changed. They're about when something changed.
I can create an event system completely decoupled from the contents that changed. That way, all I learn from an event is that an object has been updated. If I even care that the object has been updated, I then tell whatever knows how to talk to that object to go ask it what changed.
That doesn't solve the problem of communicating these changes. It just stops it from becoming part of the event system.
One way to solve the problem of differing versions of data is to have the observer create a collection and hand it to the observed object. The observed object populates the collection with its latest data, and when control returns, you (the observer) have what you need. If there is extra data you don't care about, because you've never heard of it, you simply ignore it.
There are many other ways to skin that cat, but that's one I've made work in exactly this case.
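The collection-handoff idea above can be sketched as follows. All of the names here (`Order`, `snapshot_into`, the field names) are made up for illustration; the point is that the observed object writes everything it knows into a caller-supplied dict, and an older observer simply filters down to the fields it understands.

```python
class Order:
    """The observed object: fills a caller-supplied dict with its latest state."""
    def __init__(self):
        self.status, self.total, self.flag_v2 = "new", 0, True

    def snapshot_into(self, collection):
        collection["status"] = self.status
        collection["total"] = self.total
        collection["flag_v2"] = self.flag_v2  # newer field; old observers ignore it

class Observer:
    KNOWN_FIELDS = {"status", "total"}  # this observer predates "flag_v2"

    def on_changed(self, observed):
        data = {}                      # the observer creates the collection...
        observed.snapshot_into(data)   # ...the observed object populates it
        # keep only the fields this version of the observer understands
        return {k: v for k, v in data.items() if k in self.KNOWN_FIELDS}

order = Order()
seen = Observer().on_changed(order)   # {'status': 'new', 'total': 0}
```

Because the event itself carried no payload, adding `flag_v2` to the observed object required no change to the event system or to old observers: they never heard of it, so they ignore it.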
The way you effectively load-balance a consumer in Kafka is to register each of your client nodes/processes into a single Consumer Group.
[Source] https://kafka.apache.org/intro
This is rather basic client configuration. There is no concept of a "message queue" in Kafka, but you can reliably handle transactional, message-driven events this way.
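The load-balancing effect of a consumer group can be sketched without a running broker. This is not Kafka's actual assignor code, just a round-robin illustration of the invariant that matters: each partition of a topic is owned by exactly one member of the group, so the group as a whole splits the topic's traffic.

```python
def assign_partitions(partitions, consumers):
    """Round-robin sketch of how a consumer group splits a topic's
    partitions: each partition goes to exactly one group member."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Six partitions spread across a three-member consumer group:
plan = assign_partitions(range(6), ["worker-a", "worker-b", "worker-c"])
# worker-a -> [0, 3], worker-b -> [1, 4], worker-c -> [2, 5]
```

Two consequences follow from this model: adding group members beyond the partition count gains you nothing (the extras sit idle), and ordering is only guaranteed within a partition, since each partition is consumed by a single member.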