Designing a scalable message queue architecture

Tags: design, message-queue, scalability

I have recently started learning the nuances of scalable, enterprise-grade computer architecture, and one of its central components is the message queue. To learn as much as I can from a new programming paradigm, I am trying to implement my own message queue service.

So far, my initial design runs on a threaded socket listener. To prevent two separate processing nodes from downloading the same message, the message queue's index register is locked when a read begins and unlocked once the register has been updated. This serializes every read, which negates the benefit of threading and caps the size of any system built on it at the processing speed of the single server running the message queue service.
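A minimal Python sketch of the single-server design described above, assuming an in-memory list of messages and Python threads standing in for processing nodes (the names `LockedQueue` and `drain` are hypothetical). It shows why the lock guarantees each message is handed out once, and also why it serializes every read:

```python
import threading


class LockedQueue:
    """Hypothetical sketch: reading a message and advancing the index
    happen inside one critical section."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._next_index = 0  # the "index register" from the question
        self._lock = threading.Lock()

    def dequeue(self):
        # Read + index update are atomic with respect to other threads, so
        # no two callers ever receive the same message. The cost: only one
        # thread can be in here at a time, which is the throughput ceiling.
        with self._lock:
            if self._next_index >= len(self._messages):
                return None
            message = self._messages[self._next_index]
            self._next_index += 1
            return message


def drain(queue, results):
    """Processing-node stand-in: pull messages until the queue is empty."""
    while True:
        message = queue.dequeue()
        if message is None:
            return
        results.append(message)


if __name__ == "__main__":
    q = LockedQueue(range(1000))
    results = []
    workers = [threading.Thread(target=drain, args=(q, results)) for _ in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(results), len(set(results)))  # 1000 1000
```

Every message is delivered exactly once, but adding threads does not add dequeue throughput, because only one of them can hold the lock at a time.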

One way around this would be to run the message queue service on multiple servers, but that increases the likelihood of the same message being downloaded twice. The only remedy I can see is a revocation callback: after the servers (or even the threads on a single server) have synchronized their state and detected such a re-issuance, they would command the processing node to stop its current job and re-query the message queue for the next message. Again, though, there would be a ceiling: beyond some point most of the traffic would be synchronization and revocation messages, creating a bottleneck in which many processing nodes perform null operations and waste time.
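Rather than revoking work that is already in flight, many brokers accept at-least-once delivery: a received message is leased rather than deleted, the consumer acknowledges it when done, and an unacknowledged message becomes visible again once its lease expires (Amazon SQS's visibility timeout works roughly this way). The sketch below is a hypothetical, single-process illustration of that idea, not any broker's actual API; `now` is passed in explicitly so the timing is deterministic, and locking is omitted:

```python
import itertools


class LeasedQueue:
    """Hypothetical lease-based queue (single process, no locking shown).

    receive() hides a message for `lease_seconds` instead of deleting it.
    The consumer must ack() with its receipt before the lease expires;
    otherwise the message becomes visible again (at-least-once delivery).
    """

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self._message_ids = itertools.count()
        self._receipts = itertools.count()
        self._ready = {}      # message_id -> body (visible to receivers)
        self._in_flight = {}  # receipt -> (message_id, body, lease_expiry)

    def send(self, body):
        message_id = next(self._message_ids)
        self._ready[message_id] = body
        return message_id

    def _requeue_expired(self, now):
        # A consumer that crashed or stalled never acked, so its lease lapses
        # and the message is handed out again instead of being lost.
        for receipt, (message_id, body, expiry) in list(self._in_flight.items()):
            if expiry <= now:
                del self._in_flight[receipt]
                self._ready[message_id] = body

    def receive(self, now):
        self._requeue_expired(now)
        if not self._ready:
            return None
        message_id = min(self._ready)  # oldest visible message first
        body = self._ready.pop(message_id)
        receipt = next(self._receipts)  # unique per delivery, not per message
        self._in_flight[receipt] = (message_id, body, now + self.lease_seconds)
        return receipt, body

    def ack(self, receipt, now):
        # A late or stale ack is rejected: the message may already have been
        # redelivered, so consumer work must be idempotent, not "revoked".
        entry = self._in_flight.get(receipt)
        if entry is None or entry[2] <= now:
            return False
        del self._in_flight[receipt]
        return True
```

The trade-off is that a slow consumer can cause a duplicate, so consumers are expected to tolerate processing a message twice; in exchange, no revocation messages need to flow between servers.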

The last workaround I can think of is to give each message queue server (and each thread on each server) its own offset into the queue, but that might cause problems depending on the application, especially if messages must be processed in a specific order.
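The per-consumer offset idea resembles a partitioned log, which is roughly the model Apache Kafka uses: messages are assigned to partitions by key, each partition is read by one consumer at a time, and ordering is guaranteed only within a partition. A hypothetical Python sketch, assuming a CRC32 hash of the key picks the partition (class names are illustrative):

```python
import zlib


class PartitionedLog:
    """Hypothetical partitioned log: each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash; Python's built-in hash() of str is randomized per process.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        self.partitions[self.partition_for(key)].append((key, value))


class Consumer:
    """Owns whole partitions and tracks its own read offset in each."""

    def __init__(self, log, owned_partitions):
        self.log = log
        self.offsets = {p: 0 for p in owned_partitions}  # no shared index, no lock

    def poll(self):
        batch = []
        for p, offset in list(self.offsets.items()):
            records = self.log.partitions[p][offset:]
            batch.extend(records)
            self.offsets[p] = offset + len(records)
        return batch
```

Because all messages with the same key land in the same partition, per-key ordering survives even though there is no global order and no lock shared between consumers.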

So, with all that said, are there any message queue architecture designs that show how existing enterprise-grade message queue services avoid these problems?

Best Answer

In short:

This is a hard problem. Don't reinvent the wheel.

Many technologies already solve the message queue layer. They include ZeroMQ, RabbitMQ (AMQP) and Redis, among others.

I think it's out of scope for me to discuss the drawbacks of each, not least because I don't claim the expertise to do it well (cough, don't use Rabbit, cough).

Even if you don't want to use any of these technologies, read their documentation.

This will teach you design patterns that are possible on top of a single system. ZeroMQ's documentation in particular walks through many classic message queuing architectures that they have graciously implemented. Even if you don't use ZeroMQ, knowing these patterns will help you evaluate other queuing technologies by asking whether you can implement each pattern there.
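As one example of those patterns, ZeroMQ's pipeline (PUSH/PULL) pattern distributes each message to exactly one connected worker in round-robin fashion, so duplicates are avoided by construction rather than by locking a shared index. The following pure-Python imitation uses hypothetical names and does not use the ZeroMQ API:

```python
from collections import deque
from itertools import cycle


class PushSketch:
    """Hypothetical stand-in for a PUSH-style sender (not the ZeroMQ API).

    Each message goes to exactly one worker inbox, chosen round-robin, so no
    shared read index or lock is needed to prevent duplicate delivery.
    """

    def __init__(self, worker_inboxes):
        self._targets = cycle(worker_inboxes)

    def send(self, message):
        next(self._targets).append(message)


def pull(inbox):
    """Worker side: take the next message from this worker's own inbox."""
    return inbox.popleft() if inbox else None


if __name__ == "__main__":
    inboxes = [deque(), deque(), deque()]
    push = PushSketch(inboxes)
    for task in range(7):
        push.send(task)
    print([list(inbox) for inbox in inboxes])  # [[0, 3, 6], [1, 4], [2, 5]]
```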

Learn about RabbitMQ/AMQP's exchange-queue model. Routing may come up for you (Redis PUBSUB supports it, but I don't recall ZeroMQ supporting it), and fanouts are something my shop has used for quite some time, albeit poorly implemented over Memcached polling (yuck!).
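In AMQP 0-9-1 terms, producers publish to an exchange, and the exchange copies each message into the queues bound to it according to the exchange type. A toy Python model of the two behaviours mentioned here, fanout and direct routing (hypothetical class, not RabbitMQ's client API; topic and headers exchanges are omitted):

```python
from collections import deque


class Exchange:
    """Toy model of an AMQP 0-9-1 style exchange (not RabbitMQ's API).

    'fanout' copies every message to every bound queue; 'direct' delivers
    to queues whose binding key equals the message's routing key.
    Duplicate bindings are not deduplicated in this sketch.
    """

    def __init__(self, kind):
        if kind not in ("fanout", "direct"):
            raise ValueError("this sketch only models fanout and direct")
        self.kind = kind
        self.bindings = []  # list of (binding_key, queue)

    def bind(self, queue, binding_key=""):
        self.bindings.append((binding_key, queue))

    def publish(self, message, routing_key=""):
        for binding_key, queue in self.bindings:
            if self.kind == "fanout" or binding_key == routing_key:
                queue.append(message)


if __name__ == "__main__":
    audit, email = deque(), deque()
    signups = Exchange("fanout")
    signups.bind(audit)
    signups.bind(email)
    signups.publish("signup:42")
    print(list(audit), list(email))  # ['signup:42'] ['signup:42']
```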

How to choose one?

I work at a startup whose SLA is typical for a web app: some outages are okay, as long as we can quickly restore service with little data loss. We haven't faced scaling problems like those of Twitter or Tumblr, so we haven't had to worry about throughput volume. That said, if you are working to an SLA similar to mine, these considerations will come to mind:

  • Do the client libraries work? Is it easy to maintain a connection with them? (ZeroMQ, Redis: yes. RabbitMQ: no.)
  • Are monitoring and management easy from a server console? (Redis: yes. RabbitMQ: yes. ZeroMQ: not that I recall, but we did not use it for long.)
  • Do the clients support internal queues, so that little data is lost during short outages? (ZeroMQ, Redis: yes. RabbitMQ: no.)
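The "internal queue" in the last point can be pictured as a bounded local buffer in the client that is flushed whenever the broker is reachable. A hypothetical Python sketch, assuming `send_to_broker` is any callable that raises `ConnectionError` while the broker is down (the class and parameter names are illustrative):

```python
from collections import deque


class BufferedPublisher:
    """Hypothetical client wrapper with a local send buffer.

    Messages are queued locally and flushed whenever the broker is
    reachable, so a short outage loses nothing unless the buffer overflows,
    in which case the oldest messages are dropped.
    """

    def __init__(self, send_to_broker, max_buffered=10_000):
        self._send = send_to_broker  # assumed to raise ConnectionError when down
        self._buffer = deque(maxlen=max_buffered)  # full deque drops from the left

    def publish(self, message):
        self._buffer.append(message)
        return self.flush()

    def flush(self):
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False  # keep everything buffered; retry on the next call
            self._buffer.popleft()
        return True

    def pending(self):
        return len(self._buffer)
```

If `send_to_broker` delivers a message but then raises (for example, the connection drops before the reply arrives), that message is resent on the next flush, so this is at-least-once delivery as well.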

Of course, if you work at, say, a high-frequency trading shop, these will be lesser concerns. You'll be more willing to put development time into a client-side library in exchange for higher throughput in the end. But I'm writing this mainly to warn you that these technologies tend to market themselves on performance, not on out-of-the-box functionality. If you're a web startup, you care far more about the latter than the former, so something like Redis, which is optimized for ease of use at good performance rather than great performance at the cost of difficult use, is probably a better choice than RabbitMQ. (I don't like RabbitMQ.)
