Java – Designing a Queuing Solution with Clustering and Multiple Consumers


This is a design problem that I would like input on.

I have a set of business operations that are carried out for different business entities.

Operations:

  • Operation A
  • Operation B
  • Operation C

For example, I have an Entity A. Entity A's data could come in parts, for example:

  • Entity A's (Jan Data)
  • Entity A's (Feb Data) etc.

To complete a use case, all operations (A, B, C) must be performed. The operations are independent of each other and can run in parallel; the only condition is that operations running in parallel must belong to different entities. So Entity A can't have two or more of its operations (A, B or C) executing at the same time. These operations run on the server side.

How can I scale this?

I am thinking of the following solution and would like input from the community.

I am thinking of three queues, one for each of the operations mentioned above:

  • Queue A carrying out Operation A
  • Queue B carrying out Operation B
  • Queue C carrying out Operation C

The consumers will be listening to these queues:

  • Consumer A (or multiple consumers)
  • Consumer B (or multiple consumers)
  • Consumer C (or multiple consumers)

My servers would be load balanced, and I would have a single message queue instance containing these three queues.

So I might have 2 servers running, each with, for example, 5 threads (consumers), so there would be 10 instances of Consumer A running in parallel, picking data from Queue A.

As I stated earlier, for the same Entity A (that is my business use case) these operations (Operation A, Operation B and Operation C) can't run in parallel; only one of them should be executing at any time.

So what I am thinking is to keep a database entry for Entity A while one of its operations is running. Every consumer must check whether there is a database entry for Entity A:

  • if not then

    1. Make an entry to Database for Entity A
    2. Go and execute the Operation
    3. Remove the Entry from the Database for Entity A
  • if there is an entry in Database found

    1. Re-enqueue the data for Entity A on the queue it was picked from.
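
The check-then-execute steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names (EntityLockSketch, tryClaim, release, runExclusively) are hypothetical, and an in-memory ConcurrentHashMap stands in for the database table. The point it shows is that the "check" and the "make an entry" steps have to be a single atomic action, otherwise two consumers can both see "no entry" and both start working on Entity A.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the "database entry" lock (hypothetical names).
// In a real cluster the claim would more likely be an INSERT into a table
// with a unique key on the entity id, so check and insert are one atomic step.
class EntityLockSketch {
    private final ConcurrentHashMap<String, String> inProgress = new ConcurrentHashMap<>();

    /** Atomically claims the entity; returns false if another consumer holds it. */
    public boolean tryClaim(String entityId, String consumerId) {
        return inProgress.putIfAbsent(entityId, consumerId) == null;
    }

    /** Releases the claim only if it is still held by the given consumer. */
    public void release(String entityId, String consumerId) {
        inProgress.remove(entityId, consumerId);
    }

    /**
     * Runs the operation if the entity is free.
     * Returns true if it ran, false if the caller should re-enqueue the message.
     */
    public boolean runExclusively(String entityId, String consumerId, Runnable operation) {
        if (!tryClaim(entityId, consumerId)) {
            return false; // entry already exists: put the message back on its queue
        }
        try {
            operation.run();
            return true;
        } finally {
            release(entityId, consumerId); // remove the entry even if the operation throws
        }
    }
}
```

A consumer would call runExclusively for each message and re-enqueue the message whenever it returns false. With a real database, an entry left behind by a crashed consumer would also need some form of expiry, which this sketch does not cover.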

Is there any better solution possible for such a design problem?

Best Answer

You can partition your data easily with consistent hashing; in this case you would use the entity as the hash key. A consistent hash function takes a key and the number of "buckets" as input and gives you back the bucket for that key.

With multiple servers in mind, a simple solution would be to pick a number of partitions up front (let's say 6), which means you will have 6 queues. When producing messages, calculate the bucket [consistentHash(entity.id, 6)] and put the message on the queue (partition) that corresponds to that bucket. Because every message for a given entity lands on the same queue, this gives you ordering of messages per entity.
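
As a rough sketch of the producer-side calculation, here is one way consistentHash could look. It uses the jump consistent hash algorithm (Lamping and Veach), which is one possible choice and not necessarily what the answer had in mind. The names PartitionSketch and partitionFor are hypothetical, and using CRC32 to turn a string entity id into a numeric key is an assumption made only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class PartitionSketch {
    /** Jump consistent hash: maps a 64-bit key to a bucket in [0, buckets). */
    public static int consistentHash(long key, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        long b = -1;
        long j = 0;
        while (j < buckets) {
            b = j;
            key = key * 2862933555777941757L + 1; // linear congruential step
            j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
        }
        return (int) b;
    }

    /** Picks the queue (partition) index for an entity id (assumed to be a string). */
    public static int partitionFor(String entityId, int partitions) {
        CRC32 crc = new CRC32();
        crc.update(entityId.getBytes(StandardCharsets.UTF_8));
        return consistentHash(crc.getValue(), partitions);
    }
}
```

A producer would then publish each message to the queue numbered partitionFor(entity.id, 6). With a fixed number of partitions a plain hash modulo would keep per-entity ordering just as well; the consistent hash mainly helps if the partition count grows later, since only a fraction of the entities then move to a new partition.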

On the consumer side, simply ensure you have exactly one consumer per queue (partition). You can have as many servers as you want as long as no two consumers compete for the same queue.
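
One simple way to guarantee a single consumer per partition across several servers is a static assignment, sketched below. This is an assumption for illustration only: the name PartitionOwnershipSketch and the round-robin rule are hypothetical, and it presumes every server knows its own index and the total server count.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionOwnershipSketch {
    /** Partitions consumed by the server at serverIndex under round-robin assignment. */
    public static List<Integer> partitionsOwnedBy(int serverIndex, int serverCount, int partitionCount) {
        List<Integer> owned = new ArrayList<>();
        // Server i takes partitions i, i + serverCount, i + 2 * serverCount, ...
        for (int p = serverIndex; p < partitionCount; p += serverCount) {
            owned.add(p);
        }
        return owned;
    }
}
```

With 6 partitions and 2 servers, server 0 would consume partitions 0, 2 and 4 and server 1 would consume partitions 1, 3 and 5, so no queue ever has two consumers.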

You can then take this one step further inside each server to improve parallelism. The consumer of each queue (partition) can be a router, which simply takes each message and does another consistent hash on the entity, to N buckets, where N is the number of worker threads you want for parallelism. It then hands off the message to the thread for the calculated bucket.
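
A minimal sketch of such a router is shown below, under a few assumptions. The names (EntityRouterSketch, workerFor, route, shutdownAndWait) are hypothetical, each worker is modeled as a single-thread executor, and the bucket is computed with a plain hash modulo instead of a consistent hash, which is enough here because the number of worker threads is fixed for the life of the process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EntityRouterSketch {
    // One single-thread executor per bucket: tasks in a bucket run one at a time, in order.
    private final List<ExecutorService> workers = new ArrayList<>();

    public EntityRouterSketch(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    /** Bucket for an entity; the same entity id always maps to the same worker. */
    public int workerFor(String entityId) {
        return Math.floorMod(entityId.hashCode(), workers.size());
    }

    /** Hands the operation to the worker thread that owns this entity. */
    public void route(String entityId, Runnable operation) {
        workers.get(workerFor(entityId)).execute(operation);
    }

    /** Stops accepting work and waits for queued operations; false on timeout or interrupt. */
    public boolean shutdownAndWait(long timeoutMillis) {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
        try {
            for (ExecutorService w : workers) {
                if (!w.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The single consumer of a partition would call route(entityId, operation) for every message it receives. Because a given entity always maps to the same single-threaded worker, Operation A, B and C for that entity can never overlap, while different entities spread across the N workers.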

This setup routes messages for the same entity consistently to the same server, and the server consistently routes them to the same worker thread. Since a single thread handles all messages for a given entity, two operations for that entity never run in parallel. Assuming N, the number of worker threads, is relatively high, you still get good parallelism across entities, with message ordering based on any key you want.
