How to best design a job queue with constraints

data structuresdesign-patternsqueue

Consider the following situation:

  • You have a program which creates numerous 'jobs' that need to be processed and places them into a queue.
  • You have other worker programs which grab the next 'job' in line so they can process that job.
  • Each job has a category.
  • There could be any number of categories.
  • Two jobs that have the same category cannot be processed at the same time by separate workers.
  • A worker can process one job at a time.

A traditional queue would not work in this situation because there is a chance that multiple jobs of the same category would be processed simultaneously, which is not allowed.

You could have the worker check the job it grabs and see if that jobs category has another worker that is currently processing against it, and if so re-submit the job into the queue to be processed at a later time. This seems like an inefficient way of solving this problem. Are there any data structures or design patterns that can solve this issue?

If you need further clarification, please let me know.

Best Answer

There are two parts to this problem.

One: the unknown list of possible categories.

Two: interprocess communication between the workers to prevent two jobs of the same category being processed simultaiously.

If you had a known list of categories you can have one queue and one worker per category.

With unknown categories, you can still have a queue per category, but having a queue worker per category requires you to monitor all queues and fire up new workers when a new category appears.

This can be achieved with a 'master' worker which hands out jobs

All jobs go in the 'master' queue.

category worker creates a private queue and registers with master as available for work.

master worker picks up jobs, checks category, checks for available workers and assigns the job to one of them by putting it in thier private queue.

The master can keep track of the category assigned to the worker.

master picks up next job, its the same category and the worker is still busy so it puts the jobs in a category specific holding queue

master gets next job, its a new category so it assigns it to another category worker.

Category worker completes job and reregisters for work

master checks holding queues and master queue next job and assigns to available category workers.

If a category worker crashes during a job it wont reregister. So the master can have some timeout logic where it will give up and start assigning the categories to another worker.

You also have to be carefull to only have a single master worker at any given time. This nescitates an exclusive lock on the master queue of some kind

Related Topic