Design Patterns – Determining the Scope of a Job When Designing a Job Queue

design, design-patterns, queue

We've got a job queue system that'll cheerfully process any kind of job given to it. We intend to use it to process jobs that each contain two tasks:

  • Job (Pass information from one server to another)
    • Fetch task (get the data, slowly)
    • Send task (send the data, comparatively quickly)

The difficulty we're having is that we don't know whether to break the tasks into separate jobs, or process the job in one go.

Are there any best practices or useful references on this subject? Is there some obvious benefit to a method that we're missing?

So far we can see these benefits for each method:

Split

  • Job lease length reflects task length: Each lease covers a single task rather than the total of the two
  • Finer granularity on recovery: If we lose outgoing connectivity we can tell all the pending Send jobs to retry, without repeating the slow fetch
  • The starting state of the second task is saved to job history: Helps with debugging (although similar logging could be added in the single-job method)

Single

  • Single job to be scheduled: Less processing overhead
  • Data not stale on recovery: With split jobs, if the outgoing downtime is quite long, the pending Send jobs could hold outdated data; a single job always fetches just before it sends
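
The two shapes being compared can be sketched roughly as follows. This is a minimal in-memory illustration, not the API of any real queue system: `fetch`, `send`, `MAX_AGE_SECONDS`, and the job functions are all hypothetical names, and a `deque` stands in for the job queue. The split variant records when the data was fetched, which is one possible way for a Send job to notice stale data after a long outage.

```python
import time
from collections import deque

MAX_AGE_SECONDS = 300.0  # hypothetical staleness limit for fetched data

queue = deque()   # stand-in for the real job queue
sent = []         # stand-in for the destination server


def fetch():
    """Hypothetical slow fetch from the source server."""
    return {"value": 42}


def send(data):
    """Hypothetical fast send to the destination server."""
    sent.append(data)


def single_job():
    # One job, one lease covering both tasks; data is fresh when sent.
    send(fetch())


def fetch_job():
    # Split variant, step 1: fetch, then enqueue a send job carrying the
    # payload and the fetch time (the state saved to job history).
    queue.append((send_job, {"data": fetch(), "fetched_at": time.time()}))


def send_job(data, fetched_at, now=None):
    # Split variant, step 2: if the data sat in the queue too long,
    # schedule a new fetch instead of sending something outdated.
    now = time.time() if now is None else now
    if now - fetched_at > MAX_AGE_SECONDS:
        queue.append((fetch_job, {}))
        return
    send(data)


def run_all():
    # Drain the queue, running each job with its saved arguments.
    while queue:
        job, kwargs = queue.popleft()
        job(**kwargs)
```

With this shape, an outage on the outgoing side only requires re-running `send_job` entries, at the cost of an extra scheduling step and a staleness check that the single job does not need.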

Best Answer

Which of these represents the minimum useful addition to the work that your application does? Usually I take the view that a job on a queue should represent a useful work unit: whether it completes or is cancelled, you should end up with the system in a consistent state.

That situation is mostly defined by your problem domain, so it's not something for which a general answer exists. Sometimes there are architectural limitations that force you to split up work in unnatural ways. An example is a GUI application, where you probably aim to do all of your application's work concurrently but then update the user interface on a dedicated thread. That means you have to split one unit of work ("do something useful and show the user I did it") into two steps ("do something useful", then "show the user I did it"). In fact in this case it's not too much of a problem, because if the app quits before updating the UI it's likely that the user didn't want to know about the work you'd done anyway.
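
As a rough illustration of that GUI split, here is a minimal sketch using only Python's standard `threading` and `queue` modules. The names `do_work`, `drain_ui_updates`, and `ui_updates` are hypothetical, and the "UI thread" is simply the main thread here; a real toolkit would have its own way of posting work to its event loop.

```python
import queue
import threading

ui_updates = queue.Queue()  # results waiting to be shown by the UI thread


def do_work(n):
    # Step 1: the useful work, run on a background thread.
    result = n * n
    # Hand the result over rather than touching the UI from this thread.
    ui_updates.put(f"computed {result}")


def drain_ui_updates():
    # Step 2: run on the dedicated UI thread; returns the messages to show.
    shown = []
    while True:
        try:
            shown.append(ui_updates.get_nowait())
        except queue.Empty:
            return shown


workers = [threading.Thread(target=do_work, args=(n,)) for n in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
messages = drain_ui_updates()  # arrival order depends on thread scheduling
```

If the program exits between the two steps, the work is done but never displayed, which is the harmless inconsistency described above.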

If the "minimum useful addition" is too small, then I think about batching several of them into one job to reduce the amount of time spent in job-submission overhead. What counts as "too small" is something that requires measurement for your work and in your environment; it depends more on the architecture than on your problem. Profile your application: if you're spending a significant amount of time adding and removing things from queues or creating and destroying threads, you're doing too little work in each operation.
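
A minimal sketch of batching and of timing a run, assuming the work units are independent. `make_batches`, `process`, and `run_and_time` are hypothetical helpers, and the in-memory `deque` has almost no per-job cost, so the timing only becomes meaningful once the same measurement is taken against your actual queue.

```python
import time
from collections import deque


def make_batches(items, batch_size):
    """Group small work units into jobs of up to batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process(item):
    # Hypothetical unit of work.
    return item * 2


def run_and_time(jobs):
    """Run every job through a queue; return (results, elapsed seconds)."""
    q = deque(jobs)
    results = []
    start = time.perf_counter()
    while q:
        batch = q.popleft()  # one dequeue per job, however many items it has
        results.extend(process(item) for item in batch)
    return results, time.perf_counter() - start


items = list(range(10))
one_per_job = make_batches(items, 1)  # 10 jobs of 1 item each
batched = make_batches(items, 4)      # 3 jobs holding 4, 4 and 2 items

results_single, seconds_single = run_and_time(one_per_job)
results_batched, seconds_batched = run_and_time(batched)
```

Both runs produce the same results; only the number of queue operations differs, and comparing the two elapsed times in your own environment is what tells you whether the batch size is worth changing.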
