I've been puzzling over a good implementation for this for a while. I have a program that does a long-running I/O operation (downloading a file) and a long-running CPU operation (parsing its contents). To improve efficiency, I wanted to have one thread pool perform the I/O section and have it pass tasks to another thread pool that performs the CPU section. That way, it's unlikely that all threads will be stuck either hogging the I/O or hogging the CPU. Depending on the file, one operation will likely take longer than the other.
What's the best way to perform a hand-off between thread pools? Assuming I'm using Java with two ExecutorServices, like so:
int threads = Runtime.getRuntime().availableProcessors();
ExecutorService ioService = Executors.newFixedThreadPool(threads);
ExecutorService cpuService = Executors.newFixedThreadPool(threads);
public BigFile ioTask(){
return Connection.downloadBigFile();
}
public void cpuTask(BigFile bigFile){
processBigFile(bigFile);
}
What's the best way to have a task running under `ioService` add a task to `cpuService` while passing data into it?
What I've considered:
- Submit a `Runnable` to `cpuService` that executes a `Callable` in `ioService`. Blocks `cpuService` while running the `Callable`.
- Submit a `Runnable` to `ioService` that executes a `Runnable` in `cpuService`. Blocks `ioService` while running the second `Runnable`.
- Create a `Runnable` implementation that has an `IOTaskResult` constructor, allowing `ioService` runnables to submit these implementations to the `cpuService`. Turns a reusable processing object into a consumable process object, which adds more overhead.
- Add a method to the CPU task processor that generates a `Runnable`. This feels like a workaround and kind of "hack-y", considering I have a method I can call and the `Runnable` just wraps the method and fills in the parameters for me.
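For concreteness, the direct hand-off I keep circling back to looks roughly like this (the stand-in task bodies are placeholders, not my real download/parse code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Handoff {
    static final int THREADS = Runtime.getRuntime().availableProcessors();
    static final ExecutorService ioService = Executors.newFixedThreadPool(THREADS);
    static final ExecutorService cpuService = Executors.newFixedThreadPool(THREADS);

    // Stand-ins for the real downloadBigFile()/processBigFile().
    static String ioTask() { return "big file"; }
    static int cpuTask(String bigFile) { return bigFile.length(); }

    // The I/O task hands its result to the CPU pool by capturing the
    // downloaded data in a lambda; neither pool blocks waiting on the other.
    static Future<Integer> downloadThenProcess() throws Exception {
        return ioService.submit(() -> {
            String bigFile = ioTask();                        // runs on ioService
            return cpuService.submit(() -> cpuTask(bigFile)); // runs on cpuService
        }).get(); // outer get() only retrieves the inner Future handle
    }

    public static void main(String[] args) throws Exception {
        System.out.println(downloadThenProcess().get()); // prints 8
        ioService.shutdown();
        cpuService.shutdown();
    }
}
```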
Is there a good way of handling this? I'd love to hear some thoughts.
Best Answer
You don't actually need two thread pools to pull this off, and doing it that way can be counterproductive from execution and complexity standpoints.
First, threads are cheap, and you shouldn't hesitate to use one per download-and-process job if you're not going to be running hundreds of them. The I/O-bound portion of the process is going to spend a lot of its time asleep, and you really want to be able to grab a few cycles of CPU when the next chunk of data arrives and then go back to sleep. I'm going to hand-wave the slowdown that happens if you have lots of files being downloaded simultaneously and will come back to it later.
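Thread-per-job, concretely, is just this (the names and the stand-in work are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PerJobThreads {
    static final AtomicInteger handled = new AtomicInteger();

    // Stand-in for one download-and-parse job: sleepy I/O, then CPU.
    static void downloadAndParse(String url) {
        handled.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        String[] urls = { "a", "b", "c" };
        Thread[] workers = new Thread[urls.length];
        for (int i = 0; i < urls.length; i++) {
            String url = urls[i]; // effectively final, so the lambda can capture it
            workers[i] = new Thread(() -> downloadAndParse(url)); // one thread per job
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join(); // wait for every job to finish
        }
    }
}
```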
Based on your current design, it looks like, in this case, your definition of efficiency means that files downloaded first get processed first.* This means you want to put a core on each processing job and keep it there until it's finished. You can regulate this using a semaphore that counts the number of cores in the system:
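(The original code block didn't survive here; a minimal sketch of what it plausibly contained, with names of my own choosing:)

```java
import java.util.concurrent.Semaphore;

public class CoreGate {
    static final int NUM_CORES = Runtime.getRuntime().availableProcessors();
    // One permit per core; "true" makes the semaphore fair, so jobs that
    // finish downloading first get a core first.
    static final Semaphore coreSemaphore = new Semaphore(NUM_CORES, true);
}
```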
This will allow the first `num_cores` threads that want to use a core to acquire one, and any others will have to wait until one is released. With a way to regulate the number of threads using cores, the entire process can be cut back to a single routine. The number of simultaneous downloads can be regulated with a second semaphore that's acquired and released around the download statement.
*If that isn't the case, there's no efficiency to be gained by limiting the number of threads, as long as you keep the number running concurrently down to the number of cores. If you have n CPU cycles of work to do and c cores, it's going to take n/c cycles' worth of time total no matter how you slice it up.