Java – How to determine optimal number of threads for high latency network requests

akkajavamultithreadingnetworking

I am writing a utility that must make thousands of network requests. Each request receives only a single, small packet in response (similar to ping), but may take upwards of several seconds to complete. Processing each response completes in one (simple) line of code.

The net effect of this is that the computer is not IO-bound, file-system-bound, or CPU-bound, it is only bound by the latency of the responses.

This is similar to, but not the same as There is a way to determine the ideal number of threads? and Java best way to determine the optimal number of threads [duplicate]… the primary difference is that I am only bound by latency.

I am using an ExecutorService object to run the threads and a Queue<Future<Integer>> to track threads that need to have results retrieved:

ExecutorService executorService = Executors.newFixedThreadPool(threadPoolSize);
Queue<Future<Integer>> futures = new LinkedList<Future<Integer>>();

for (int quad3 = 0 ; quad3 < 256 ; ++quad3) {
    for (int quad4 = 0 ; quad4 < 256 ; ++quad4) {
        byte[] quads = { quad1, quad2, (byte)quad3, (byte)quad4 };
        futures.add(executorService.submit(new RetrieverCallable(quads)));
    }
}

… I then dequeue all the elements in the queue and put the results in the required data structure:

int[] result = int[65536]
while(!futures.isEmpty()) {
    try {
        results[i] = futures.remove().get();
    } catch (Exception e) {
        addresses[i] = -1;
    }
}

My first question is: Is this a reasonable way to track all the threads? If thread X takes a while to complete, many other threads might finish before X does. Will the thread pool exhaust itself waiting for open slots, or will the ExecutorService object manage the pool in such a way that threads that have completed but not yet been processed be moved out of available slots so that other threads my begin?

My second question is what guidelines can I use for finding the optimal number of threads to make these calls? I don't even know order-of-magnitude guidance here. I know it works pretty well with 256 threads, but seems to take roughly the same overall time with 1024 threads. CPU utilization is hovering around 5%, so that doesn't appear to be an issue. With that large a number of threads, what are all the metrics I should be looking at to compare different numbers? Obviously overall time to process the batch, average time per thread… what else? Is memory an issue here?

Best Answer

It will shock you, but you do not need any threads for I/O (quantitatively, this means 0 threads). It is good that you have studied that multithreading does not multiply your network bandwidth. Now, it is time to know that threads do computation. They are not doing the (high-latency) communication. The communication is performed by a network adapter, which is another process, running really in parallel with with CPU. It is stupid to allocate a thread (see which resources allocated are listed by this gentlemen who claims that you need 1 thread) just to sleep until network adapter finishes its job. You need no threads for I/O = you need 0 threads.

It makes sense to allocate the threads for computation to make in parallel with I/O request(s). The amount of threads will depend on the computation-to-communication ratio and limited by the number of cores in your CPU.

Sorry, I had to say that despite you have certainly implied the commitment to blocking I/O, so many people do not understand this basic thing. Take the advise, use asynchronous I/O and you'll see that the issue does not exist.