Java Multi-Threading Performance When CPU is Maxed Out

Tags: java, multithreading

I've noticed that my software's performance degrades severely when the number of threads is substantially increased.

What I mean is that when I limit the number of concurrent threads, performance is much better than when I just let them all run simultaneously.

My CPU is an i7-3940XM, so very fast for a mobile chip and, for an older processor, still not too shabby compared to desktop i7s. It has 4 physical cores and 8 logical cores. Windows 10.

The test case creates 65 threads and takes almost 5 minutes to run. The CPU is maxed out while this happens, because the code runs almost entirely in memory; the only resource it accesses with any frequency is a RAM disk.

But when I limit the number of threads that can run concurrently, performance improves drastically:

"Threads" in the image below means concurrent threads. Each time is for the same application running 65 total threads; only the number of concurrent threads varied.

[Image: average time per thread for the same 65-thread run at various concurrency caps]

So it seems that performance is best when the number of concurrent threads is close to the number of logical cores.
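One common way to impose that kind of cap without changing the work itself is a fixed-size thread pool: submit all 65 tasks, but let only as many run at once as there are logical cores. This is a minimal sketch, not the questioner's actual code; the class name and the `cpuBoundWork` method are hypothetical stand-ins for the real per-thread work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CappedThreads {

    // Hypothetical stand-in for the real per-thread work: pure in-memory
    // computation (here, summing 0..9,999,999).
    static long cpuBoundWork(int id) {
        long sum = 0;
        for (long n = 0; n < 10_000_000L; n++) sum += n;
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        // Cap concurrency at the number of logical cores (8 on an i7-3940XM).
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Submit all 65 tasks; only `cores` of them run at a time,
        // the rest wait in the pool's internal queue.
        for (int i = 0; i < 65; i++) {
            final int id = i;
            pool.submit(() -> cpuBoundWork(id));
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```

The pool queues the excess tasks instead of forcing the OS scheduler to juggle 65 runnable threads at once.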

The reason I'm posting, though, is that I wonder whether I need to investigate further whether I have anything too "blocking" in my code. I don't really understand why it slows down so dramatically when there is no cap on the number of simultaneous threads.

Can anyone offer some thoughts?

Update:

I did find some file read/write code I had forgotten about and switched it off. At 8 simultaneous threads it made no difference in time per thread, but at 65 it dropped the average down to 1.00 seconds per thread.

Best Answer

Sounds like you're running into context-switching overhead. (The linked article talks about entire processes rather than threads, but the idea is similar.) There is a very real cost incurred every time a CPU switches from working on one thing to working on another.

As you've discovered, when the number of threads roughly matches the number of logical cores, the CPUs rarely have to put down one piece of work in order to pick up and work on another.

If you have "too many" threads, then the OS is going to try to make roughly equal progress on all of them at the same time. Since you don't have that many cores, it means that each core will pick up a thread, do a little work, save that work somewhere, pick up the next thread, and repeat. The "picking up" and "saving" adds up.
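A small benchmark can make the oversubscription cost visible. This is a rough sketch with hypothetical names (`Oversubscribe`, `burn`, `time`), and a caveat: with a trivial arithmetic loop (no cache pressure, no RAM-disk access like the original workload), the gap between a right-sized pool and a heavily oversubscribed one is usually far smaller than the slowdown described in the question, but the direction of the effect should still show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Oversubscribe {

    // One CPU-bound task: spin on arithmetic for a fixed amount of work.
    static long burn() {
        long x = 1;
        for (int i = 0; i < 50_000_000; i++) x = x * 31 + i;
        return x;
    }

    // Run `tasks` copies of burn() on a pool of `threads` threads;
    // return the wall-clock time in milliseconds.
    static long time(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> futures = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(Oversubscribe::burn));
        }
        for (Future<Long> f : futures) f.get();  // wait for all tasks
        pool.shutdown();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Same total work, different numbers of runnable threads.
        System.out.println(cores + " threads: " + time(cores, 64) + " ms");
        System.out.println("64 threads: " + time(64, 64) + " ms");
    }
}
```

Running the same 64 tasks on a core-sized pool versus a 64-thread pool isolates the scheduling cost, since the total computation is identical in both runs.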

Threading is useful for keeping a UI responsive, and it can be very useful for I/O-intensive work (where you spend a lot of time waiting for bits to arrive or depart). But once you're past "keep all the cores busy", it's not much use for speeding up CPU-bound operations.
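To see the contrast with the CPU-bound case, here is a sketch of the I/O-bound situation, where extra threads genuinely help because each one spends most of its time waiting. The names (`IoBound`, `fetch`) are hypothetical, and `Thread.sleep` stands in for a real network or disk wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IoBound {

    // Simulated I/O-bound task: almost no CPU, mostly waiting.
    static String fetch(int id) throws InterruptedException {
        Thread.sleep(100);  // stand-in for a network/disk round trip
        return "result-" + id;
    }

    // Run `tasks` fetches on a pool of `threads` threads;
    // return the wall-clock time in milliseconds.
    static long time(int threads, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<String>> futures = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            futures.add(pool.submit(() -> fetch(id)));
        }
        for (Future<String> f : futures) f.get();
        pool.shutdown();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // 32 waits of 100 ms each: the waits overlap, so more threads
        // means less total wall-clock time, regardless of core count.
        System.out.println(" 4 threads: " + time(4, 32) + " ms");
        System.out.println("32 threads: " + time(32, 32) + " ms");
    }
}
```

Because the tasks are waiting rather than computing, 32 threads can overlap all 32 waits even on a 4-core machine, which is exactly why thread counts well above the core count pay off for I/O but not for CPU-bound work.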