How many CPUs should be utilised with Hyperthreading


Let's say I have a server CPU with 18 cores, with hyperthreading on, which means I can see 36 CPUs in htop.

To fully utilise the CPU without impacting single-thread performance, should I be aiming for all 36 "cores" to run at 100% (with the HT siblings simply doing less work while still reporting 100%), or would reaching that point mean the "full" cores are already being interrupted by the task on their "HT core" and therefore doing less single-threaded work?

I'm aware that there are a lot of variables that affect HT performance; I just want to know what CPU meters mean when dealing with HT.

Best Answer

If the second virtual core is allowed to contribute when the first would otherwise be stuck, that's better than nothing: you get (at least) a little extra work done.

The question becomes: when does running two different threads make one of them run worse? Branch prediction and the dependencies between instructions don't change. Memory access is where it hurts: the two threads compete for memory, both in cache capacity and in bandwidth.

If you have some CPUs running with HT and others not, does that also mean you will assign specific threads to one type or the other? I think not: unless you pin them, your programs will run their threads on whatever virtual cores the scheduler picks. So how does splitting the configuration help? Since each CPU has its own cache, the only effect is due to memory bandwidth and the burden of cache coherency.
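If you do want a particular thread on a particular virtual core, you have to ask for it explicitly. Below is a minimal sketch of what that looks like on Linux with pthreads; the worker() function and the choice of logical CPUs 0 and 18 are placeholders, since the mapping of logical CPU numbers to HT siblings is machine-specific.

```c
/*
 * Minimal sketch: pin a thread to one logical CPU on Linux (glibc).
 * worker() and the CPU numbers below are placeholders for illustration.
 *
 * Compile: gcc -O2 -pthread pin.c -o pin
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void *worker(void *arg)
{
    long cpu = (long)arg;

    /* Restrict this thread to the single logical CPU passed in arg. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((int)cpu, &set);
    int err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0)
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));

    /* ... the actual work would go here ... */
    printf("thread pinned to logical CPU %ld\n", cpu);
    return NULL;
}

int main(void)
{
    /* On many 18-core parts, logical CPUs 0 and 18 are HT siblings of the
     * same physical core, but the numbering varies; check
     * /sys/devices/system/cpu/cpu0/topology/thread_siblings_list. */
    pthread_t a, b;
    pthread_create(&a, NULL, worker, (void *)0L);
    pthread_create(&b, NULL, worker, (void *)18L);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```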

In general, you reach a point where taking on more work costs more than letting some CPU execution units go idle. This does not depend directly on the number of threads, but on what the threads are doing, and on the detailed memory architecture and performance nuances of the various components.

There is no simple answer. Even with a specific program in mind, the machine may differ from those of people relating their own experiences.

You have to try it yourself and measure what is fastest, with that specific work on that exact machine. And even then, it may change with software updates and shifting usage over time.
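One rough way to do that measurement: run the same fixed amount of work with 18 threads and then with 36, and compare wall-clock times. The sketch below uses a placeholder busy-loop standing in for your real workload; swap in the actual work you care about.

```c
/*
 * Rough benchmarking sketch: split a fixed amount of work across N
 * threads and report wall-clock time. The busy-loop is a stand-in
 * for a real workload.
 *
 * Compile: gcc -O2 -pthread bench.c -o bench
 * Usage:   ./bench 18   then   ./bench 36
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL_ITERS (1UL << 30)   /* total work, split across threads */

static void *worker(void *arg)
{
    unsigned long iters = *(unsigned long *)arg;
    volatile double x = 1.0;
    for (unsigned long i = 0; i < iters; i++)
        x = x * 1.0000001 + 0.0000001;   /* placeholder number crunching */
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = (argc > 1) ? atoi(argv[1]) : 1;
    if (nthreads < 1)
        nthreads = 1;
    unsigned long per_thread = TOTAL_ITERS / nthreads;
    pthread_t *tid = malloc(nthreads * sizeof(*tid));

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d threads: %.2f s\n", nthreads, secs);
    free(tid);
    return 0;
}
```

Note that a pure arithmetic loop like this tends to flatter hyperthreading; a memory-heavy workload may tell a very different story, which is exactly why the measurement has to use your real work.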

Take a look at volume 3 of Agner Fog's magnum opus. If you look carefully at a specific processor, you can find the limiting resources among the deep pipeline of many steps needed to execute code. You need to find a case where over-commitment makes it execute slower, as opposed to simply not taking in more work. In general that means some kind of cache, or some other resource shared between threads.


What does the CPU meter actually mean? It reports all time that is not spent running the idle thread. Both logical threads assigned to a core will count as busy, even though the actual work done on one of them may be small. Time spent with the pipeline stalled for a few cycles until results are ready, memory is fetched, or atomic operations are fenced does not make the thread "not ready", so it is not idle and still shows as in-use. Waiting on RAM will not show as idle. Only something like I/O makes the thread block and stop being charged for time. An operating-system mutex generally does so too, but with the rise of multicore systems that's no longer a sure thing, since a spinlock keeps the thread running rather than putting it back on the shelf.
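On Linux, for example, meters like htop derive "busy" from the per-CPU counters in /proc/stat: any time not accumulated in the idle (or iowait) columns counts as busy, which is why cycles stalled on memory still look like 100%. A small sketch of reading those counters (cumulative since boot; a real meter samples twice and takes the difference):

```c
/*
 * Sketch: where a Linux CPU meter gets its numbers. Reads the per-CPU
 * jiffy counters from /proc/stat and prints the non-idle fraction
 * since boot. Field order assumes a reasonably recent kernel:
 * user nice system idle iowait irq softirq steal ...
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f) { perror("/proc/stat"); return 1; }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        /* Per-CPU lines look like "cpu7 1234 56 789 ..."; skip the
         * aggregate "cpu " line and everything else. */
        if (strncmp(line, "cpu", 3) != 0 || line[3] == ' ')
            continue;

        int id;
        unsigned long long user, nice, sys, idle, iowait, irq, softirq, steal;
        if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu",
                   &id, &user, &nice, &sys, &idle, &iowait,
                   &irq, &softirq, &steal) == 9) {
            unsigned long long busy  = user + nice + sys + irq + softirq + steal;
            unsigned long long total = busy + idle + iowait;
            printf("cpu%d: %.1f%% non-idle since boot\n",
                   id, 100.0 * busy / total);
        }
    }
    fclose(f);
    return 0;
}
```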

So, a CPU meter of 100% doesn't mean all is smooth sailing if the CPU is often stuck waiting for memory. A smaller number of logical cores showing 90% could very well be getting more work done, having finished the number crunching and now waiting on the disk.

So don't worry about the CPU meter. Look only at the actual progress made.