Java – Hyper-Threading & htop/system-monitor

cpu-usage, hyperthreading, java, system-monitoring

I'm running a large set of simulations on a quad-core Xeon E5520 with Hyper-Threading enabled. My software automatically detects 8 (virtual) cores and launches 8 simulations to run in parallel. However htop and system-monitor only show each of the 8 cores as loaded to ~50%.

Is this intended behavior? In a way it makes sense, since the total load would then be 400%, i.e. 100% for each physical core, but shouldn't I get a bit more than that? Isn't that the whole point of HT: using SMT to run a second thread on the otherwise idle execution units, so that total throughput is higher?

I should mention that the load is very consistent: 50% on each core, all the time. The simulations run in Java, in a single JVM; GC is not the issue, as I'm well below the JVM heap limit. The simulations are not memory-bound either: there is plenty of RAM to go around and no swapping. They do write a lot of data to disk, but large buffers are in place (a 128 MB write buffer per thread), and the disk activity shown by gkrellm is occasional bursts of ~90 MB/s rather than a sustained load, so I can't believe that is the bottleneck.
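For reference, the launch code is structured roughly like this (a simplified sketch, not my actual code; the class and method names are made up):

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SimulationRunner {
    public static void main(String[] args) {
        // One worker per detected (virtual) core -- 8 on this machine
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        for (int i = 0; i < cores; i++) {
            final int id = i;
            pool.submit(() -> {
                // Each simulation writes through a large buffer (~128 MB)
                try (BufferedOutputStream out = new BufferedOutputStream(
                        new FileOutputStream("sim-" + id + ".dat"),
                        128 * 1024 * 1024)) {
                    runSimulation(id, out); // the actual simulation loop
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }

    // Placeholder for the real (CPU-heavy) simulation
    static void runSimulation(int id, BufferedOutputStream out) throws IOException {
    }
}
```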

Could anyone shed some light on this?

Best Answer

However htop and system-monitor only show each of the 8 cores as loaded to ~50%.

OK, that simply means you are not running enough simulations at the same time. There are many things that can keep a simulation from using a core 100%. Either you fix those, or you simply add more simulations.

but shouldn't I get a bit more than that?

You should be able to get 100% on every core.

Now, if you read Khaled's half-knowledge... here is the truth:

  • Hyper-Threading means the two virtual cores do not each get a full set of execution resources, that is true, so they cannot, for example, perform certain operations at the same time.
  • Sadly for him, though, this is NOT VISIBLE TO THE OS. CPU "load" figures are based on "what % of the time was the core busy" as seen by the OS scheduler. So, if a CPU core had an active task for 400 ms out of a second, it is 40% busy.

Hyper-Threading resource starvation (i.e. a virtual core having to wait for a resource) will simply mean the virtual core takes longer to perform an operation, but this is not visible to the OS scheduler. If the core spends 100 ms waiting internally, the task takes 500 ms instead of 400 ms, and the scheduler still just sees a busy core. It is quite complex to find out when you run into resource starvation, and it is not something the OS can do: you have to run special code and compare run-times to see that one run takes longer than it should (= Hyper-Threading "bad"). If the CPU were to put out fine-grained internal usage statistics, you could pretty much say goodbye to any performance to start with; that is WAY too much data.

The result is that the second virtual core simply will not add 100% performance: if something takes 100 ms on one core, then with Hyper-Threading and 2 virtual cores it may take 75 ms, not the 50 ms two full cores would give you. It depends heavily on the code, though.
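To make the "compare run-times" point concrete, a probe along these lines (the workload and all names here are made up; a sketch, not production code) times the same per-thread CPU-bound loop once on 1 thread and once on all 8 virtual cores:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HtProbe {
    // Purely CPU-bound busywork (stand-in for one simulation's inner loop)
    static double burn(long iters) {
        double x = 1.0;
        for (long i = 0; i < iters; i++) x = Math.sqrt(x + i);
        return x;
    }

    // Run the same per-thread workload on `threads` threads, return wall time in ns
    static long timeRun(int threads, long itersPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        Future<?>[] fs = new Future<?>[threads];
        for (int i = 0; i < threads; i++)
            fs[i] = pool.submit(() -> burn(itersPerThread));
        for (Future<?> f : fs) f.get();
        long elapsed = System.nanoTime() - start;
        pool.shutdown();
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        long iters = 200_000_000L;
        long t1 = timeRun(1, iters); // baseline: one thread on one physical core
        long t8 = timeRun(8, iters); // 8x the total work on 4 physical cores
        // With 4 physical cores and zero SMT benefit, expect t8 ~= 2 * t1;
        // any Hyper-Threading gain pushes the ratio below 2.
        System.out.printf("1 thread: %d ms, 8 threads: %d ms (ratio %.2f)%n",
                t1 / 1_000_000, t8 / 1_000_000, (double) t8 / t1);
    }
}
```

If the ratio comes out well below 2, Hyper-Threading is helping this workload; a ratio at or above 2 means the sibling virtual cores are starving each other and adding little or nothing.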

In your case I would start with a single thread and figure out whether you can get one core to 100%. If you can't, then the simulation is simply waiting for something, which would be a Stack Overflow question if anything (the program would have to be changed). If it is what it is (I/O, writing/reading from disk), then it may simply be necessary to run more than one simulation per core.
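A quick way to do that first check from inside the JVM (a sketch; runOneSimulation is a hypothetical stand-in for one of your simulations) is to compare the thread's CPU time against wall-clock time using ThreadMXBean:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuBoundCheck {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime(); // -1 if unsupported

        runOneSimulation(); // hypothetical: one simulation, single-threaded

        long cpu  = bean.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        // A fraction well below 1.0 means the thread spends time waiting
        // (I/O, locks, ...) instead of computing -- htop would show < 100%.
        System.out.printf("CPU-bound fraction: %.2f%n", (double) cpu / wall);
    }

    static void runOneSimulation() {
        // placeholder for the actual simulation
    }
}
```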