Multithreading – How a Single Thread Runs on Multiple Cores

cpuhardwaremulti-coremultithreading

I am trying to understand, at a high-level, how single threads run across multiple cores. Below is my best understanding. I do not believe it is correct though.

Based on my reading of Hyper-threading, it seems the OS organizes
the instructions of all threads in such a way that they are not
waiting on each other. Then the front-end of the CPU further
organizes those instructions by distributing one thread to each core,
and distributes independent instructions from each thread among any
open cycles.

So if there is only a single thread, then the OS will not do any optimization. However, the front-end of the CPU will distribute independent instruction sets among each core.

According to https://stackoverflow.com/a/15936270, a specific programming language may create more or less threads, but it is irrelevant when determining what to do with those threads. The OS and CPU handle this, so this happens regardless of the programming language used.

enter image description here

Just to clarify, I am asking about a single thread run across multiple cores, not about running multiple threads on a single core.

What is wrong with my summary? Where and how is a thread's instructions split up among multiple cores? Does the programming language matter? I know this is a broad subject; I am hoping for a high-level understanding of it.

Best Answer

The operating system offers time slices of CPU to threads that are eligible to run.

If there is only one core, then the operating system schedules the most eligible thread to run on that core for a time slice. After a time slice is completed, or when the running thread blocks on IO, or when the processor is interrupted by external events, the operating system reevaluates what thread to run next (and it could choose the same thread again or a different one).

Eligibility to run consists of variations on fairness and priority and readiness, and by this method various threads get time slices, some more than others.

If there are multiple cores, N, then the operating system schedules the most eligible N threads to run on the cores.

Processor Affinity is an efficiency consideration. Each time a CPU runs a different thread than before, it tends to slow down a bit because its cache is warm for the previous thread, but cold to the new one. Thus, running the same thread on the same processor over numerous time slices is an efficiency advantage.

However, the operating system is free to offer one thread time-slices on different CPUs, and it could rotate through all the CPUs on different time slices. It cannot, however, as @gnasher729 says, run one thread on multiple CPUs simultaneously.

Hyperthreading is a method in hardware by which a single enhanced CPU core can support execution of two or more different threads simultaneously. (Such a CPU can offer additional threads at lower cost in silicon real-estate than additional full cores.) This enhanced CPU core needs to support additional state for the other threads, such as CPU register values, and also has coordination state & behavior that enables sharing of functional units within that CPU without conflating the threads.

Hyperthreading, while technically challenging from a hardware perspective, from the programmer's perspective, the execution model is merely that of additional CPU cores rather than anything more complex. So, the operating system sees additional CPU cores, though there are some new processor affinity issues as several hyperthreaded threads are sharing one CPU core's cache architecture.


We might naively think that two threads running on a hyperthreadded core each run half as fast as they would each with their own full core. But this is not necessarily the case, since a single thread's execution is full of slack cycles, and some amount of them can be used by the other hyperthreaded thread. Further, even during non-slack cycles, one thread may be using different functional units than the other so simultaneous execution can occur. The enhanced CPU for hyperthreading may have a few more of certain heavily used functional units specially to support that.