This is a rather complex question, since the answer depends on many things:
- The CPU in question
  - It can vary significantly even within the same family; consider, for example, the additional registers added for MMX/SSE operations.
- The operating system, since it controls the handlers that run on a context switch and decides whether any hardware assistance the CPU offers for context switching is used.
  - For example, Windows does not use the Intel hardware task-switching support that can do much of the context-switch storage for you, since that hardware does not store the floating-point registers.
- Any optimizations enabled by a program that is aware of its own requirements and capable of informing the OS of them.
  - Perhaps indicating that it isn't using the FP registers, so the OS needn't bother saving them.
  - In architectures with sizeable register files, like most RISC designs, there is considerable benefit to knowing you need only a smaller subset of those registers.
At a minimum, the in-use general-purpose registers and the program counter will need to be saved (assuming the common design of most current CISC/RISC-style general-purpose CPUs).
Note that attempting to do only the minimal amount of work on a context switch is a topic of some academic interest.
Linux, being open source, has more information publicly available on this, though my references may be a little out of date.
There is a `task_struct` which contains a large number of fields relating to the task's state, as well as to the process that the task belongs to.
One of these fields is the `thread_struct`:
```c
/* CPU-specific state of this task */
struct thread_struct thread;
```
which holds information about cached TLS descriptors, debugging registers, fault info, floating-point state, virtual 86 mode and IO permissions.
Each architecture defines its own `thread_struct`, which identifies the registers and other values saved on a switch.
This is further complicated by the presence of rename registers, which allow multiple in-flight instructions (via superscalar and/or pipelined architectural designs). The restore phase of a context switch will likely rely on the CPU's pipeline being restored to an initially empty state, such that instructions which had not yet been retired in the pipeline have no effect and can thus be ignored. This makes the design of the CPU that much harder.
The difference between a process and a thread is that a process switch (which always implies a thread switch in all mainstream operating systems) will also need to update memory-translation information, IO-related information and permission-related structures.
These will mainly be pointers to richer data structures, so they will not add significant cost relative to the thread context switch.
You're pretty much correct, but threads share all segments except the stack. Threads have independent call stacks; however, the memory in other threads' stacks is still accessible, and in theory you could hold a pointer to memory in some other thread's local stack frame (though you should probably find a better place to put that memory!).
Best Answer
It's much easier to explain those in reverse order because a process-switch always involves a thread-switch.
A typical thread context switch on a single-core CPU happens like this:
All context switches are initiated by an 'interrupt'. This could be an actual hardware interrupt that runs a driver (e.g. from a network card, keyboard, memory-management or timer hardware), or a software call (system call) that performs a hardware-interrupt-like call sequence to enter the OS. In the case of a driver interrupt, the OS provides an entry point that the driver can call instead of performing the 'normal' direct interrupt-return, and so allows a driver to exit via the OS scheduler if it needs the OS to set a thread ready (e.g. because it has signaled a semaphore).
Non-trivial systems will have to initiate a hardware-protection-level change to enter a kernel-state so that the kernel code/data etc. can be accessed.
Core state for the interrupted thread has to be saved. On a simple embedded system, this might just be pushing all registers onto the thread stack and saving the stack pointer in its Thread Control Block (TCB).
Many systems switch to an OS-dedicated stack at this stage so that the bulk of OS-internal stack requirements are not inflicted on the stack of every thread.
It may be necessary to mark the thread stack position where the change to interrupt-state occurred to allow for nested interrupts.
The driver/system call runs and may change the set of ready threads by adding/removing TCBs from internal queues for the different thread priorities. E.g. the network card driver may have set an event or signaled a semaphore that another thread was waiting on, so that thread will be added to the ready set; or a running thread may have called sleep() and so elected to remove itself from the ready set.
The OS scheduler algorithm is run to decide which thread to run next, typically the highest-priority ready thread that is at the front of the queue for that priority. If the next-to-run thread belongs to a different process to the previously-run thread, some extra stuff is needed here, (see later).
The saved stack pointer from the TCB for that thread is retrieved and loaded into the hardware stack pointer.
The core state for the selected thread is restored. On my simple system, the registers would be popped from the stack of the selected thread. More complex systems will have to handle a return to user-level protection.
An interrupt-return is performed, so transferring execution to the selected thread.
In the case of a multicore CPU, things are more complex. The scheduler may decide that a thread currently running on another core needs to be stopped and replaced by a thread that has just become ready. It can do this by using its interprocessor driver to hardware-interrupt the core running the thread that has to be stopped. The complexities of this operation, on top of all the other stuff, are a good reason to avoid writing OS kernels :)
A typical process context switch happens like this:
Process context switches are initiated by a thread context switch, so all of the steps above are going to need to happen.
At the scheduling step above, the scheduler decides to run a thread belonging to a different process from the one that owned the previously running thread.
The memory-management hardware has to be loaded with the address space for the new process, i.e. whatever selectors/segments/flags allow the threads of the new process to access its memory.
The context of any FPU hardware needs to be saved/restored from the PCB.
There may be other process-dedicated hardware that needs to be saved/restored.
On any real system, the mechanisms are architecture-dependent and the above is a rough and incomplete guide to the implications of either context switch. There are other overheads generated by a process-switch that are not strictly part of the switch - there may be extra cache-flushes and page-faults after a process-switch since some of its memory may have been paged out in favour of pages belonging to the process owning the thread that was running before.