How to switch from user mode to kernel mode

linux-kernel

I'm learning about the Linux kernel but I don't understand how to switch from user mode to kernel mode in Linux. How does it work? Could you give me some advice or give me some link to refer or some book about this?

Best Answer

The only way an user space application can explicitly initiate a switch to kernel mode during normal operation is by making an system call such as open, read, write etc.

Whenever a user application calls these system call APIs with appropriate parameters, a software interrupt/exception(SWI) is triggered.

As a result of this SWI, the control of the code execution jumps from the user application to a predefined location in the Interrupt Vector Table [IVT] provided by the OS.

This IVT contains an adress for the SWI exception handler routine, which performs all the necessary steps required to switch the user application to kernel mode and start executing kernel instructions on behalf of user process.

Related Solutions

Linux – How do the likely/unlikely macros in the Linux kernel work and what is their benefit

They are hint to the compiler to emit instructions that will cause branch prediction to favour the "likely" side of a jump instruction. This can be a big win, if the prediction is correct it means that the jump instruction is basically free and will take zero cycles. On the other hand if the prediction is wrong, then it means the processor pipeline needs to be flushed and it can cost several cycles. So long as the prediction is correct most of the time, this will tend to be good for performance.

Like all such performance optimisations you should only do it after extensive profiling to ensure the code really is in a bottleneck, and probably given the micro nature, that it is being run in a tight loop. Generally the Linux developers are pretty experienced so I would imagine they would have done that. They don't really care too much about portability as they only target gcc, and they have a very close idea of the assembly they want it to generate.

How does schedule()+switch_to() functions from linux kernel actually work

After calling switch_to(), the kernel stack is switched to that of the task named in next. Changing the address space, etc, is handled in eg context_switch().
schedule() cannot be called in atomic context, including from an interrupt (see the check in schedule_debug()). If a reschedule is needed, the TIF_NEED_RESCHED task flag is set, which is checked in the interrupt return path.
See 2.
I believe that, with the default 8K stacks, Interrupts are handled with whatever kernel stack is currently executing. If 4K stacks are used, I believe there's a separate interrupt stack (automatically loaded thanks to some x86 magic), but I'm not completely certain on that point.

To be a bit more detailed, here's a practical example:

An interrupt occurs. The CPU switches to an interrupt trampoline routine, which pushes the interrupt number onto the stack, then jmps to common_interrupt
common_interrupt calls do_IRQ, which disables preemption then handles the IRQ
At some point, a decision is made to switch tasks. This may be from the timer interrupt, or from a wakeup call. In either case, set_task_need_resched is invoked, setting the TIF_NEED_RESCHED task flag.
Eventually, the CPU returns from do_IRQ in the original interrupt, and proceeds to the IRQ exit path. If this IRQ was invoked from within the kernel, it checks whether TIF_NEED_RESCHED is set, and if so calls preempt_schedule_irq, which briefly enables interrupts while performing a schedule().
If the IRQ was invoked from userspace, we first check whether there's anything that needs doing prior to returning. If so, we go to retint_careful, which checks both for a pending reschedule (and directly invokes schedule() if needed) as well as checking for pending signals, then goes back for another round at retint_check until there's no more important flags set.
Finally, we restore GS and return from the interrupt handler.

As for switch_to(); what switch_to() (on x86-32) does is:

Save the current values of EIP (instruction pointer) and ESP (stack pointer) for when we return to this task at some point later.
Switch the value of current_task. At this point, current now points to the new task.
Switch to the new stack, then push the EIP saved by the task we're switching to onto the stack. Later, a return will be performed, using this EIP as the return address; this is how it jumps back to the old code that previously called switch_to()
Call __switch_to(). At this point, current points to the new task, and we're on the new task's stack, but various other CPU state hasn't been updated. __switch_to() handles switching the state of things like the FPU, segment descriptors, debug registers, etc.
Upon return from __switch_to(), the return address that switch_to() manually pushed onto the stack is returned to, placing execution back where it was prior to the switch_to() in the new task. Execution has now fully resumed on the switched-to task.

x86-64 is very similar, but has to do slightly more saving/restoration of state due to the different ABI.

Best Answer

Related Solutions

Linux – How do the likely/unlikely macros in the Linux kernel work and what is their benefit

How does schedule()+switch_to() functions from linux kernel actually work

Related Topic