I'm trying to understand how the scheduling process in the Linux kernel actually works. My question is not about the scheduling algorithm. It's about how the functions schedule() and switch_to() work.
I'll try to explain. I saw that:
When a process runs out of time-slice, the need_resched flag is set by scheduler_tick(). The kernel checks the flag, sees that it is set, and calls schedule() (pertinent to question 1) to switch to a new process. This flag is a message that schedule() should be invoked as soon as possible because another process deserves to run.
Upon returning to user-space or returning from an interrupt, the need_resched flag is checked. If it is set, the kernel invokes the scheduler before continuing.
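To check my understanding, here is a toy user-space model in C of the flow described above. It is only a sketch of how I picture it: every name in it (toy_task, toy_scheduler_tick, toy_schedule, and so on) is made up for illustration, the refill value is arbitrary, and none of it is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code: every name here is hypothetical. */
struct toy_task {
    int  time_slice;    /* ticks left before the task should yield */
    bool need_resched;  /* stands in for the need_resched flag */
};

static bool toy_in_interrupt;   /* true while the "handler" is running */
static int  toy_schedule_calls; /* how many times the toy scheduler ran */

/* Stand-in for schedule(): a real one would pick and switch to another task. */
static void toy_schedule(struct toy_task *t)
{
    assert(!toy_in_interrupt);  /* must not run inside the handler itself */
    toy_schedule_calls++;
    t->need_resched = false;
    t->time_slice = 3;          /* arbitrary refill for the demo */
}

/* Stand-in for scheduler_tick(): it only records that a reschedule is
 * wanted; it never switches tasks itself. */
static void toy_scheduler_tick(struct toy_task *t)
{
    if (t->time_slice > 0 && --t->time_slice == 0)
        t->need_resched = true;
}

/* One simulated timer interrupt: the handler first, then the return path,
 * which is the only place where the flag is acted upon. */
static void toy_timer_interrupt(struct toy_task *t)
{
    toy_in_interrupt = true;
    toy_scheduler_tick(t);
    toy_in_interrupt = false;

    if (t->need_resched)        /* checked on the way out of the interrupt */
        toy_schedule(t);
}
```

Starting from a task with two ticks left and calling toy_timer_interrupt() twice, the first call only decrements the time slice, and the second one sets the flag and then runs toy_schedule() on the return path.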
Looking into the kernel source (linux-2.6.10, the version that the book "Linux Kernel Development, Second Edition" is based on), I also saw that some code can call the schedule() function voluntarily, giving another process the right to run.
I saw that the function switch_to() is the one that actually does the context switch. I looked into some architecture-dependent code, trying to understand what switch_to() was actually doing.
That behavior raised some questions that I could not find answers to:
- When switch_to() finishes, what is the currently running process? The process that called schedule()? Or the next process, the one that was picked to run?
- When schedule() gets called by an interrupt, does the selected process start to run when the interrupt handling finishes (after some kind of RTE)? Or before that?
- If the schedule() function cannot be called from an interrupt, when is the need_resched flag set?
- When the timer interrupt handler is working, what stack is being used?
I don't know if I could make myself clear. If I couldn't, I hope I can do this after some answers (or questions).
I already looked at several sources trying to understand that process. I have the book "Linux Kernel Development, sec ed", and I'm using it too.
I know a bit about the MIPS and H8300 architectures, if that helps to explain.
Best Answer
- After switch_to(), the kernel stack is switched to that of the task named in next. Changing the address space, etc., is handled in, e.g., context_switch().
- schedule() cannot be called in atomic context, including from an interrupt (see the check in schedule_debug()). If a reschedule is needed, the TIF_NEED_RESCHED task flag is set, which is checked in the interrupt return path.

To be a bit more detailed, here's a practical example:
- [...] schedule().
- [...] schedule() if needed) as well as checking for pending signals, then goes back for another round at retint_check until there are no more important flags set.

As for switch_to(), what switch_to() (on x86-32) does is:
- Switch the value of current_task. At this point, current now points to the new task.
- Switch to the new task's stack and push onto it the return address that will later resume the code that previously called switch_to().
- Call __switch_to(). At this point, current points to the new task, and we're on the new task's stack, but various other CPU state hasn't been updated. __switch_to() handles switching the state of things like the FPU, segment descriptors, debug registers, etc.
- Upon return from __switch_to(), the return address that switch_to() manually pushed onto the stack is returned to, placing execution back where it was prior to the switch_to() in the new task. Execution has now fully resumed on the switched-to task.

x86-64 is very similar, but has to do slightly more saving/restoration of state due to the different ABI.
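As a rough user-space analogy for the steps above (not kernel code), the POSIX ucontext functions behave in a similar way: swapcontext() saves the caller's stack pointer and resume address, switches to another stack, and "returns" into whatever code last saved that other context. The sketch below assumes a Linux/glibc system where getcontext(), makecontext() and swapcontext() are available. The names run_switch_demo, task_b, mark and the trace buffer are hypothetical, and error checking of the library calls is omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* User-space analogy only: swapcontext() plays the role of switch_to().
 * All names below are hypothetical and none of this is kernel code. */
static ucontext_t ctx_a, ctx_b;   /* saved state of "task" A and "task" B */
static char stack_b[64 * 1024];   /* B gets its own stack */
static char trace[16];            /* records the order in which code ran */
static int on_b_stack;            /* 1 if B's locals live inside stack_b */

/* Append one character to the trace, keeping it NUL-terminated. */
static void mark(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace)
        trace[n] = c;
}

static void task_b(void)
{
    char local;                   /* lives on whichever stack is active */
    uintptr_t p = (uintptr_t)&local;
    uintptr_t lo = (uintptr_t)stack_b;
    on_b_stack = (p >= lo && p < lo + sizeof stack_b);

    mark('b');                    /* we are now running as B */
    swapcontext(&ctx_b, &ctx_a);  /* "switch_to(A)": save B, resume A */
    mark('B');                    /* B resumes here on a later switch */
}

/* Runs A -> B -> A -> B -> A and returns the recorded trace. */
static const char *run_switch_demo(void)
{
    memset(trace, 0, sizeof trace);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof stack_b;
    ctx_b.uc_link = &ctx_a;       /* when task_b returns, continue in A */
    makecontext(&ctx_b, task_b, 0);

    mark('a');
    swapcontext(&ctx_a, &ctx_b);  /* "switch_to(B)": this call returns... */
    mark('A');                    /* ...only once something switches back to A */
    swapcontext(&ctx_a, &ctx_b);  /* B continues right after its own swap */
    mark('Z');
    return trace;
}
```

Calling run_switch_demo() records the trace "abABZ": right after each swap, the code that runs belongs to the context that was switched to, on that context's own stack, and the context that made the call only continues past its swapcontext() when something later switches back to it. That mirrors the answer to the first question: once switch_to() has done its work, the running task is the next one, not the one that called schedule().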