Not every virtual (linear) address has to be mapped to anything. If code accesses an unmapped page, a page fault is raised.
A physical page can be mapped to several virtual addresses simultaneously.
The 4 GB of virtual memory is split into two sections: 0x00000000 .. 0xbfffffff is process virtual memory, and 0xc0000000 .. 0xffffffff is kernel virtual memory.
- How can the kernel map 896 MB from only 512 MB ?
It maps up to 896 MB. So, if you have only 512 MB, only 512 MB will be mapped.
If your physical memory occupies 0x00000000 to 0x20000000 (512 MB), it will be mapped for direct kernel access at virtual addresses 0xC0000000 to 0xE0000000 (a linear mapping).
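To make the linear mapping concrete, here is a minimal user-space sketch of the arithmetic, assuming the classic x86-32 PAGE_OFFSET of 0xC0000000. It mirrors what the kernel's __va()/__pa() macros do for directly mapped (lowmem) pages, but is purely illustrative:

```c
/* Sketch of the x86-32 direct ("linear") kernel mapping, assuming the
 * classic PAGE_OFFSET of 0xC0000000. Mirrors __va()/__pa() for lowmem. */
#include <stdio.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000UL

static uintptr_t phys_to_virt(uintptr_t phys) { return phys + PAGE_OFFSET; }
static uintptr_t virt_to_phys(uintptr_t virt) { return virt - PAGE_OFFSET; }

int main(void)
{
    /* With 512 MB of RAM, physical 0x00000000..0x1FFFFFFF is mapped
     * linearly to virtual 0xC0000000..0xDFFFFFFF. */
    printf("phys 0x10000000 -> virt 0x%lx\n",
           (unsigned long)phys_to_virt(0x10000000UL));
    printf("virt 0xD0000000 -> phys 0x%lx\n",
           (unsigned long)virt_to_phys(0xD0000000UL));
    return 0;
}
```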
- What about user mode processes in this situation?
Physical memory for user processes is mapped (not sequentially, but with an arbitrary page-to-page mapping) to virtual addresses 0x0 .. 0xc0000000. For pages below 896 MB this is a second mapping, and the pages are taken from the free page lists.
- Where are user mode processes in phys RAM?
Anywhere.
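You can actually watch this arbitrary page-to-frame placement from user space through /proc/self/pagemap. Below is a minimal sketch that prints which physical frame (PFN) backs one of its own pages; note that since Linux 4.0 the PFN field reads as zero without CAP_SYS_ADMIN, so run it as root to see real frame numbers:

```c
/* Read /proc/self/pagemap to see which physical frame backs a virtual page. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    char *buf = malloc(psz);
    *(volatile char *)buf = 1;          /* touch the page so it gets mapped */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t entry;
    off_t off = ((uintptr_t)buf / psz) * sizeof(entry);
    if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry)) {
        perror("pread");
        return 1;
    }

    if (entry & (1ULL << 63))           /* bit 63: page present */
        printf("virt %p -> PFN 0x%llx\n", (void *)buf,
               (unsigned long long)(entry & ((1ULL << 55) - 1)));
    else
        printf("page not present\n");

    close(fd);
    return 0;
}
```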
- Every article explains only the situation when you've installed 4 GB of memory and the…
No. Every article explains how the 4 GB of virtual address space is mapped. The size of virtual memory is always 4 GB (for a 32-bit machine without memory extensions like PAE/PSE/etc. on x86).
As stated in section 8.1.3, Memory Zones, of the book Linux Kernel Development by Robert Love (I use the third edition), there are several zones of physical memory:
- ZONE_DMA - Contains page frames of memory below 16 MB
- ZONE_NORMAL - Contains page frames of memory at and above 16 MB and below 896 MB
- ZONE_HIGHMEM - Contains page frames of memory at and above 896 MB
So, if you have 512 MB, your ZONE_HIGHMEM will be empty, and ZONE_NORMAL will have 496 MB of physical memory mapped (512 MB minus the 16 MB of ZONE_DMA).
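As a toy illustration of those boundaries, here is a short sketch that classifies a physical address into the classic x86-32 zones; the 16 MB and 896 MB cut-offs are the ones from the list above, whereas a real kernel computes its zone limits at boot:

```c
/* Classify a physical address into the classic x86-32 memory zones. */
#include <stdio.h>

static const char *zone_of(unsigned long long phys)
{
    if (phys < 16ULL << 20)   return "ZONE_DMA";      /* below 16 MB   */
    if (phys < 896ULL << 20)  return "ZONE_NORMAL";   /* 16..896 MB    */
    return "ZONE_HIGHMEM";                            /* 896 MB and up */
}

int main(void)
{
    printf("%s\n", zone_of(8ULL << 20));     /* ZONE_DMA     */
    printf("%s\n", zone_of(512ULL << 20));   /* ZONE_NORMAL  */
    printf("%s\n", zone_of(1024ULL << 20));  /* ZONE_HIGHMEM */
    return 0;
}
```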
Also, take a look at section 2.5.5.2, Final kernel Page Table when RAM size is less than 896 MB, of the book. It is about the case when you have less than 896 MB of memory.
Also, for ARM there is a description of the virtual memory layout: http://www.mjmwired.net/kernel/Documentation/arm/memory.txt
The line PAGE_OFFSET high_memory-1 (line 63 of that file) is the direct-mapped part of memory.
- After calling switch_to(), the kernel stack is switched to that of the task named in next. Changing the address space, etc., is handled in e.g. context_switch().
- schedule() cannot be called in atomic context, including from an interrupt (see the check in schedule_debug()). If a reschedule is needed, the TIF_NEED_RESCHED task flag is set, which is checked in the interrupt return path.
- See 2.
- I believe that, with the default 8K stacks, interrupts are handled on whatever kernel stack is currently executing. If 4K stacks are used, I believe there's a separate interrupt stack (loaded automatically thanks to some x86 magic), but I'm not completely certain on that point.
To be a bit more detailed, here's a practical example (a simplified C sketch of the reschedule decision follows the list):
- An interrupt occurs. The CPU switches to an interrupt trampoline routine, which pushes the interrupt number onto the stack, then jmps to common_interrupt
- common_interrupt calls do_IRQ, which disables preemption then handles the IRQ
- At some point, a decision is made to switch tasks. This may come from the timer interrupt or from a wakeup call. In either case, set_tsk_need_resched is invoked, setting the TIF_NEED_RESCHED task flag.
- Eventually, the CPU returns from do_IRQ in the original interrupt and proceeds to the IRQ exit path. If this IRQ was invoked from within the kernel, it checks whether TIF_NEED_RESCHED is set, and if so calls preempt_schedule_irq, which briefly enables interrupts while performing a schedule().
- If the IRQ was invoked from userspace, we first check whether there's anything that needs doing prior to returning. If so, we go to retint_careful, which checks for a pending reschedule (directly invoking schedule() if needed) as well as for pending signals, then goes back for another round at retint_check until there are no more important flags set.
- Finally, we restore GS and return from the interrupt handler.
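Here is the simplified sketch promised above: a toy model, in plain C, of the flag-then-check mechanism this list describes. The names (need_resched, irq_exit_path, returning_to_user) are illustrative stand-ins, not the kernel's actual symbols; the real logic lives in assembly and in preempt_schedule_irq()/schedule():

```c
/* Toy model of "interrupt sets a flag; the exit path acts on it". */
#include <stdbool.h>
#include <stdio.h>

static volatile bool need_resched;       /* models TIF_NEED_RESCHED */

static void timer_interrupt(void)
{
    /* the handler itself never calls schedule(); it only sets the flag */
    need_resched = true;
}

static void irq_exit_path(bool returning_to_user)
{
    if (!need_resched)
        return;                          /* nothing to do, plain return */
    need_resched = false;
    if (returning_to_user)
        printf("return-to-user path: call schedule(), check signals\n");
    else
        printf("in-kernel path: call preempt_schedule_irq()\n");
}

int main(void)
{
    timer_interrupt();
    irq_exit_path(false);                /* IRQ arrived while in the kernel */
    timer_interrupt();
    irq_exit_path(true);                 /* IRQ arrived while in userspace */
    return 0;
}
```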
As for switch_to(): what switch_to() (on x86-32) does is:
- Save the current values of EIP (instruction pointer) and ESP (stack pointer) for when we return to this task at some point later.
- Switch the value of current_task. At this point, current now points to the new task.
- Switch to the new stack, then push the EIP saved by the task we're switching to onto the stack. Later, a return will be performed using this EIP as the return address; this is how it jumps back to the old code that previously called switch_to().
- Call __switch_to(). At this point, current points to the new task and we're on the new task's stack, but various other CPU state hasn't been updated yet. __switch_to() handles switching the state of things like the FPU, segment descriptors, debug registers, etc.
- Upon return from __switch_to(), the return address that switch_to() manually pushed onto the stack is returned to, placing execution back where it was prior to the switch_to() in the new task. Execution has now fully resumed on the switched-to task. (A user-space analogue using ucontext is sketched after this list.)
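As promised, a user-space analogue of the save-EIP/ESP-and-resume idea, using <ucontext.h>: swapcontext() saves the current instruction and stack pointers and resumes another context, much as switch_to() returns through the next task's saved EIP. This is an illustration only, not the kernel mechanism:

```c
/* User-space analogue of switch_to()'s save-and-resume trick. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;

static void task_fn(void)
{
    printf("in 'task': running on its own stack\n");
    /* 'yield' back: save our IP/SP into task_ctx, restore main_ctx */
    swapcontext(&task_ctx, &main_ctx);
    printf("in 'task': resumed exactly where we left off\n");
}

int main(void)
{
    char *stack = malloc(64 * 1024);

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = 64 * 1024;
    task_ctx.uc_link = &main_ctx;          /* where to go when task_fn returns */
    makecontext(&task_ctx, task_fn, 0);

    printf("in main: switching to task\n");
    swapcontext(&main_ctx, &task_ctx);     /* like switch_to(main, task) */
    printf("in main: task yielded, switching again\n");
    swapcontext(&main_ctx, &task_ctx);     /* resume task after its yield */
    printf("in main: task finished\n");

    free(stack);
    return 0;
}
```

Each swapcontext() call resumes the other side exactly after its own last save point, which is the same trick switch_to() plays with the pushed EIP.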
x86-64 is very similar, but has to do slightly more saving/restoration of state due to the different ABI.
Best Answer
ARM SMP systems support two types of interrupts: SPIs (shared peripheral interrupts) and PPIs (private peripheral interrupts). A PPI is a per-CPU interrupt source. A special SMP case of the PPI is the SGI (software-generated interrupt); this is a CPU-to-CPU interrupt used to signal from one CPU to another in the SMP world (called an IPI).Note1
A PPI timer can be used to allow each CPU to do 'tickless scheduling'; that is, timer interrupts are scheduled via knowledge of future time events (google timing wheel, look at the NO_HZ documentation, etc.). The current Linux kernel doesn't use this specific PPI timer for scheduling; it is only used as a delay-loop time source. Instead, the global PPI timer is used. This timer can interrupt each CPU selectively, but its register set is global to all CPUs. A particular CPU may schedule an interrupt for itself, with the time base being global.

The complication is that tasks must be migrated from one CPU to another in order to balance work among CPUs. Also, the Linux kernel's core code/scheduler is written for many CPUs (and architectures), which may not have these per-CPU interrupt sources. A definitive answer may depend on your kernel version and the scheduler used (or, more generally, on the kernel configuration). Generally, a busy CPU will do the migration; other CPUs may wake on a timer tick just to see whether a task in their set should run (maybe a migrated process). If NO_HZ is in effect, some CPUs may not wake at all; they will get an IPI in the case of migration.

In any case, there is nothing ARM-specific in the CPU scheduling besides the clock source. It is possible for an ARM SMP system not to have a global PPI timer. In that case, every CPU may wake to service an interrupt, but the majority may go back to sleep immediately. This could happen on any system due to a bad timer/interrupt-controller design or a bad system configuration. However, even in these cases, the code would not call into the scheduler except where needed.
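To see the IPI path from the kernel side, here is a hedged sketch of a minimal module that asks another CPU to run a function via the generic smp_call_function_single() API; on a GIC-based ARM SMP system that request goes out as an SGI. The module itself and its target CPU choice are illustrative, not taken from any real driver:

```c
/* Minimal, illustrative module: run a function on another CPU via IPI. */
#include <linux/module.h>
#include <linux/smp.h>

static void remote_fn(void *info)
{
    /* runs in IPI context on the target CPU */
    pr_info("ipi-demo: running on CPU %d\n", smp_processor_id());
}

static int __init ipi_demo_init(void)
{
    int target = 1;                      /* illustrative target CPU */

    if (num_online_cpus() < 2)
        return -ENODEV;                  /* need a second CPU to signal */

    /* send an IPI to 'target' and wait for remote_fn to finish there */
    smp_call_function_single(target, remote_fn, NULL, 1);
    return 0;
}

static void __exit ipi_demo_exit(void) { }

module_init(ipi_demo_init);
module_exit(ipi_demo_exit);
MODULE_LICENSE("GPL");
```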
See: Linux Scheduler on SMP (which may be a duplicate, although the answer is not great IMO), IBM's Completely Fair Scheduler article, and O'Reilly's Linux Kernel scheduler chapter.
Note1: This is actually GIC (generic interrupt controller) terminology. However, most ARM SMP systems use this interrupt controller. It is bundled with Cortex-A CPUs and came as an external soft component for some ARMv6 systems. It is possible for an ARM SMP system to use another controller, but that is probably extremely rare or non-existent.
Edit: There are two ARM on-chip timers; these are useful because every Cortex-A has them, as opposed to SoC-vendor timers. One of them is used instead of a 'counting loop' for a delay; this works better in the presence of interrupts. I don't think it is critical to understanding SMP scheduling; you may ignore that comment and just know that that source file is not used for scheduling. It was the first one I looked at. If you find it really distracting, I will remove that information.
See this paper on timing wheels; it is about IP/networking, but the concept behind NO_HZ is similar: don't interrupt every 10 ms just to increment ticks. In the NO_HZ case, each CPU can set a future wake-up time based on what sort of requests drivers and sub-systems have given. E.g., if schedule_work() needs to run in 175 ms, the timer is set to that value for the CPU and we don't wake up 17 times (if the system tick is 10 ms) but just increment ticks by 17 when we do. Some CPUs may need a timeout to evict the current process and run another for multi-tasking as well, so the scheduler itself may set a timer.
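As a concrete version of the 175 ms example, the sketch below arms delayed work with schedule_delayed_work(); under NO_HZ, the CPU owning that timer can sleep until roughly the deadline instead of taking 17 useless ticks. A minimal, illustrative module:

```c
/* Minimal, illustrative module: arm a ~175 ms delayed-work timer. */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void my_work_fn(struct work_struct *w)
{
    pr_info("delayed work ran at jiffies=%lu\n", jiffies);
}

static DECLARE_DELAYED_WORK(my_work, my_work_fn);

static int __init nohz_demo_init(void)
{
    /* set a timer ~175 ms out; with NO_HZ the CPU can sleep until then */
    schedule_delayed_work(&my_work, msecs_to_jiffies(175));
    return 0;
}

static void __exit nohz_demo_exit(void)
{
    cancel_delayed_work_sync(&my_work);
}

module_init(nohz_demo_init);
module_exit(nohz_demo_exit);
MODULE_LICENSE("GPL");
```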