Electronic – Task switching on Cortex-M3 crashes after IRQ

arm cortex-m3 operating-system os

I've used an exokernel model for my ARM OS. When a task wants to read from a UART it calls a library function which, if there's no data, makes an SVC call to block the task. The kernel then puts the task in the wait queue for that IRQ and enables the IRQ. When the interrupt happens, all the tasks waiting on it are moved to the runnable queue and the interrupt is disabled again.

This model was working fine when I had a fixed array of tasks, but now I've moved to linked lists to allow for more types of wait queue (e.g. IPC messages). Something in the change is causing a crash. Here's the debug output:

Creating task 0 (idle task)
task0 stack top is 2007cd20
Starting SysTick @ 100Hz
Becoming task 0
Switching to task gsm@2007c008 with SP 2007c3e8
GSM task starting
Switching to task rfid@2007c430 with SP 2007c810
Monitoring RFID reader
Blocking task rfid on IRQ 7
Switching to task gps@2007c858 with SP 2007cc38
Switching to task task0@2007cc80 with SP 2007ccd8
Switching to task gsm@2007c008 with SP 2007c390
Blocking task gsm on IRQ 8
Switching to task gps@2007c858 with SP 2007cc38
Switching to task task0@2007cc80 with SP 2007ccd8
Switching to task gps@2007c858 with SP 2007cc38
Starting GPS tracking
Blocking task gps on IRQ 6
Switching to task task0@2007cc80 with SP 2007ccd8
[... repeats...]
Switching to task task0@2007cc80 with SP 2007ccd8
Unblocking tasks waiting on IRQ 8
Switching to task gsm@2007c008 with SP 2007c3a0
Switching to task task0@2007cc80 with SP 2007ccd8
Switching to task gsm@2007c008 with SP 2007c3a0
Fault: [unprintable bytes]
   r0 = 2007c3a0
   r1 = 10007fb8
   r2 = 2007ccd8
   r3 = 10007fb8
  r12 = 00000008
   lr = fffffffd
   pc = 0070c858
  psr = 00000003
 BFAR = e000ed38
 CFSR = 00040000
 DFSR = 00000000
 AFSR = 00000000
SHCSR = 00070008

So everything is fine until the interrupt. The actual output varies depending on which UART has data first, but the pattern is the same: when an interrupt happens, a fault occurs when the unblocked task is switched to the second time.

Here are the relevant bits of code. An assembly shim:

zeptos_pendsv_isr:
    push {lr}
    mrs r0, psp
    stmfd r0!, {r4-r11}
    bl zeptos_schedule
    ldmfd r0!, {r4-r11}
    msr psp, r0
    pop {pc}

And the C functions:

static void pendsv(void) {
    SCB->ICSR |= 1 << 28;
}

void *zeptos_schedule(void *sp) {
    if (current_task) {
        current_task->sp = sp;
        DL_APPEND(runnable_tasks, current_task);
    }
    current_task = runnable_tasks;
    DL_DELETE(runnable_tasks, current_task);
    zeptos_printf("Switching to task %s@%p with SP %p\n", current_task->name, current_task, current_task->sp);
    return current_task->sp;
}

static void block(void *sp, uint8_t irq) {
    zeptos_printf("Blocking task %s on IRQ %i\n", current_task->name, irq);
    current_task->sp = sp;
    DL_APPEND(irq_blocked_tasks[irq], current_task);
    current_task = 0;
    NVIC_EnableIRQ(irq);
    pendsv();
}

void __attribute__((interrupt)) zeptos_isr(void) {
    int irq = (SCB->ICSR & 0xff) - 16;
    zeptos_printf("Unblocking tasks waiting on IRQ %i\n", irq);
    NVIC_DisableIRQ(irq);
    // NVIC_ClearPendingIRQ(irq);
    DL_CONCAT(runnable_tasks, irq_blocked_tasks[irq]);
    irq_blocked_tasks[irq] = 0;
    pendsv();
}

void __attribute__((interrupt)) zeptos_svc_isr(void) {
    __disable_irq();
    uint32_t *sp = (uint32_t *) __get_PSP();
    uint32_t pc = sp[6];
    uint8_t svc_type = *((uint8_t *) pc - 2);
    switch (svc_type) {
        case 0:
            sleep(sp[0]);
            break;

        case 1:
            block(sp, sp[0]);
            break;

        default:
            zeptos_puts("Bad SVC type\n");
    }
    __enable_irq();
}

void Zeptos_BlockOnIrq(uint8_t irq) {
    asm("svc 1");
}

SVC, SysTick and PendSV are priority 29, 30 and 31 respectively.

Any suggestions? Where should I be looking?

Best Answer

I found the problem, finally. When my SVC handler calls block to put a task on the blocked list, the stack pointer it saves for that task points at a stack holding only the registers stacked by the hardware (r0-r3, r12, lr, pc and xPSR), and not the {r4-r11} that the scheduler expects to find there when it runs the task again later.
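To make the mismatch concrete, here is a small host-side model of the pointer arithmetic only. It does not touch real registers. The helper names (save_callee_regs, restore_callee_regs, psp_error_words) are made up for this illustration and are not part of the kernel above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    HW_FRAME_WORDS = 8, /* r0-r3, r12, lr, pc, xPSR: stacked by the core on exception entry */
    SW_FRAME_WORDS = 8  /* r4-r11: stacked by the PendSV shim's stmfd */
};

/* Models "stmfd r0!, {r4-r11}": the saved SP moves down by eight words. */
static uint32_t *save_callee_regs(uint32_t *psp) { return psp - SW_FRAME_WORDS; }

/* Models "ldmfd r0!, {r4-r11}": the restored PSP moves up by eight words. */
static uint32_t *restore_callee_regs(uint32_t *sp) { return sp + SW_FRAME_WORDS; }

/*
 * Returns how many words the PSP is off by when a task is resumed through
 * the PendSV shim. saved_by_svc != 0 models the block() path, which stores
 * the raw PSP without pushing r4-r11 first.
 */
static int psp_error_words(int saved_by_svc) {
    uint32_t stack[64];
    uint32_t *psp = &stack[64 - HW_FRAME_WORDS];   /* PSP right after hardware stacking */
    uint32_t *saved = saved_by_svc ? psp : save_callee_regs(psp);
    assert(saved >= stack);                        /* the model stays inside its stack */
    uint32_t *resumed = restore_callee_regs(saved);
    return (int)(resumed - psp);                   /* 0 means exception return sees the right frame */
}
```

In this model psp_error_words(0) is 0 and psp_error_words(1) is 8: on the block() path the scheduler loads r4-r11 from what is really the hardware frame and hands the core a PSP 32 bytes too high, so exception return unstacks whatever lies above it.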

The quick fix is to have an assembly shim for the SVC ISR that stacks and unstacks the extra registers, and to have the C zeptos_svc_isr function return a stack pointer as zeptos_schedule does. It works, but some refactoring is in order now.
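A rough sketch of what that fix might look like follows; it is not the code I ended up with. The assembly half is shown only as a comment and simply mirrors zeptos_pendsv_isr. The names zeptos_svc_shim and zeptos_svc_c, the cut-down struct task, and next_task (standing in for the head of runnable_tasks) are all hypothetical. So that the C half can be exercised on a host, svc_type is passed in as a parameter here instead of being read from the instruction at pc - 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assembly half (sketch only, same shape as zeptos_pendsv_isr):
 *
 *   zeptos_svc_shim:
 *       push  {lr}
 *       mrs   r0, psp
 *       stmfd r0!, {r4-r11}     @ build the same software frame PendSV does
 *       bl    zeptos_svc_c      @ returns the SP to resume from in r0
 *       ldmfd r0!, {r4-r11}
 *       msr   psp, r0
 *       pop   {pc}
 */

enum { SW_FRAME_WORDS = 8 };  /* r4-r11 pushed by the shim */
enum { HW_R0 = 0 };           /* r0 slot in the hardware frame (pc would be slot 6) */

struct task { uint32_t *sp; int blocked_irq; };  /* hypothetical reduced TCB, -1 = runnable */

static struct task *current_task;  /* stands in for the kernel global */
static struct task *next_task;     /* stands in for the head of runnable_tasks */

/*
 * C half. sp is the PSP after the shim has pushed r4-r11, so the hardware
 * frame now starts SW_FRAME_WORDS above it and the old sp[0] / sp[6]
 * accesses shift by eight words.
 */
uint32_t *zeptos_svc_c(uint32_t *sp, uint8_t svc_type) {
    uint32_t *hw = sp + SW_FRAME_WORDS;
    assert(current_task != NULL);
    if (svc_type == 1) {                          /* block on IRQ */
        current_task->sp = sp;                    /* full 16-word frame is saved */
        current_task->blocked_irq = (int)hw[HW_R0];
        current_task = next_task;                 /* real kernel: take from runnable_tasks */
        return current_task->sp;
    }
    return sp;                                    /* no switch: resume the caller */
}
```

The point of the design is that every saved SP, whether it was stored by the SVC path or the PendSV path, now refers to the same 16-word layout, so the scheduler's ldmfd always pops real r4-r11 values.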