An RTOS for a PIC that lacks a software-addressable stack generally requires that all but one of the tasks divide their work into uninterruptible pieces that begin and end at the top stack level. The "task-yield" operation does not use a function call, but rather a sequence like:
// This code is part of task C (assume for this example that there are tasks
// called A, B, and C)
movlw JumpC4 & 255
goto TASK_SWITCH_FROM_C
TargetC4:
Elsewhere in the code would be some code like:
TASK_SWITCH_FROM_A:
movwf nextJumpA // Save state of task A
// Now dispatch next instruction for task B
movlw TaskB_Table >> 8
movwf PCLATH
movf nextJumpB,w
movwf PCL
TASK_SWITCH_FROM_B:
movwf nextJumpB // Save state of task B
// Now dispatch next instruction for task C
movlw TaskC_Table >> 8
movwf PCLATH
movf nextJumpC,w
movwf PCL
TASK_SWITCH_FROM_C:
movwf nextJumpC // Save state of task C
// Now dispatch next instruction for task A
movlw TaskA_Table >> 8
movwf PCLATH
movf nextJumpA,w
movwf PCL
At the end of the code there would be a jump table for each task; each table would have to fit within a 256-word page (and could thus hold at most 256 jumps):
TaskC_Table:
JumpC0 : goto TargetC0
JumpC1 : goto TargetC1
JumpC2 : goto TargetC2
JumpC3 : goto TargetC3
JumpC4 : goto TargetC4
...etc.
Effectively, the movlw at the start of the task-switch sequence loads the W register with the LSB of the address of the instruction at JumpC4. The code at TASK_SWITCH_FROM_C would stash that value someplace, and then dispatch the code for task A. Later, after TASK_SWITCH_FROM_B is executed, the stored JumpC4 address would be reloaded into W and the system would jump to the instruction pointed to thereby. That instruction would be a goto TargetC4, which would in turn resume execution at the instruction following the task-switch sequence. Note that task switching doesn't use the stack at all.
If one wanted to do a task switch within a called function, it might be possible to do so if that function's call and return were handled in a manner similar to the above (one would probably have to wrap the function call in a special macro to force the proper code to be generated). Note that the compiler itself wouldn't be capable of generating code like the above. Instead, macros in the source code would generate directives in the assembly-language file. A program supplied by the RTOS vendor would read the assembly-language file, look for those directives, and generate the appropriate vectoring code.
Best Answer
There are many low-level microcontrollers that have hardware stacks for subroutine call/return and interrupt handling, but make it difficult if not impossible to store data (variables) there, and implementing a purely software data stack would be terribly inefficient. The 8051 is one classic example, and low-end PICs (PIC12/PIC16) are another. On these machines, the data stack is emulated by assigning static storage locations for automatic variables, with the amount of reuse of these locations being dependent on the sophistication of the compiler.
Note that if stack emulation is being done this way, it means that recursion (a function that calls itself, either directly or indirectly) does not work, since each instance of the function reuses the same static locations for its supposedly "private" variables. Some compilers do allow the limited use of recursion (typically implemented by means of a #pragma of some sort), which will cause the compiler to create a true data stack no matter how much it slows things down.
Just as an aside, there have been CPU architectures that did not have a hardware stack at all, not even for subroutine/interrupt handling, including the DEC PDP-8 and the IBM System/360. On these machines, the PC (return address) and status register (for interrupts) were saved in registers or memory locations, but in every instance I can think of, the machine also had sufficiently flexible address modes that made it easy to create a stack with software.