What other problems can be caused by using the JMP instruction to navigate through subroutines instead of the CALL/RETURN instructions?

assembly, microchip

Image of the code

I've been creating a report on why the JUMP instruction shouldn't be used in the piece of code above and instead CALL/RETURN instructions should be used to navigate through the subroutines.

I understand that the code will never reach back2: because it is using the jump command and would continuously loop between back1: and the calculateNextPower routine.

Would I be correct saying that:

1. The jump instruction doesn't push the instruction onto a stack, so you are not able to return to where you jumped from.
2. Using call prevents you from pushing too many addresses onto the stack through recursion, as the JUMP instruction would. This can cause a stack overflow/tail recursion, and can mean you don't get the result of the calculation until you have returned by using call and return?

I think that I've made this way more complicated than it is. I've read way too many sites and have confused myself about why CALL/RETURN should be used instead of the jump instruction, other than the code not being able to get to the "back2:" routine.

Appreciate any help on clearing this up, I'm new to this type of work

Best Answer

So many misconceptions, where to begin...?

First, the real problem with your code is a badly architected loop. Note that CalculateNextPower is executed immediately before the loop and again at the end of the loop. This code should simply be put in-line at the start of the loop, right below Back1. Then there wouldn't be any need to jump or call to code outside the loop.

Second, nothing pushes an instruction onto the stack. CALL instructions generally push an address onto a stack, namely the address of the instruction following the call. The RETURN instruction pops this address off the stack and jumps to it, thereby returning to immediately after the CALL instruction.

In general, the point of the CALL/RETURN mechanism is to allow execution of a piece of code in multiple routines, but have the piece of code only exist in memory once. CALL/RETURN allows for temporarily diverting execution to the single piece of re-usable code, then going back to where you were.
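As a rough illustration of the mechanism described above, here is a minimal Python sketch of a toy machine. The opcodes (MARK, JMP, CALL, RETURN, HALT), the `run` function, and both programs are hypothetical and invented for this example; they are not the code from the question and not any real instruction set. It only shows that CALL saves the address of the following instruction while JMP saves nothing.

```python
# Toy machine: each instruction is a tuple (opcode, operand).
# The opcodes and programs are illustrative assumptions only.
def run(program, max_steps=100):
    pc = 0        # program counter: index of the next instruction
    stack = []    # return-address stack, touched only by CALL/RETURN
    trace = []    # labels visited, recorded by the MARK pseudo-instruction
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "MARK":
            trace.append(arg)
            pc += 1
        elif op == "JMP":
            pc = arg                # nothing is saved: no way back
        elif op == "CALL":
            stack.append(pc + 1)    # address of the instruction after CALL
            pc = arg
        elif op == "RETURN":
            pc = stack.pop()        # resume right after the matching CALL
        elif op == "HALT":
            break
    return trace

# One subroutine (address 6) reused from two different call sites.
call_program = [
    ("MARK", "main1"),   # 0
    ("CALL", 6),         # 1
    ("MARK", "main2"),   # 2
    ("CALL", 6),         # 3
    ("MARK", "done"),    # 4
    ("HALT", None),      # 5
    ("MARK", "sub"),     # 6
    ("RETURN", None),    # 7
]

# Same layout with jumps: the "subroutine" can only go back to one
# hard-coded address, so the second use sends execution to the wrong place.
jmp_program = [
    ("MARK", "main1"),   # 0
    ("JMP", 6),          # 1
    ("MARK", "main2"),   # 2
    ("JMP", 6),          # 3
    ("MARK", "done"),    # 4: never reached
    ("HALT", None),      # 5
    ("MARK", "sub"),     # 6
    ("JMP", 2),          # 7
]
```

Running `run(call_program)` visits the subroutine twice and then reaches "done", while `run(jmp_program, max_steps=20)` keeps cycling between "main2" and "sub", much like the loop between back1: and the routine in the question.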

Related Solutions

Electronic – MIPS (PIC32): branch vs. branch likely

MIPS is one of several RISC (reduced instruction set computers) architectures that are designed to execute one instruction per clock cycle. In order to achieve this, the original MIPS processors had a five-stage pipeline:

[Figure: five-stage MIPS pipeline diagram]

The abbreviations in the above figure are: IF (Instruction Fetch), RD (Read from register file), ALU (Execute instruction in Arithmetic Logic Unit), MEM (Read/write Memory access), WB (Write back to register file). The vertical axis is successive instructions; the horizontal axis is time.

Because the MEM stage occurs after the ALU stage, RISC machines like MIPS don't do arithmetic or logical operations on memory, but only on registers. For this reason they are also referred to as load/store architectures.

There are several hazard conditions where the pipeline can stall and cause a penalty in the overall instructions per cycle (IPC) value. A data hazard occurs when an instruction attempts to use data in a register before it has been loaded into that register. For example:

lw $3, 100($2)
add $1, $2, $3

The data is not loaded until the MEM stage of the first instruction, which is too late for it to be available for the ALU (execute) stage of the second instruction.
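The load-use dependency in this example can be spotted mechanically. The following is a minimal Python sketch, not part of any real toolchain: `load_use_hazards` is a hypothetical helper, and it assumes a simplified text format in which the first operand of every instruction is its destination register (so it does not handle stores or branches).

```python
import re

def load_use_hazards(lines):
    """Return the indices of instructions that read a register which the
    immediately preceding lw is still loading.

    Assumption: the first operand is always the destination register.
    """
    hazards = []
    for i in range(1, len(lines)):
        prev = lines[i - 1].replace(",", " ").split()
        cur = lines[i].replace(",", " ").split()
        if prev[0] != "lw":
            continue
        loaded = prev[1]                      # register written by the load
        # Every register named after the destination counts as a source.
        sources = re.findall(r"\$\w+", " ".join(cur[2:]))
        if loaded in sources:
            hazards.append(i)
    return hazards
```

For the two instructions above, `load_use_hazards(["lw $3, 100($2)", "add $1, $2, $3"])` flags the `add`, because it reads `$3` in the very next instruction after the load.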

Control hazards occur because, on any taken branch, the instruction immediately after the branch is always fetched from the instruction cache. If this instruction is ignored, there is an IPC penalty of one cycle per taken branch.

The solution for the MIPS architecture was the "Branch Delay Slot": always fetch the instruction after the branch, and always execute it, even if the branch is taken. This gets a little weird when writing MIPS assembly code, because when you are reading it, you have to take into account that the instruction after the branch is always going to be executed. The trick in writing efficient code is to put in an instruction that is useful as part of the loop when the branch is taken, but does no harm if the branch is not taken.

The MIPS designers were counting on compiler writers to write clever enough code generators to handle this efficiently. However, many do not (including Microchip's C32 compiler, based on GCC), and just put NOPs after every branch, wasting both code space and cycles.

So in the R4000 architecture, MIPS added Branch Likely instructions which still always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is taken (opposite of what one might expect). Compilers can then always fill the branch delay slot on such a branch.

A loop like:

loop:
    first instruction
    second instruction
    ...
    blez t0, loop
    nop

can be turned into:

loop:
    first instruction
loop2:   
    second instruction
    ...
    blezl t0, loop2
    first instruction

The repeated "first instruction" after the branch is always executed if the branch is taken (and becomes part of the next go-around of the loop). This instruction is ignored if the branch is not taken (incurring a slight IPC penalty).
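To check that the transformed loop does the same useful work, here is a small Python sketch that traces both forms. It is a model of the delay-slot rules as described above, not a MIPS simulator: `run_plain` and `run_likely` are hypothetical names, and the branch condition is reduced to a simple iteration counter.

```python
def run_plain(iterations):
    """loop: first; second; branch loop; nop
    An ordinary delay slot executes whether or not the branch is taken."""
    trace = []
    remaining = iterations
    while True:
        trace += ["first", "second"]
        remaining -= 1
        taken = remaining > 0
        trace.append("nop")          # delay slot: always runs
        if not taken:
            return trace

def run_likely(iterations):
    """first; loop2: second; branch-likely loop2; first
    A branch-likely delay slot executes only when the branch is taken."""
    trace = ["first"]
    remaining = iterations
    while True:
        trace.append("second")
        remaining -= 1
        taken = remaining > 0
        if taken:
            trace.append("first")    # slot runs, starting the next pass
        else:
            return trace             # slot instruction is ignored
```

With three iterations, `run_plain(3)` executes three wasted `nop` entries, while `run_likely(3)` executes the same sequence of "first" and "second" with no `nop` at all.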

However as it turns out, trying to include this feature in high-performance designs has been a pain in the neck due to the complexity in getting rid of the result of the ignored instruction. Therefore it has been deprecated.

Electronic – Could an ARM (ARM7TDMI) Branch instruction take 6 cycles

Are you running code from RAM or from flash? ARM processors that run code from flash often require wait states in at least some circumstances; such processors often include hardware which can eliminate most of the wait states in common code, but such hardware may be as simple as a single-line buffer which allows an access to the same line of flash as the previous access to avoid the wait state. If the branch target is the last word of a flash line, then the flash would require two or three cycles to fetch that word, and two or three cycles to fetch the following word. If one of the cycles is performed concurrently with some other CPU operation, that would leave a three-cycle penalty.
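The effect of where the branch target sits within a flash line can be sketched with a very simple cost model. Everything here is an assumption for illustration: `fetch_cycles` is a hypothetical function, the line size of 4 words and the 3-cycle cost for fetching from a new line are made-up parameters, and real parts overlap some of these cycles with other CPU activity, which this model ignores.

```python
def fetch_cycles(addresses, words_per_line=4, miss_cycles=3):
    """Total cycles to fetch the given word addresses in order.

    Model of a single-line buffer: a fetch from the line currently in the
    buffer costs 1 cycle; a fetch from any other line costs miss_cycles
    and replaces the buffered line. Both parameters are assumed values.
    """
    buffered_line = None
    total = 0
    for addr in addresses:
        line = addr // words_per_line
        if line == buffered_line:
            total += 1
        else:
            total += miss_cycles
            buffered_line = line
    return total
```

Under these assumptions, branching to the first word of a line and fetching the next word, `fetch_cycles([8, 9])`, costs 4 cycles, while branching to the last word of a line, `fetch_cycles([11, 12])`, costs 6 because both fetches land in a line that is not yet buffered.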
