Electronic – What happens if a branch prediction overwrites a value

assemblycpumips

We are just learning about branch prediction so I might not totally understand how they work, but as I understand it, the branches are set to predict either taken or not taken. The pipeline will start processing the next instruction or two before the branch is decided. Once the branch is decided, if the prediction was correct nothing happens but if the prediction was wrong, the pipeline simply switches to processing the correct instruction at the next clock.

This leads me to my question.

Say I had some assembly like this:

loop: addi r1 r1 1
sub r2 $0 r3
...
...(other random things)
...
bne r1 $0 loop
sw r2 0(r1)

If the branch were predicted as taken, the values of r1 and r2 would be overwritten before the branch was resolved and then sw would have the wrong values.
How does one deal with this? Is this just an inherent problem in branch prediction or is there a protocol for storing those values until the branch has been decided?

Thank you!!

Best Answer

Branch prediction is a microarchitecture technique for improving the performance of pipelined processors.

I know 3 ways a person can design a processor with branch prediction to handle mis-prediction: instruction cancellation, register renaming, or delay slots. (Are there any other ways?)

delay slots

The MIPS and the SHARC DSP and several other processors use delay slots. Delay slots can be seen as a very simple kind of branch prediction -- one that always predicts that a branch will never be taken. If there is a branch misprediction -- if the branch is actually taken -- then instructions in the delay slot are unconditionally executed anyway. This is an inherent problem with delay slots.

However, it is not an inherent problem with branch prediction.

instruction set compatibility

Most CPUs are designed to be binary-compatible with some previous CPU. Since many early CPUs completely finished executing one instruction before going on to the next, many modern CPUs try to maintain the illusion. Most processors act "as if" it executes one instruction completely before going on to the next, effectively acting "as if" it did not actually execute any of those wrong-way instruction on the wrong side of the branch. (Some people find delay slots "illogical" because delay slots break the illusion of executing one instruction completely before going on to the next).

In a pipelined processor, the results are not actually written (committed) to the register file or to memory until the last stage of the pipeline -- the write-back stage.

instruction cancellation

With instruction cancellation, when there is a branch mis-prediction, a feedback wire from the branch resolution stage to all the previous stages indicates that there was a mis-prediction. On the next clock tick, when every instruction in the pipeline moves forward to the next stage, all the instructions in the pipeline "behind" the branch are simultaneously transformed into a "bubble", the equivalent of a NOP instruction. Meanwhile, the instruction fetcher begins loading the pipeline with instructions from the "right-way branch". One at a time, each instructions reaches the write-back stage. Instructions that have been cancelled (transformed into a bubble) do not write anything to the register file or memory. That prevents the wrong values from ever showing up in the register file or memory. After the bubbles have all drained out, instructions from the right-way branch start hitting the write-back stage and writing the correct values to the register file or memory.

register renaming

I hear rumors that some superscalar processors with out-of-order execution use register renaming to enable eager execution -- to fetch instructions from both sides of the branch. As the instructions are loaded into the pipeline, the instruction decoder expands/renames short logical register names in the instruction into to longer physical names. The same register name in two different instructions on two sides of the branch is expanded to two different physical register names.

When the branch instruction gets to the resolution stage that finally resolves which direction it will really go, all the memory STORE instructions in the wrong branch are cancelled. Meanwhile, the instruction fetcher stops pulling instructions from the wrong branch and focuses on only the right branch.

As each instruction reaches the write-back stage, those that were from the wrong branch may (in some designs) write to a physical register, but the wrong-way data in those physical registers never actually influences any STORE instructions that are actually executed -- any STORE instruction with any direct or indirect dependency on that wrong-way data will be turned into a bubble that does nothing by the time it hits the write-back stage. All STOREs in the right-way branch will execute correctly, because they only use physical registers influenced by instructions from the right-way branch. Those physical registers are never overwritten by any instruction in the wrong-way branch.