Electronics – 5-stage pipeline forwarding – MIPS

Consider the following MIPS instructions:

lw r6, 0(r1)
lw r5, 0(r2)
add r5, r5, r6

Assume I have full forwarding. I know that when an instruction produces a value, it is only "forwarded" to a later instruction right before that instruction consumes it. With that said, am I allowed to forward from the write-back stage to the execute stage? I've only seen the write-back stage forward directly to decode.
Consider the following cycle diagram:

     C1 C2 C3 C4 C5 C6 C7 C8
lw   F  D  X  M  W
lw      F  D  X  M  W
add        F  D  D  X  M  W

Note: The second occurrence of the decode stage (D in C5) signifies a stall.
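A small model can reproduce the timing in the diagram. This is a sketch under simplifying assumptions: one instruction is fetched per cycle, a stalled instruction holds the one behind it, and with full forwarding the only dependency that costs a stall is load-use (a load's value is ready only after MEM). The `schedule`/`render` helpers and the tuple encoding of instructions are hypothetical.

```python
# Sketch: cycle-by-cycle timing of a classic 5-stage MIPS pipeline with full
# forwarding. Assumption: only a load-use hazard stalls. A load's value is
# ready after MEM, so a dependent EX must come at least 2 cycles after the
# load's EX; an ALU result is forwarded from EX/MEM with no stall.

# (mnemonic, destination, sources, is_load): hypothetical encoding
PROGRAM = [
    ("lw", "r6", ["r1"], True),
    ("lw", "r5", ["r2"], True),
    ("add", "r5", ["r5", "r6"], False),
]


def schedule(program):
    """Return a list of (mnemonic, {cycle: stage}) rows, cycles numbered from 1."""
    rows = []
    prev_d = prev_x = None
    producer_ex = {}  # register -> (EX cycle of its latest producer, is_load)
    for name, dst, srcs, is_load in program:
        f = 1 if prev_d is None else prev_d  # fetched when predecessor leaves IF
        d = f + 1 if prev_x is None else max(f + 1, prev_x)  # ID is in order
        x = d + 1
        for r in srcs:
            if r in producer_ex:
                p_ex, p_load = producer_ex[r]
                x = max(x, p_ex + (2 if p_load else 1))
        producer_ex[dst] = (x, is_load)
        stages = {c: "F" for c in range(f, d)}
        stages.update({c: "D" for c in range(d, x)})  # repeated D = stall
        stages.update({x: "X", x + 1: "M", x + 2: "W"})
        rows.append((name, stages))
        prev_d, prev_x = d, x
    return rows


def render(rows):
    """Format the rows as a text cycle diagram."""
    last = max(max(stages) for _, stages in rows)
    lines = ["     " + " ".join(f"C{n}" for n in range(1, last + 1))]
    for name, stages in rows:
        cells = " ".join(f"{stages.get(n, ''):<2}" for n in range(1, last + 1))
        lines.append(f"{name:<4} {cells}".rstrip())
    return "\n".join(lines)


print(render(schedule(PROGRAM)))
```

Running it prints the same three rows as the diagram, with the add's W landing in C8. In C5 the first lw is in W while the add repeats D, which is exactly the overlap the question is about.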

Now, in that diagram, r5 is only available after the second lw finishes its MEM stage, so I have to stall decode. But at that point (C5), the first lw has reached its WB stage. In that case, would its result (r6) be forwarded from write-back to decode, or to execute?
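The two decisions in C5 and C6 can be sketched with hazard-detection and forwarding conditions modeled loosely on the ones Patterson & Hennessy describe in Chapter 4. This is a simplification: a pipeline register is reduced to its destination register plus two control bits, and the `Latch` fields, `must_stall`, and `operand_source` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Latch:
    """Simplified pipeline-register contents (hypothetical fields)."""
    dst: Optional[str] = None  # register this instruction will write
    reg_write: bool = False
    mem_read: bool = False     # True for a load such as lw


BUBBLE = Latch()  # a stall inserts a no-op that writes nothing


def must_stall(id_ex: Latch, id_srcs: List[str]) -> bool:
    """Load-use hazard: a load now in EX cannot supply the instruction in ID in time."""
    return id_ex.mem_read and id_ex.dst in id_srcs


def operand_source(src: str, ex_mem: Latch, mem_wb: Latch) -> str:
    """Where the ALU gets `src` while the consumer is in EX (newest value wins)."""
    for name, latch in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if latch.reg_write and latch.dst not in (None, "r0") and latch.dst == src:
            return name
    return "register file (read in ID)"


lw1 = Latch(dst="r6", reg_write=True, mem_read=True)
lw2 = Latch(dst="r5", reg_write=True, mem_read=True)

# C5: add is in ID, the second lw is in EX, the first lw is in WB -> stall
print(must_stall(lw2, ["r5", "r6"]))
# C6: add is in EX, the bubble is in MEM, the second lw is in WB
print(operand_source("r5", BUBBLE, lw2))
print(operand_source("r6", BUBBLE, lw2))
```

In this model, r5 reaches the add through the MEM/WB path into EX, so an instruction sitting in WB does forward to EX. r6, on the other hand, was written by the first lw in the first half of C5 and read from the register file during the repeated D in that same cycle, so no forwarding path is needed for it.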

Forwarding from write-back to decode seems like the convention. However, forwarding from write-back to execute seems to comply with the practice of forwarding right before the value is consumed.

Best Answer

If I understand your question correctly:

You shouldn't have to forward from write-back to decode at all: WB happens in the first half of the cycle and ID in the second.

Here are some relevant figures from Patterson & Hennessy (a pretty darn good text on computer organization that focuses on the MIPS architecture; I would recommend you get a copy):

In write-back, the register file is written in the first half of the cycle; in instruction decode, the register file is read in the second half of the cycle. As I understand it, this text's description of MIPS is fairly accurate. In hardware, this might be implemented with two clocks: the EX/MEM/WB stages would be driven by clock A, and IF/ID would be driven by clock B, 180 degrees out of phase (inverted). Careful design of the logic ensures that IF/ID/WB are "complete" in less than half a clock cycle.

In a simulator, you simply have to apply the write-back update to the register file before running decode within each simulated cycle.
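A toy version of that ordering, as a sketch: one simulated cycle evaluates WB before ID, which stands in for "write in the first half, read in the second half." The `step` function, the dict-based pipeline registers, and the loaded value 42 are hypothetical, and the MEM, EX, and IF stages are left out.

```python
def step(regs, pipe, wb_first=True):
    """Simulate the WB and ID parts of one clock cycle.

    regs: dict of register name -> value (the register file)
    pipe: dict with optional "MEM/WB" {"dst", "value"} and "IF/ID" {"op", "srcs"}
    Returns the new ID/EX contents, or None if ID is empty.
    """

    def writeback():
        mem_wb = pipe.get("MEM/WB")
        if mem_wb and mem_wb["dst"] != "r0":  # r0 is hard-wired to zero
            regs[mem_wb["dst"]] = mem_wb["value"]

    def decode():
        if_id = pipe.get("IF/ID")
        if if_id is None:
            return None
        return {"op": if_id["op"], "operands": [regs[r] for r in if_id["srcs"]]}

    if wb_first:
        writeback()      # "first half": commit the finished instruction
        return decode()  # "second half": read operands, sees the new value
    id_ex = decode()     # wrong order: reads the stale register value
    writeback()
    return id_ex


# Cycle C5 of the diagram: the first lw (hypothetical loaded value 42) is in
# WB while the add is decoding.
pipe = {
    "MEM/WB": {"dst": "r6", "value": 42},
    "IF/ID": {"op": "add", "srcs": ["r5", "r6"]},
}
print(step({"r0": 0, "r5": 0, "r6": 0}, pipe)["operands"])
print(step({"r0": 0, "r5": 0, "r6": 0}, pipe, wb_first=False)["operands"])
```

With WB evaluated first, the add sees r6 = 42; in the wrong order it reads the stale 0. The r5 operand is stale in both cases because the second lw has not produced it yet; that value arrives later through the MEM/WB-to-EX forward, which this sketch does not model. In a full simulator, a common way to get the same effect is to evaluate all five stages in reverse order (WB, MEM, EX, ID, IF) each cycle.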

Figures from Patterson & Hennessy, Computer Organization and Design, Chapter 4.