Consider the following MIPS instructions:
lw r6, 0(r1)
lw r5, 0(r2)
add r5, r5, r6
Assume I have full forwarding capabilities. I know that when I produce a value, I only "forward" it to a later instruction right before that instruction consumes it. With that being said, am I allowed to forward from the write back stage to the execute stage? I've only seen the write back stage forward directly to decode.
Consider the following cycle diagram:
     C1  C2  C3  C4  C5  C6  C7  C8
lw   F   D   X   M   W
lw       F   D   X   M   W
add          F   D   D   X   M   W
Note: The second occurrence of the decode stage signifies a "stall."
Now, in that diagram, I only get r5 after the 2nd lw instruction finishes its MEM stage. So, I have to stall decode. But at that point, the first lw instruction has completed WB stage. So in that case, would I forward from write back to decode or forward to execute?
If I forward from write back to decode, that seems like convention. However, if I forward from write back to execute, that seems to comply with the practice of forwarding right before you consume.
Best Answer
If I understand your question correctly:
You shouldn't have to forward from writeback to decode: WB happens in the first half of the cycle and ID in the second.
Here are some relevant figures from Patterson & Hennessy (a pretty darn good text on computer organization, focusing on the MIPS architecture; I would recommend you get a copy).
In writeback, the register file is written in the first half of the cycle; in instruction decode, the register file is read in the second half of the cycle. As I understand it, this text's description of MIPS is fairly accurate. In hardware, this might be implemented with two clocks: the EX/MEM/WB stages would be driven by clock A, and IF/ID would be driven by clock B, 180° out of phase (inverted). Careful design of the logic ensures that IF/ID/WB are "complete" in less than half a clock cycle.
In a simulator, you simply have to update the register file from writeback before decode is executed.
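As a rough illustration of that ordering, here is a minimal Python sketch of a single simulated cycle. The function names (writeback, decode, step) and the (dest, value) tuple standing in for the MEM/WB pipeline register are my own assumptions, not taken from any particular simulator. The point is only that calling writeback before decode within the same cycle mimics the "write in the first half, read in the second half" behavior.

```python
# Hypothetical sketch: all names here are illustrative assumptions.

def writeback(regs, mem_wb):
    """WB: write the result latched in MEM/WB into the register file."""
    if mem_wb is not None:
        dest, value = mem_wb
        if dest != 0:  # r0 is hardwired to zero in MIPS
            regs[dest] = value


def decode(regs, src_a, src_b):
    """ID: read the two source registers."""
    return regs[src_a], regs[src_b]


def step(regs, mem_wb, src_a, src_b):
    """Simulate one cycle in which WB and ID overlap."""
    writeback(regs, mem_wb)            # "first half" of the cycle
    return decode(regs, src_a, src_b)  # "second half" of the cycle


regs = [0] * 32
# The first lw is in WB with a made-up loaded value of 42 for r6,
# while add is in ID reading r5 and r6 in the same cycle.
operands = step(regs, mem_wb=(6, 42), src_a=5, src_b=6)
print(operands)  # (0, 42)
```

With this ordering, add sees the new r6 in decode without any WB-to-ID forwarding path. The r5 value still has to come from ordinary forwarding into execute, which this sketch does not model.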
Figures from Patterson & Hennessy, Computer Organization and Design, Chapter 4.