Electronic – The number of cycles needed to execute the following loop in a pipelined processor

computer-architecture cpu

This question was asked in an objective paper (GATE CSE).

Consider a 4 stage pipeline processor. The number of cycles needed by the
four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

[Image: table of cycles needed by I1, I2, I3, I4 in stages S1, S2, S3, S4]

What is the number of cycles needed to execute the following loop?

For (i=1 to 2){I1; I2; I3; I4;}

A) 16

B) 23

C) 28

D) 30

============================================================

The answer given is 23.

[Images: pipeline timing diagrams from the given solution]

My doubt is: why are we introducing I4 {S1} in the 6th cycle while I3 {S2} hasn't gone to the next stage?

If we take the above approach in this question, we get a different answer: according to me, I4 {S1} will be introduced in the 7th clock cycle.

This is because, in pipelined execution, each stage writes its output to a buffer and the next stage takes it from there. So, unless the previous instruction (here I3) moves on to the next stage, the current instruction (I4) cannot execute in that stage.

P.S. I have also posted this at cs.stackexchange, but no one has answered it. Please help!

Best Answer

Certainly it would be common for a pipeline stage to stall until the next stage is available, but there is no reason a CPU couldn't have buffers between stages that are independent of either stage. The control logic would be harder, but since it gains a cycle in the situation shown, it may be worth doing.
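To make the two behaviors concrete, here is a minimal Python sketch that counts cycles under each model. The latency table is not reproduced in this post, so the values in `LATENCY` are an assumption (they are the ones usually quoted for this GATE question); substitute the real table if it differs. The function name `total_cycles` and the `buffered` flag are hypothetical names chosen for illustration.

```python
# Cycles each instruction spends in stages S1..S4.
# ASSUMPTION: the original table image is missing; these are the values
# usually quoted for this GATE question. Replace them if yours differ.
LATENCY = {
    "I1": [2, 1, 1, 1],
    "I2": [1, 3, 2, 2],
    "I3": [2, 1, 1, 3],
    "I4": [1, 2, 2, 2],
}


def total_cycles(program, buffered):
    """Return the cycle in which the last instruction finishes the last stage.

    buffered=True : a stage is released as soon as it finishes its work
                    (the result waits in an independent inter-stage buffer).
    buffered=False: an instruction holds its stage until the next stage
                    accepts it, so the instruction behind it stalls.
    """
    n_stages = len(next(iter(LATENCY.values())))
    # freed[s] = time at which the previous instruction released stage s
    freed = [0] * n_stages
    for name in program:
        enter = [0] * n_stages
        finish = [0] * n_stages
        ready = 0  # time this instruction finished its previous stage
        for s in range(n_stages):
            # Wait for both our own previous stage and for the stage to be free.
            enter[s] = max(ready, freed[s])
            finish[s] = enter[s] + LATENCY[name][s]
            ready = finish[s]
        for s in range(n_stages):
            is_last = s == n_stages - 1
            if buffered or is_last:
                freed[s] = finish[s]      # released when the work is done
            else:
                freed[s] = enter[s + 1]   # released only on entering next stage
    return freed[-1]


loop = ["I1", "I2", "I3", "I4"] * 2  # For (i=1 to 2){I1; I2; I3; I4;}
print(total_cycles(loop, buffered=True))   # 23
print(total_cycles(loop, buffered=False))  # 25
```

Under these assumed latencies the buffered model gives 23, matching the given answer, with I4 entering S1 in the 6th cycle. The stall model gives 25, with I4 entering S1 in the 7th cycle as the question expects.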

As to whether the given answer is correct, IMO either should be accepted, unless the syllabus specifies one behavior or the other as correct. I do not know anything about the syllabus mentioned, so I can't comment on that.