Electronic – How is the number of clock cycles required to complete an instruction in a pipelined processor less than the pipeline latency?

assembly computer-architecture microprocessor verilog vhdl

I am not new to computer architecture but I have only academic experience with micro-architecture implementation.

I have heard and read this many times but never really bothered to understand the statement: some instructions complete in 1 or 2 clock cycles, while more complex integer or floating-point instructions complete in 2, 4, 6 clock cycles, etc., and loads/stores can take 80-100 clock cycles because memory is slow.

Now, I am sure most processors, be they embedded or desktop, have pipelines somewhere between 5 and 30 stages deep. So the latency of each instruction should equal the pipeline depth, i.e. the number of pipeline stages. Also, the throughput of a single-pipeline scalar processor can be at most 1 IPC (instruction per cycle). But how can some instructions finish in 1, 2 or 4 clock cycles on a processor with a 10- or 12-stage pipeline? Can someone explain that to me?
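To make the latency/throughput distinction in the question concrete, here is a minimal sketch of an idealized in-order scalar pipeline. It assumes (hypothetically) that every stage takes exactly one cycle and that there are no stalls or hazards; the function names are illustrative only. Each instruction still has a latency of `depth` cycles, but N instructions finish in depth + N - 1 cycles, so the average cost per instruction approaches 1 cycle as N grows.

```python
from typing import List, Tuple


def ideal_pipeline_timing(depth: int, n_instructions: int) -> List[Tuple[int, int]]:
    """Return (entry_cycle, completion_cycle) for each instruction in an
    idealized in-order scalar pipeline with no stalls: one instruction enters
    per cycle and each stage takes exactly one cycle."""
    timings = []
    for i in range(n_instructions):
        entry = i                   # instruction i is fetched in cycle i
        completion = i + depth - 1  # and leaves the last stage depth-1 cycles later
        timings.append((entry, completion))
    return timings


def total_cycles(depth: int, n_instructions: int) -> int:
    # Cycles 0 .. (completion cycle of the last instruction), inclusive.
    return ideal_pipeline_timing(depth, n_instructions)[-1][1] + 1


if __name__ == "__main__":
    depth = 12
    for n in (1, 10, 1000):
        cycles = total_cycles(depth, n)
        print(f"N={n:5d}: total={cycles:5d} cycles, latency={depth}, "
              f"cycles/instruction={cycles / n:.3f}")
```

With a 12-stage pipeline this gives 12 cycles for one instruction, 21 for ten, and 1011 for a thousand: the latency is paid once, not per instruction.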

PS: The only explanation I can come up with is that some stages are marked as multi-cycle stages, as is usually done during STA and timing closure, and that the quoted figures (1 cc, 2 cc, 4 cc, etc.) refer to how long an instruction spends in that particular multi-cycle stage?

Best Answer

Generally, instruction execution time is measured not from when an instruction enters the pipeline to when it leaves, but from the time it passes some arbitrary point in the pipeline to the time the next instruction passes that point. If no instruction takes more than, e.g., 20 cycles to make its way through the pipeline, then measuring the time for a sequence of instructions to pass that arbitrary point will yield a result within 20 cycles of the actual time required to execute the whole sequence from start to finish. Programmers are generally far less interested in the time to execute a single instruction than in the time to execute sequences of many instructions (often thousands, if not more), so they will generally only care about pipelining where it adds a non-constant cost to the overall execution time (e.g. when each repetition of an instruction sequence incurs another pipeline stall).
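The measurement described above can be sketched with a deliberately simplified model, not any real CPU. Assumptions (all hypothetical): an in-order scalar pipeline of `depth` stages where every stage takes one cycle except a reference stage `ref_stage`, which holds instruction i for `occupancy[i]` cycles and stalls the stages in front of it while busy. An instruction's "cost" is the gap between it and the next instruction entering the reference stage. In this toy model the 1/2/4-cycle figures show up as occupancy of one multi-cycle stage, which is close to the asker's PS.

```python
from typing import List, Tuple


def simulate(occupancy: List[int], depth: int, ref_stage: int) -> Tuple[List[int], int]:
    """Illustrative in-order pipeline model (hypothetical, not a real CPU).

    Every stage takes one cycle except `ref_stage`, which holds instruction i
    for occupancy[i] cycles and stalls the stages in front of it while busy.
    One instruction is fetched per cycle when the front end is not stalled.

    Returns the cycle in which each instruction enters `ref_stage`, and the
    total number of cycles until the last instruction leaves the pipeline.
    """
    enter_ref: List[int] = []
    for i in range(len(occupancy)):
        earliest = i + ref_stage  # arrival cycle if nothing ever stalled
        if enter_ref:
            # Must also wait until the previous instruction frees ref_stage.
            earliest = max(earliest, enter_ref[-1] + occupancy[i - 1])
        enter_ref.append(earliest)
    # Last instruction: occupancy[-1] cycles in ref_stage, then one cycle in
    # each of the (depth - 1 - ref_stage) stages after it.
    total = enter_ref[-1] + occupancy[-1] + (depth - 1 - ref_stage)
    return enter_ref, total


def per_instruction_cost(occupancy: List[int], enter_ref: List[int]) -> List[int]:
    # Cycles from instruction i entering the reference stage until
    # instruction i+1 enters it; the last one is charged its own occupancy.
    gaps = [b - a for a, b in zip(enter_ref, enter_ref[1:])]
    return gaps + [occupancy[-1]]


if __name__ == "__main__":
    depth, ref_stage = 12, 5
    # Hypothetical loop body: two 1-cycle ops and one op that holds the
    # reference stage for 4 cycles, repeated 1000 times.
    loop = [1, 1, 4] * 1000
    enter_ref, total = simulate(loop, depth, ref_stage)
    costs = per_instruction_cost(loop, enter_ref)
    print("cost of first iteration:", costs[:3])
    print("sum of per-instruction costs:", sum(costs))
    print("total cycles:", total)
    print("single instruction, total cycles:", simulate([1], depth, ref_stage)[1])
```

For this input the first iteration costs 1, 1 and 4 cycles, the costs sum to 6000, and the total is 6011, i.e. exactly depth - 1 cycles more; a lone instruction still takes the full 12 cycles. The pipeline latency only appears as that constant offset, whereas a stall added to every loop iteration would grow the total linearly. Real pipelines add forwarding, hazards, caches and out-of-order issue, which this model ignores.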