Clock: CPU clock vs pipeline clock vs memory access clock


A few days ago, I implemented a multi-cycle, unpipelined soft-core on an FPGA. It worked like a charm. The FSM beautifully orchestrated FETCH -> DECODE -> OPER1 -> OPER2 -> OPER3 -> ALU -> MEMORY -> REG WB -> PC UPDATE -> FETCH -> DECODE -> ...

I now wish to transform it into a pipelined CPU.

While I am aware of the principles, such as splitting the data path into smaller stages and littering it with pipeline latches, the timing topics have caught me off guard, especially those concerning load/store instructions.

The MEMORY stage is clearly the slowest stage. If it takes 100 ns to complete either a read or a write, I will simply ensure every other stage also takes 100 ns to complete (although they could finish much faster) to balance the pipeline.
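
Spelling out the arithmetic this relies on: balancing every stage to the slowest one means the stage clock can run no faster than

$$f_{\text{stage}} = \frac{1}{T_{\text{slowest}}} = \frac{1}{100\ \text{ns}} = 10\ \text{MHz}$$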

And now the question.

Will I be right in simply ticking each stage at 10 MHz, including the MEMORY stage, AND ticking the memory interface (memory controller and SRAM) at 50 MHz? The 50 MHz ensures that the MEMORY stage is able to finish its work before the next 10 MHz stage tick arrives.

The memory interface (controller) resides inside the CPU, so the CPU must receive 50 MHz as its input clock, which is then divided down internally to 10 MHz for clocking the stages. And of course, the 50 MHz also goes to the memory controller.

In this case, what should I document as the CPU clock: 10 MHz or 50 MHz? I understand that throughput and overall pipeline latency are based on the 10 MHz clock, but the input clock to the CPU, as seen by the user, is 50 MHz!

Best Answer

No, it would be normal to clock the CPU at 50 MHz, or even to double the input clock to 100 MHz, for example, if the design is fast enough.

Now you know the memory stage takes 5 clock cycles, so you stall the CPU whenever it can do nothing because it is waiting for memory. There is then an obvious gain from compiling your code to minimise memory accesses and their associated speed penalty, for example by using registers wherever possible.
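
To make that concrete, here is a minimal back-of-the-envelope sketch in Python (not HDL, and not your actual design) that assumes a single 50 MHz CPU clock, a 5-cycle main-memory access, and that each load/store adds 4 stall cycles; the names (MEM_LATENCY, estimate_cycles, the example loops) are all invented for illustration:

```python
CLOCK_HZ = 50_000_000    # assumed single CPU clock
MEM_LATENCY = 5          # main-memory access, in CPU clock cycles
PIPELINE_DEPTH = 5       # cycles until the first instruction retires

def estimate_cycles(ops):
    """Rough cycle count: fill the pipeline, retire one op per cycle,
    and add MEM_LATENCY - 1 stall cycles for every load/store."""
    stalls = sum(MEM_LATENCY - 1 for op in ops if op in ("load", "store"))
    return PIPELINE_DEPTH - 1 + len(ops) + stalls

loop_with_loads = ["load", "alu", "load", "alu", "store"]
loop_in_regs    = ["alu", "alu", "alu", "alu", "alu"]

for name, ops in [("memory-heavy", loop_with_loads), ("register-only", loop_in_regs)]:
    cycles = estimate_cycles(ops)
    print(f"{name}: {cycles} cycles ~= {cycles / CLOCK_HZ * 1e9:.0f} ns at 50 MHz")
```

Even this crude estimate shows why keeping values in registers pays off: the register-only loop finishes in fewer than half the cycles of the memory-heavy one.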

At a later stage you can add cache memory, which "snoops" all memory transactions and satisfies as many as it can from a small, fast memory, perhaps in 2 cycles instead of 5. Where the cache doesn't contain the data, you have a "cache miss" and the access to main memory proceeds as normal. Logic in the cache controller decides whether or not to also store the data returned from main memory, so the cache has it next time...
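
A hedged sketch of that idea, continuing the Python model above: a tiny direct-mapped cache that services repeated accesses in 2 cycles and falls back to the 5-cycle main memory on a miss. The line count, the always-allocate fill policy, and all the names here are assumptions made purely for illustration:

```python
MEM_LATENCY = 5       # main-memory access, in CPU clock cycles
CACHE_HIT_CYCLES = 2  # the "perhaps in 2 cycles instead of 5" case
CACHE_LINES = 64      # illustrative cache size

class Cache:
    def __init__(self):
        # For simplicity each line remembers the full address it holds.
        self.lines = [None] * CACHE_LINES

    def access(self, addr):
        """Return the cycle cost of this access and update the cache."""
        index = addr % CACHE_LINES
        if self.lines[index] == addr:      # cache hit: served from fast memory
            return CACHE_HIT_CYCLES
        self.lines[index] = addr           # cache miss: fetch from main memory, keep a copy
        return MEM_LATENCY

cache = Cache()
for addr in (0x100, 0x104, 0x100, 0x104):  # repeated accesses hit the second time
    print(hex(addr), cache.access(addr), "cycles")
```

A real controller would compare tag bits rather than whole addresses and would decide per access whether to allocate, but the hit/miss timing difference is the part that matters to the pipeline.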