In grossly simplified terms, a processor executes 1 instruction in a single clock cycle. But what does that even mean? If a processor is a bunch of transistors, is 1 clock cycle simply 1 state change (or the potential for a state change) for all those millions of transistors? Is 1 clock cycle how often the state of every transistor gets evaluated? What exactly is happening, electrically, in 1 clock cycle?
Electronic – Inside a CPU, what happens in a single clock cycle
processor
Related Solutions
So after a lot of digging I found the answer to the problem. I am not sure if it helps anyone but I am going to post the answer anyways.
Here are the signals of the code above failing.
The problem is actually very simple, and I only spotted it when someone told me this: "At the positive edge of the clock something should change, otherwise your design is stuck!". If you look at the signals above, something is wrong at 15ns: none of the registers change when the processor should have executed the first instruction! The reason is that the signal instr[7:0] didn't get assigned the value of the next instruction (which was correctly fetched). The cause is this section of code:
fetch.v:
always @ (posedge clk) begin
    if (exec_stat == 1'b1) begin
        // If we are fetching, store the instruction in the 'data' buffer and set exec_stat low
        exec_stat <= 1'b0;
        empty_instr[3:0] <= 4'b0000;
        instr0 <= data[7:0];
        instr1 <= data[15:8];
        instr2 <= data[23:16];
        instr3 <= data[31:24];
    end else if (iptr[1:0] == 2'b00) begin
        // the last two bits of the iptr address define which 8-bits from the buffer we need to use as instruction
        if (empty_instr[0]) begin
            // if there is no instruction at this address then
            // go to fetch state
            exec_stat <= 1'b1;
        end else begin
            instr <= instr0;
            empty_instr[0] <= 1'b1;
        end
    end else if (iptr[1:0] == 2'b01) begin
        if (empty_instr[1]) begin
            exec_stat <= 1'b1;
        end else begin
            instr <= instr1;
            empty_instr[1] <= 1'b1;
        end
    end else if (iptr[1:0] == 2'b10) begin
        if (empty_instr[2]) begin
            exec_stat <= 1'b1;
        end else begin
            instr <= instr2;
            empty_instr[2] <= 1'b1;
        end
    end else if (iptr[1:0] == 2'b11) begin
        if (empty_instr[3]) begin
            exec_stat <= 1'b1;
        end else begin
            instr <= instr3;
            empty_instr[3] <= 1'b1;
        end
    end
end
This is wrong because 'instr' is a registered output: it only takes a new value when the clock goes high, one edge after the buffer has been filled. As a result there are cycles in which the processor does nothing, and the PC is not updated in time.
The solution is to NOT make 'instr' a registered output, but rather to drive it continuously through a multiplexer. Something like:
mux8_4 mux0_4(iptr[1:0], instr0, instr1, instr2, instr3, instr);
After all, if it contains the wrong instruction it doesn't matter, because that means we would go into a fetch state and the processor would ignore the value. So there is no reason to put this assignment in the always block. With this change the signals look like this:
Clearly, everything now changes at the appropriate time!!!! :) :D
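For reference, here is a minimal sketch of what a module matching the mux8_4 instantiation above might look like. Only the module name and the port order (select first, then the four 8-bit inputs, then the output) come from that line; the port names sel, in0..in3 and out are assumptions made for illustration, and the original author's module may well differ.

```verilog
// Hypothetical 4-to-1 multiplexer for 8-bit values.
// Port order follows the instantiation above; port names are assumed.
module mux8_4 (
    input  wire [1:0] sel,
    input  wire [7:0] in0,
    input  wire [7:0] in1,
    input  wire [7:0] in2,
    input  wire [7:0] in3,
    output reg  [7:0] out
);
    // Purely combinational: 'out' follows the inputs and the select
    // lines without waiting for a clock edge.
    always @(*) begin
        case (sel)
            2'b00:   out = in0;
            2'b01:   out = in1;
            2'b10:   out = in2;
            default: out = in3; // sel == 2'b11
        endcase
    end
endmodule
```

With a module like this, 'instr' in the parent module has to be declared as a wire rather than a reg, since it is now driven by a module output instead of being assigned inside an always block.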
Best Answer
The transistors in a CPU will be arranged to form small functional units. These fall into two main categories: combinatorial logic (logic gates, adders, multiplexers, etc.) and stateful logic (flip-flops, latches, SRAM, etc.). Combinatorial logic performs various logical and mathematical operations, while stateful logic can store data. Generally the clock is only connected to stateful logic, such as flip-flops. Combinatorial logic is then connected between stateful logic elements. So you might have a flip-flop that feeds some logic gates, which then feed another flip-flop.
Combinatorial logic has a propagation delay associated with it: how long it takes for the output to change after the input changes. Each flip-flop will transfer whatever logic level is on its input through to its output on the active clock edge, once per clock cycle. The shortest usable clock period, and therefore the highest clock frequency, is then determined by the longest propagation delay through the logic between the flip-flops.
So yes, your statement is essentially correct: each clock cycle is one state change.
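The flip-flop, logic, flip-flop arrangement described above can be sketched in Verilog as follows. This is an illustrative toy, not part of any real CPU: the module name reg_logic_reg, the signal names, and the particular arithmetic in the middle are all made up for the example.

```verilog
// Illustrative only: two 8-bit register stages with combinational
// logic between them. All names here are hypothetical.
module reg_logic_reg (
    input  wire       clk,
    input  wire [7:0] d,
    output reg  [7:0] q
);
    reg  [7:0] stage1;     // first bank of flip-flops (stateful)
    wire [7:0] logic_out;  // output of the logic between the registers

    // Combinational logic: not clocked. Its output settles some
    // propagation delay after stage1 changes.
    assign logic_out = (stage1 + 8'd1) ^ 8'h55;

    // Both flip-flop banks sample their inputs on the rising clock edge,
    // so each clock cycle produces exactly one state update.
    always @(posedge clk) begin
        stage1 <= d;
        q      <= logic_out;
    end
endmodule
```

The clock only appears in the always block. The assign statement describes logic that reacts continuously, and it is the settling time of that logic that limits how short the clock period can be.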
In a CPU (or any other synchronous digital logic), the clock serves merely to synchronize the flow of data through the chip so that computations are performed correctly. The clock is necessary because the propagation delays through the logic can vary, either along different paths or with process, voltage, and temperature (PVT). Incidentally, this is why you see sites like Silicon Lottery and why CPU manufacturers sell multiple versions of the same chip that run at different speeds: variations in how each chip gets produced mean that the propagation delays of the logic will all be a little different, so some chips will support faster clocks than others.
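The link between propagation delay and achievable clock speed can be written as the usual textbook setup-timing constraint. The symbols are introduced here for illustration and do not appear in the answer itself: T_clk is the clock period, t_cq is the clock-to-output delay of the launching flip-flop, t_pd,max is the longest propagation delay through the logic between two flip-flops, and t_su is the setup time of the capturing flip-flop. Clock skew is ignored for simplicity.

```latex
T_{\mathrm{clk}} \;\ge\; t_{cq} + t_{pd,\max} + t_{su}
\qquad\Longrightarrow\qquad
f_{\max} = \frac{1}{t_{cq} + t_{pd,\max} + t_{su}}
```

A chip whose delays happen to come out slightly smaller therefore has a slightly higher f_max, which is the variation that speed grading relies on.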
It is possible to design CPUs that do not use a clock. This is called asynchronous design. It has the advantage of not requiring a clock and all of the complexity and power consumption associated with distributing it, but asynchronous design is more difficult, as the circuit must be designed such that no timing requirements are violated, even with variations between chips and across different voltages and temperatures. I have heard that a company made an asynchronous CPU quite a few years ago, and it would slow down if you set a cup of hot coffee on top of it, due to the asynchronous logic adjusting to the temperature change.