How do computers execute instructions spanning multiple clock cycles?

Tags: clock, instruction-set

Some (perhaps most) instructions in a computer simply cannot be executed in a single clock cycle. But therein lies a problem: how does the program counter know when an instruction is completed, considering that one instruction might take 2 clock cycles while another takes 3? How does the program counter (or, more importantly, the machinery that detects that an instruction is complete) work?

Best Answer

tl;dr version: instructions are expanded internally to multiple microcoded steps. The PC is held by microcode until the instruction sequence completes.

get-a-cuppa version:

A CPU works with two kinds of code: instructions (opcodes plus operands), which are fetched and executed from RAM, and microcode, the small internal steps that implement each sub-operation necessary to carry out an instruction.

For example, an instruction that adds two numbers from memory and stores the result back to memory might break down into steps like this:

  • fetch value A from RAM and load in register A
  • fetch value B from RAM and load in register B
  • add registers A and B, store in register C
  • store register C into RAM

So here we have four defined microinstructions (microcode sequence steps) to complete one opcode, expanded out from just one instruction fetched from RAM. These microinstructions are decoded, one by one, to steer the data from external memory or I/O, through the CPU, and finally back to memory or I/O.
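
To make the expansion concrete, here's a minimal sketch in C of that one opcode represented as a table of micro-steps. Everything in it (the MicroOp type, the step names) is invented for illustration; real microcode is machine-specific hardware state, not C.

```c
#include <stdio.h>

/* Hypothetical micro-operations for the memory-to-memory ADD above.
 * The names and table layout are invented; real microcode formats
 * are machine-specific and much wider than an enum. */
typedef enum {
    UOP_LOAD_A,   /* fetch value A from RAM, load into register A  */
    UOP_LOAD_B,   /* fetch value B from RAM, load into register B  */
    UOP_ADD,      /* add registers A and B, result into register C */
    UOP_STORE_C   /* store register C back into RAM                */
} MicroOp;

/* The single ADD opcode expands into this fixed sequence of steps. */
static const MicroOp add_ucode[] = { UOP_LOAD_A, UOP_LOAD_B, UOP_ADD, UOP_STORE_C };
static const char *uop_name[]    = { "load A", "load B", "add A+B -> C", "store C" };

int main(void) {
    /* Each micro-step consumes (at least) one CPU clock. */
    for (unsigned i = 0; i < sizeof add_ucode / sizeof add_ucode[0]; i++)
        printf("clock %u: %s\n", i + 1, uop_name[add_ucode[i]]);
    return 0;
}
```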

Thus, there are two ‘program counters’: the system PC (Program Counter) that provides the external instruction address, and a microinstruction sequence counter that selects the current microinstruction step. The PC advances on each instruction fetch, while the microinstruction sequence counter advances faster, on each CPU clock. Greatly simplifying, the PC advances only when the microinstruction sequence is finished.

So in the above example, the PC is held for four clocks while the microcode completes, then advances, and the next instruction is fetched. In reality, to save time, the next fetch can be - and usually is - overlapped (pipelined) with the current instruction's processing. This gets complicated with instructions that can branch, that is, jump somewhere other than the next code address. Anyway, instruction prefetch is a rabbit hole for another time.
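
Here's a toy fetch/execute loop in C that models the two counters interacting: the PC sits still while the micro-step counter runs, then advances. All the names and per-opcode step counts are invented for illustration, and it ignores pipelining entirely.

```c
#include <stdio.h>

/* Toy model of the two counters.  The opcodes, step counts, and names
 * (pc, upc, steps_for) are all invented; a real CPU does this in
 * hardware, and real step counts vary per machine. */
enum { OP_ADD_MEM, OP_NOP, OP_HALT };

static unsigned steps_for(unsigned opcode) {
    switch (opcode) {
    case OP_ADD_MEM: return 4;   /* the four micro-steps shown earlier */
    default:         return 1;   /* pretend everything else is 1 clock */
    }
}

int main(void) {
    unsigned program[] = { OP_ADD_MEM, OP_NOP, OP_HALT };
    unsigned pc = 0;     /* system Program Counter: instruction address */
    unsigned upc = 0;    /* microinstruction step counter               */
    unsigned clock = 0;

    while (program[pc] != OP_HALT) {
        clock++;
        upc++;           /* the micro-sequencer advances on every clock */
        printf("clock %u: pc=%u micro-step=%u\n", clock, pc, upc);
        if (upc == steps_for(program[pc])) {
            upc = 0;     /* sequence done: only now does the PC advance */
            pc++;
        }
    }
    printf("halted after %u clocks\n", clock);
    return 0;
}
```

Running it shows the PC sitting at 0 for four clocks while the ADD's microcode plays out, then moving on.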

What does it all mean?

Clock cycle count per instruction is a huge deal. It is a subject of hot debate and a long-standing competitive rivalry. Why? Without getting into too much detail, the two main approaches to computer architecture these days are:

  • CISC, for Complex Instruction Set Computer and
  • RISC, for Reduced Instruction Set Computer

Broadly speaking, CISC machines (like x86) have many complex, powerful instructions that are code-dense, but take many microcode cycles to execute. RISC machines (like ARM) have simpler (and fewer) instructions that take fewer cycles to complete, but are less code-dense (programs are bigger).
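
As a concrete illustration of the density trade-off: on a load/store (RISC-style) machine, the compiler must express a memory-to-memory add as separate load, add, and store instructions, while a CISC-style machine can expose the whole thing as one opcode and microcode it internally. The mnemonics in the comments below are generic stand-ins, not any real ISA.

```c
#include <stdio.h>

int mem[3] = { 2, 3, 0 };  /* stand-ins for RAM locations A, B, C */

/* On a load/store (RISC-style) machine this body becomes roughly
 * four instructions; the mnemonics are generic, not a real ISA. */
void add_mem(void) {
    int ra = mem[0];    /* load  rA, [A]     -- one instruction */
    int rb = mem[1];    /* load  rB, [B]     -- one instruction */
    int rc = ra + rb;   /* add   rC, rA, rB  -- one instruction */
    mem[2] = rc;        /* store rC, [C]     -- one instruction */
}
/* A CISC-style machine could encode the same work as a single
 * memory-to-memory add opcode: denser code in RAM, but microcoded
 * internally into much the same steps. */

int main(void) {
    add_mem();
    printf("C = %d\n", mem[2]);  /* prints C = 5 */
    return 0;
}
```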

It's been nearly 40 years since RISC came about, and decades longer than that for microcoded CISC (microprogramming goes back to Maurice Wilkes' work in the early 1950s). In those 40 years of competing for mind share and market share, there has been a lot of cross-pollination between RISC and CISC, hastened along as VLSI technology has become faster and denser. So CISC hardware got more powerful and cycle-efficient, while RISC hardware grew in complexity while keeping to the small-is-beautiful idea of a simpler opcode set.