Yes, there is a Data Acknowledge signal. When asserted, it indicates that the data has been placed onto the memory bus and is available to the processor for reading.
Briefly, the memory read cycle works like this:
1. The processor initiates a read bus cycle by placing the address of the memory location on the address lines.
2. Once the address lines are stable, the processor asserts the address strobe signal on the bus. The address strobe signals that the address lines carry a valid address.
3. The processor then sets the Read/Write signal to high, i.e. read.
4. Now the processor asserts the data strobe signal. This signals to the memory that the processor is ready to read data.
5. The memory subsystem decodes the address and places the data on the data lines.
6. The memory subsystem then asserts the data acknowledge signal. This signals to the processor that valid data can now be latched in.
7. The processor latches in the data and negates the data strobe. This signals to the memory that the data has been latched. The processor also negates the address strobe signal.
8. The memory subsystem now negates the data acknowledge signal. This signals the end of the read bus cycle.
The Data Acknowledge signal is asserted in Step 6.
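All of that handshaking happens in hardware, inside the execution of a single instruction; software never sees the individual signals. As a hedged illustration in Intel syntax 32-bit 80x86 assembly (some_variable is a hypothetical aligned 32-bit variable, and this assumes a simple system with no caches):
mov eax,[some_variable]   ; this one load triggers one complete read bus cycle
mov [some_variable],eax   ; a store runs the same handshake, but with the
                          ; Read/Write signal driven low (write) instead of high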
Processors compensate for the time it takes to read memory by inserting wait states into the bus cycle. However, virtual memory is an operating system function, so the operating system manages the time it takes to read the data off the hard disk and swap it into memory, where the CPU can then read it in the usual way.
In simple terms, the CPU simply waits in a loop until the data is available.
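The hardware wait states themselves are invisible to software, but the same "spin until ready" pattern shows up when software polls a device directly. A hedged sketch in the same 80x86 assembly (status_flag and device_data are hypothetical memory-mapped device registers):
wait_for_data:
    test byte [status_flag],1   ; check the hypothetical "data ready" bit
    jz wait_for_data            ; not set yet - keep spinning
    mov eax,[device_data]       ; ready - latch in the data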
A register-based CPU architecture has one or more general purpose registers (where "general purpose register" excludes special purpose registers, like stack pointer and instruction pointer).
An accumulator-based CPU architecture is a register-based CPU architecture that only has one general purpose register (the accumulator).
The main advantages of "more than one general purpose register" are that the compiler doesn't have to "spill" as many temporary values onto the stack; and it's easier for the CPU to do more independent instructions in parallel.
For an example, imagine you want to do a = (b - c) + (d - e) + 123. For an "apples vs apples" comparison I'll use Intel syntax 32-bit 80x86 assembly for both examples (but only use EAX for the accumulator-based CPU architecture).
For accumulator-based CPU architecture this may be:
mov eax,[b] ;Group 1
sub eax,[c] ;Group 2
add eax,123 ;Group 3
mov [a],eax ;Group 4
mov eax,[d]
sub eax,[e] ;Group 5
add [a],eax ;Group 6
Note that most of these instructions depend on the result from the previous instruction, and therefore can't be done in parallel. The ";Group N" comments are there to indicate which groups of instructions can be done in parallel (and show that, assuming some form of internal "register renaming" ability, "group 4" is the only group where 2 instructions are likely to be done in parallel).
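Incidentally, the "spilling" mentioned earlier looks like this: if the accumulator version couldn't reuse [a] as temporary storage, it would have to push the intermediate result onto the stack and fetch it back later. A hypothetical variant (same conventions as above):
mov eax,[b]     ;Group 1
sub eax,[c]     ;Group 2
push eax        ;Group 3 (spill the temporary "b - c" to the stack)
mov eax,[d]
sub eax,[e]     ;Group 4
add eax,123     ;Group 5
add eax,[esp]   ;Group 6 (reload the spilled temporary)
mov [a],eax     ;Group 7
add esp,4
That's 9 instructions and 7 groups instead of 7 and 6, which is why having somewhere to park temporary values matters.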
Using multiple registers might give you:
mov eax,[b] ;Group 1
mov ebx,[d]
sub eax,[c] ;Group 2
sub ebx,[e]
lea eax,[eax+ebx+123] ;Group 3
mov [a],eax ;Group 4
In this case, there's one less instruction, and two fewer groups of instructions (more instructions likely to be done in parallel). If each group takes about the same time, that might mean roughly "33% less time" in practice.
Of course in practice code does more than a relatively simple calculation; so there's even more chance of "more instructions in parallel". For example, with only two more registers (e.g. ECX and EDX) it should be easy to see that you could do a = (b - c) + (d - e) + 123 and g = (h - i) + (j - k) + 456 in the same amount of time (by doing both calculations in parallel with different registers, as the sketch below shows); and it should also be easy to see that for an accumulator-based CPU architecture you can't do the calculations in parallel (two calculations would take twice as long as one calculation).
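To make that concrete, a hedged sketch of how the two calculations might interleave (same syntax and grouping conventions as above, optimistically assuming the CPU can start four independent instructions at once):
mov eax,[b]             ;Group 1
mov ebx,[d]
mov ecx,[h]
mov edx,[j]
sub eax,[c]             ;Group 2
sub ebx,[e]
sub ecx,[i]
sub edx,[k]
lea eax,[eax+ebx+123]   ;Group 3
lea ecx,[ecx+edx+456]
mov [a],eax             ;Group 4
mov [g],ecx
Twice the instructions, but still only four groups; in this idealised model both calculations finish in the same time as one.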
Note: There is at least one "potential technical inaccuracy" in what I've written here (mostly involving the theoretical capabilities of register renaming and its application to accumulator-based CPU architectures). This is deliberate. I find that going into too much detail (in an attempt to be "100% technically correct" and cover all the little corner cases) makes it significantly harder for people to understand the relevant parts.
Best Answer
They're not quite the same. The registers are where the values that the CPU is actually working on are located. The CPU is designed so that it can only modify, or otherwise act on, a value once that value is in a register. So registers are where the actual logic happens, whereas memory (including cache) can only hold values that the CPU reads from and writes to.
Imagine a carpenter at work. He has a few items in his hands (registers); then, very close by on his workbench (cache), things he is frequently working on but not using right this moment; and then, in the workshop (main memory), things that pertain to the project at hand but are not immediately important enough to be on the workbench.
EDIT: Here's a simple explanation for how register logic works.
Let's imagine we have four registers named R1..R4. If you compile a statement that looks something like this (a made-up example; any simple arithmetic statement will do):
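x = (y + z) * 2;   // a made-up statement for illustration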
the compiler would output machine code that (when disassembled) looks something like this (illustrative pseudo-assembly for our imaginary R1..R4 machine, not any real instruction set):
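LOAD  R1, [y]    ; fetch y from memory into register R1
LOAD  R2, [z]    ; fetch z into R2
ADD   R1, R2     ; R1 = y + z -- the arithmetic happens in the registers
SHL   R1, 1      ; R1 = R1 * 2, done as a left shift by one bit
STORE [x], R1    ; only now is the result written back to memory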
Since most modern CPUs have registers that are either 32 or 64 bits wide, they can do math on any value up to the size they can hold. They don't need special registers for smaller values; they just use special ASM instructions that tell the CPU to use only part of the register. And, much like the carpenter with only two hands, registers can only hold a small amount of data at once, but they can be reused, with active data passed in and out of them, which means that a large number of registers isn't needed. (Having a lot available does allow compilers to generate faster code, of course, but it's not strictly necessary.)
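For example, on 80x86 the AL register names just the low 8 bits of EAX, so byte-sized math reuses the same register file. A hedged sketch (byte_var is a hypothetical 8-bit variable):
mov al,[byte_var]   ; load a single byte into AL (the low 8 bits of EAX)
add al,1            ; 8-bit add; the upper 24 bits of EAX are left untouched
mov [byte_var],al   ; store just that one byte back to memory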