CPU Architecture – Accumulator-Based vs Register-Based

computer-architecturecpuregisterx86

I don't understand the difference between an accumulator-based CPU architecture and a register-based CPU architecture. I know x86 is register-based but it has an accumulator-like register. I only ever hear people asking the difference between stack-based and register-based, but not register-based and accumulator-based. What are the advantages and disadvantages of each? And can I get some example assembly codes from each, where they differ, as well?

Best Answer

A register-based CPU architecture has one or more general purpose registers (where "general purpose register" excludes special purpose registers, like stack pointer and instruction pointer).

An accumulator-based CPU architecture is a register-based CPU architecture that only has one general purpose register (the accumulator).

The main advantage/s of "more that one general purpose register" is that the compiler doesn't have to "spill" as many temporary values onto the stack; and it's easier for the CPU to do more independent instruction in parallel.

For an example imagine you want to do a = (b - c) + (d - f) + 123. For an "apples vs apples comparision" I'll use Intel syntax 32-bit 80x86 assembly for both examples (but only use EAX for the accumulator-based CPU architecture).

For accumulator-based CPU architecture this may be:

    mov eax,[b]     ;Group 1

    sub eax,[c]     ;Group 2

    add eax,123     ;Group 3

    mov [a],eax     ;Group 4
    mov eax,[d]

    sub eax,[e]     ;Group 5

    add [a],eax     ;Group 6

Note that most of these instructions depend on the result from the previous instruction, and therefore can't be done in parallel. The ";Group N" comments are there to indicate which groups of instructions can be done in parallel (and show that, assuming some form of internal "register renaming" ability, "group 4" is the only group where 2 instructions are likely to be done in parallel).

Using multiple registers might give you:

    mov eax,[b]           ;Group 1
    mov ebx,[d]

    sub eax,[c]           ;Group 2
    sub ebx,[e]

    lea eax,[eax+ebx+123] ;Group 3        

    mov [a],eax           ;Group 4

In this case, there's one less instruction, and 2 less groups of instructions (more instructions likely to by done in parallel). That might mean "25% faster" in practice.

Of course in practice code does more than a relatively simple calculation; so there's even more chance of "more instructions in parallel". For example; with only 2 more registers (e.g. ECX and EDX) it should be easy to see that you could do a = (b - c) + (d - f) + 123 and g = (h - i) + (j - k) + 456 in the same amount of time (by doing both calculations in parallel with different registers); and it should also be easy to see that for accumulator-based CPU architecture you can't do the calculations in parallel (two calculations would take twice as long as one calculation).

Note: There is at least one "potential technical inaccuracy" in what I've written here (mostly involving the theoretical capabilities of register renaming and it's application on accumulator-based CPU architectures). This is deliberate. I find that going into too much detail (in an attempt to be "100% technically correct" and cover all the little corner cases) makes it significantly harder for people to understand the relevant parts.

Related Solutions

What Are CPU Registers?

They're not quite the same. The registers are the places where the values that the CPU is actually working on are located. The CPU design is such that it is only able to actually modify or otherwise act on a value when it is in a register. So registers can work logic, whereas memory (including cache) can only hold values the CPU reads from and writes to.

Imagine a carpenter at work. He has a few items in his hands (registers) and then, very close by on his workbench (cache) things he is frequently working on, but not using right this moment, and then in the workshop (main memory) things that pertain to the project at hand but that are not immediately important enough to be on the workbench.

EDIT: Here's a simple explanation for how register logic works.

Let's imagine we have four registers named R1..R4. If you compile a statement that looks like this:

x = y + z * 3;

the compiler would output machine code that (when disassembled) looks something like this:

LOAD  R1, ADDRESS_Z //move the value of Z into register 1
MUL   R1, 3         //multiply the value of register 1 by 3
LOAD  R2, ADDRESS_Y //move the value of Y into register 2
ADD   R1, R2        //adds the value in R2 to the value in R1
STORE R1, ADDRESS_X //move the value of register 1 into X

Since most modern CPUs have registers that are either 32 or 64 bits wide, they can do math on any value up to the size they can hold. They don't need special registers for smaller values; they just use special ASM instructions that tell it to only use part of the register. And, much like the carpenter with only two hands, registers can only hold a small amount of data at once, but they can be reused, passing active data in and out of them, which means that "a lot of registers" don't end up being needed. (Having a lot available does allow compilers to generate faster code, of course, but it's not strictly necessary.)

Developing a Compiler for Custom CPU Architecture

LLVM backend is the primary sane way of doing this. If you lower LLVM IR to assembly or microcode, you can roll from there and just use the numerous LLVM frontends to convert higher languages like C++ into LLVM IR.

In other words, LLVM was explicitly designed to support this scenario.

The full stack goes like this:

Frontend (e.g. Clang for C and C++) - source code -> LLVM IR
Optimizer (LLVM) - LLVM IR -> LLVM IR
Backend (you) - LLVM IR -> assembly/microcode/whatever

The first part is provided for you on a per-language basis. So for C and C++, you can use Clang, for D you can use LDC, etc. The second part is provided by LLVM- they provide a large number of target-independent optimization routines and some target-aware ones. Finally, you provide a translation service from LLVM IR to your architecture-specific code.

Note that LLVM IR makes a few guarantees, because they are targetted at real platforms. For example, they assume IEEE754 floating-point support and 8-bit bytes as well as various types of pointer support. You will need to support all of these anyway if you want to compile languages like C to target your architecture in general. If you are willing to restrict the source language a bit beyond normal you can get away without implementing all of these features- for example, if the C code doesn't use floats, in principle there's no reason why the frontend should emit float-using LLVM IR code.

LLVM IR is a common middle-ground that you can compile any language to target, and then from there, can be lowered for any CPU. Basically, all you need to do is support the primitives, and then provide an LLVM backend to convert from LLVM IR to your assembly. LLVM and language frontends will do all the rest.

Best Answer

Related Solutions

What Are CPU Registers?

Developing a Compiler for Custom CPU Architecture

Related Topic