CPU Architecture – Accumulator-Based vs Register-Based

computer-architecture, cpu, register, x86

I don't understand the difference between an accumulator-based CPU architecture and a register-based CPU architecture. I know x86 is register-based, but it has an accumulator-like register. I only ever hear people asking about the difference between stack-based and register-based, but not register-based and accumulator-based. What are the advantages and disadvantages of each? And can I get some example assembly code from each, where they differ, as well?

Best Answer

A register-based CPU architecture has one or more general purpose registers (where "general purpose register" excludes special purpose registers, like stack pointer and instruction pointer).

An accumulator-based CPU architecture is a register-based CPU architecture that only has one general purpose register (the accumulator).

The main advantages of "more than one general purpose register" are that the compiler doesn't have to "spill" as many temporary values onto the stack; and it's easier for the CPU to do more independent instructions in parallel.

As an example, imagine you want to do a = (b - c) + (d - e) + 123. For an "apples vs. apples" comparison I'll use Intel syntax 32-bit 80x86 assembly for both examples (but only use EAX for the accumulator-based CPU architecture).

For accumulator-based CPU architecture this may be:

    mov eax,[b]     ;Group 1

    sub eax,[c]     ;Group 2

    add eax,123     ;Group 3

    mov [a],eax     ;Group 4
    mov eax,[d]

    sub eax,[e]     ;Group 5

    add [a],eax     ;Group 6

Note that most of these instructions depend on the result from the previous instruction, and therefore can't be done in parallel. The ";Group N" comments are there to indicate which groups of instructions can be done in parallel (and show that, assuming some form of internal "register renaming" ability, "group 4" is the only group where 2 instructions are likely to be done in parallel).
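As a sanity check, the accumulator sequence above can be mirrored statement-for-statement in Python, with a single variable standing in for the accumulator and ordinary variables standing in for memory (the variable names and sample values here are mine, chosen only to match the assembly):

```python
# Model of the accumulator-only sequence: one working "register" (eax),
# memory locations as ordinary Python variables. Sample values are arbitrary.
b, c, d, e = 40, 10, 7, 2

eax = b      # mov eax,[b]
eax -= c     # sub eax,[c]
eax += 123   # add eax,123
a = eax      # mov [a],eax
eax = d      # mov eax,[d]
eax -= e     # sub eax,[e]
a += eax     # add [a],eax

assert a == (b - c) + (d - e) + 123
```

Every step except `mov eax,[d]` reads the value the previous step wrote into `eax`, which is exactly the dependency chain that prevents parallel execution.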

Using multiple registers might give you:

    mov eax,[b]           ;Group 1
    mov ebx,[d]

    sub eax,[c]           ;Group 2
    sub ebx,[e]

    lea eax,[eax+ebx+123] ;Group 3        

    mov [a],eax           ;Group 4

In this case, there's one fewer instruction, and two fewer groups of instructions (more instructions are likely to be done in parallel). Going from six groups down to four might mean roughly "33% less time" in practice.
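The multi-register version can be checked the same way, with two working variables standing in for EAX and EBX (the `lea` is just a three-operand add here; sample values are mine):

```python
# Model of the two-register sequence. Only the two "sub" lines depend on the
# "mov" lines above them; the two loads (and the two subtractions) are
# independent of each other.
b, c, d, e = 40, 10, 7, 2

eax = b                  # mov eax,[b]
ebx = d                  # mov ebx,[d]
eax -= c                 # sub eax,[c]
ebx -= e                 # sub ebx,[e]
eax = eax + ebx + 123    # lea eax,[eax+ebx+123]
a = eax                  # mov [a],eax

assert a == (b - c) + (d - e) + 123
```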

Of course in practice code does more than one relatively simple calculation, so there's even more chance of "more instructions in parallel". For example, with only two more registers (e.g. ECX and EDX) it should be easy to see that you could do a = (b - c) + (d - e) + 123 and g = (h - i) + (j - k) + 456 in the same amount of time (by doing both calculations in parallel with different registers); and it should also be easy to see that for an accumulator-based CPU architecture you can't do the calculations in parallel (two calculations would take twice as long as one).
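A sketch of that "two calculations at once" case, with four working variables standing in for EAX through EDX (again, sample values are mine):

```python
# Two independent calculations interleaved across four "registers".
# Each pair of statements on a line is independent and could issue in
# the same cycle on a CPU with enough execution units.
b, c, d, e = 40, 10, 7, 2
h, i, j, k = 100, 1, 50, 5

eax = b; ecx = h           # mov eax,[b] / mov ecx,[h]
ebx = d; edx = j           # mov ebx,[d] / mov edx,[j]
eax -= c; ecx -= i         # sub eax,[c] / sub ecx,[i]
ebx -= e; edx -= k         # sub ebx,[e] / sub edx,[k]
a = eax + ebx + 123        # lea/mov for a
g = ecx + edx + 456        # lea/mov for g

assert a == (b - c) + (d - e) + 123
assert g == (h - i) + (j - k) + 456
```

With only one accumulator, the second calculation couldn't start until the first had been stored, so the two dependency chains would run back-to-back instead of side-by-side.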

Note: There is at least one "potential technical inaccuracy" in what I've written here (mostly involving the theoretical capabilities of register renaming and its application to accumulator-based CPU architectures). This is deliberate. I find that going into too much detail (in an attempt to be "100% technically correct" and cover all the little corner cases) makes it significantly harder for people to understand the relevant parts.