I think it depends on the machine, its design aims and the limitations of the technology.
Most traditional accumulator machines have very few registers, and many only have a single ‘general-purpose’ register (by today's standards). On the PDP-8, for example, there are a few registers, but the only one you can access directly is the AC. The vast majority of instructions (not that ‘vast’ applies to a design with eight instructions) operate on the AC. The only other object to operate on is memory, so most instructions have a single field (a 7-bit address). The data transfer direction is implicit in the instruction (think load/store).
On the other hand, the machine languages of many accumulator machines were a side effect of the system architecture more so than the other way round (which is how we do it today, software being king). The PDP-8 was an accumulator machine because, well, 12 bits worth of flip-flops were pretty expensive equipment in the late 60s and the PDP-8 was a cheap computer.
As for the speed factor: in-processor registers were almost always faster than memory, same as now. But their cost was much higher than it is now, instruction widths were smaller, and sequencers and state machines were much simpler, so there weren't many registers. And with only a single register, the instruction set can stay very simple, because the operand is implicit; add more registers and the instructions need fields to select them, and the encoding grows accordingly.
Also, with respect to early computers using ‘accumulators’ instead of ‘registers’: ‘accumulator’ came from its use in tabulating machines to denote registers for running totals. Presumably operators would have been more comfortable with the term. The PDP-8 schematics refer to every single flip-flop in the system as a ‘register’, but there's only one ‘accumulator’ and it's marked as a ‘major’ register.
registers
I believe you're referring to this drawing:
The register file is an array of 32 general-purpose registers which resides in the AVR's CPU. The registers are tightly coupled with the ALU (Arithmetic and Logic Unit), so operations on register data can be performed much faster than on data which has to be fetched from (external) RAM. A microcontroller will have special addressing modes for working with the registers. If you want to add 5 to a variable in RAM, the instruction has to include the RAM address, for which it may need 16 bits. Since the register file is only 32 bytes long, you need only 5 bits to address a register, and the instruction set is designed in such a way that these 5 bits fit together with the instruction's opcode in a 16-bit instruction. So no second fetch for the address is required.
That's what the drawing shows: the instruction is 16 bits wide, and 5 bits of it form the register designator. They even went a step further: there are instructions which operate on two registers, and again they constructed the instruction in such a way that both register designators fit in the 16-bit word, so again only one fetch is required.
So it doesn't show two registers combined into one of double length, although there are microcontrollers which allow this.
flags
Flags are single-bit data which indicate some status: on or off, true or false.
This is the programming model of the good old MC6809 (a beautiful microprocessor, which more than 30 years ago was way ahead of its time). The condition code register consists of 8 flags, with names like "zero", "carry", "overflow", "IRQ mask", etc. You wouldn't access these bits directly like another register, but use them in conditional branch instructions, like BEQ, for "branch if equal", which branches if the "zero" flag is set.
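For instance, a 6809-style snippet might look like this (the labels COUNT and DONE are made up for the example):

```
        LDA   COUNT     ; load the value at COUNT into the accumulator
        CMPA  #10       ; compare with 10; sets the "zero" flag if equal
        BEQ   DONE      ; branch to DONE if the "zero" flag is set
        ...
DONE:   RTS
```

The flag is never named in the code; CMPA sets it as a side effect and BEQ reads it.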
But programmers needed many more flags in their programs. So they assigned bytes for the task, and addressed an individual bit by masking off the others, AND-ing the byte with a bit mask.
bit fields
This was a bit clumsy, and especially with higher-level languages something more advanced was needed. C lets you define bit fields which you can address directly in your code:
struct packed_struct {
    unsigned int f1:1;
    unsigned int f2:1;
    unsigned int f3:1;
    unsigned int f4:1;
    unsigned int type:4;
    unsigned int funny_int:9;
} pack;
Note that you can define not just single-bit fields, but also groups of several bits, like the 4-bit and 9-bit fields in the example.
bit banding
Internally the C code will still be compiled into code that uses masks to get at one particular bit, so it's still not very efficient. The ARM Cortex-M3 has a solution for this. It has a 4 GB address space, most of which isn't used, so they use part of it for what's called bit banding.
It means that parts of memory and I/O are aliased at one bit per word address elsewhere in the address space. The 1 MB SRAM bit-band region holds 8 million bits, and each of those bits is mapped to its own 32-bit word in a 32 MB bit-band alias region. Reading or writing that alias word accesses the single bit, so masking is no longer required.
Further reading
LPC17xx User Manual at NXP.
Best Answer
That quote you put up refers to the perspective of a complex calculation that uses various variables and constants. The final result will end up where the specification (for instance a high-level language statement) says it should. Take for instance
It is clear that the final result must end up in A, because the statement says so. But most CPU's can't do this calculation without first calculating ( 3 * B ). So where should that value end up? That depends on the type of CPU architecture. In an accumulator architecture the statement would be translated to something like
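The exact statement isn't shown here, but supposing it was something like A = 3 * B + C, a made-up accumulator-style assembly translation (not any specific CPU) might look like:

```
        LDA  B        ; load B into the accumulator
        MUL  #3       ; accumulator = 3 * B   (intermediate result)
        ADD  C        ; accumulator = 3 * B + C
        STA  A        ; store the final result in A
```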
The accumulator (which is not explicitly mentioned in the instructions, because in an accumulator architecture it is the only option) is used to hold the intermediate (and final!) result of the calculation.
These days accumulator architectures are out of fashion, because the speed difference between CPU and RAM would make them very slow. The dominant architectures are now register-register, where the destination of each calculation can be chosen from a set of registers. Typically these are also load-store architectures, where an instruction can either access RAM or do a calculation, but not both. For such an architecture the statement could translate to
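something like the following made-up load-store assembly (again assuming the statement was along the lines of A = 3 * B + C; register names and mnemonics are illustrative):

```
        LOAD  R1, B         ; load B from RAM into register R1
        LOAD  R2, C         ; load C from RAM into register R2
        MUL   R1, R1, #3    ; R1 = 3 * B   (destination chosen freely)
        ADD   R1, R1, R2    ; R1 = 3 * B + C
        STORE R1, A         ; store the result back to RAM
```

Note how every arithmetic instruction names its destination register explicitly, and only LOAD/STORE touch RAM.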