Electronic – stack cache instead of registers

cache · memory · microprocessor · register

Is there a processor that does arithmetic operations on a stack rather than on registers? To keep performance up, such a processor would of course have to cache the top block of the stack in the same kind of memory that is used for registers.

I read in a paper (David R. Ditzel and H. R. McLellan, “Register Allocation for Free: The C Machine Stack Cache”) that a stack cache is about twice as slow as registers, because of:

  • indirect addressing on every cache access;
  • cache misses when the stack grows.

The paper is old. Perhaps improvements in processor design have since appeared that make a stack cache viable? I feel it would reduce compiler complexity and cut down on the copying between registers and the rest of memory.

Update 2012-10-18. Because this concept was well known (just not to me), I have changed the question to “… in modern processors?”

Update 2012-10-18. I feel I must say explicitly that I'm not talking about a “zero-address machine”. Caching and “zero address” are orthogonal. My hypothetical processor may even have a 5-ary addition like “r3 := r0+r2+r11+r5+r8”, where “rn” means the memory cell at sp+n and sp is the stack pointer. sp changes before and after a code block; only a very unusual program would change sp at every arithmetic operation.
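
Here is a minimal sketch of that model in C, assuming the cached top block of the stack behaves like a small fast array. The names stack_cache, CACHE_SLOTS and r() are hypothetical, and sp moves only on block entry and exit, as described above:

    /* "Registers" are just offsets from the stack pointer: r(n) names the
       memory cell at sp+n.  stack_cache stands in for the fast on-chip
       memory that would hold the top block of the stack. */
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_SLOTS 64        /* size of the cached top-of-stack block */

    static uint32_t stack_cache[CACHE_SLOTS];
    static size_t   sp;           /* stack pointer: an index into the cache */

    static uint32_t *r(size_t n) { return &stack_cache[sp + n]; }

    static void example_block(void) {
        sp += 12;                                        /* sp moves on entry... */
        *r(3) = *r(0) + *r(2) + *r(11) + *r(5) + *r(8);  /* the 5-ary addition */
        sp -= 12;                                        /* ...and again on exit */
    }

    int main(void) { example_block(); return 0; }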

Best Answer

Yes, the entire line of Burroughs mainframe computers, starting in 1961 with the B5000, used a stack architecture.

In this architecture, managing the data flow to and from the stack is not actually too much of a bottleneck for performance. A bigger issue is the fact that a "zero address" machine needs a lot more instructions to complete a given task than a one-, two- or three-address machine does. Instruction decoding and the execution pipeline become the primary bottleneck.
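
To make the instruction-count gap concrete, here is a hedged sketch in C of a toy stack-machine evaluator. The opcodes and the evaluator are illustrative inventions, not any real Burroughs instruction set. Computing d = a*b + c takes six zero-address instructions, where a three-address machine needs only two:

    #include <stdio.h>

    /* Illustrative zero-address opcodes -- not a real ISA. */
    enum op { PUSH_A, PUSH_B, PUSH_C, MUL, ADD, POP_D };

    static int a = 2, b = 3, c = 4, d;

    static void run_stack_machine(const enum op *prog, int len) {
        int stack[8], top = -1;               /* the operand stack */
        for (int i = 0; i < len; i++) {
            switch (prog[i]) {
            case PUSH_A: stack[++top] = a; break;
            case PUSH_B: stack[++top] = b; break;
            case PUSH_C: stack[++top] = c; break;
            case MUL:    stack[top - 1] *= stack[top]; top--; break;
            case ADD:    stack[top - 1] += stack[top]; top--; break;
            case POP_D:  d = stack[top--]; break;
            }
        }
    }

    int main(void) {
        /* d = a*b + c as six zero-address instructions... */
        const enum op prog[] = { PUSH_A, PUSH_B, MUL, PUSH_C, ADD, POP_D };
        run_stack_machine(prog, 6);
        printf("d = %d\n", d);                /* prints d = 10 */

        /* ...versus two three-address instructions:
               MUL t, a, b
               ADD d, t, c                                              */
        return 0;
    }

Most of the extra instructions are pure stack traffic (pushes and pops) that a register machine folds into the operand fields of its instructions, which is exactly what loads the decoder and the pipeline.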

When I worked there in the early 1980s, there was an effort to build a CPU that could prefetch relatively large sequences of zero-address instructions and translate them on the fly to three-address operations that would be fed to the execution pipeline. (Think of a Java JIT compiler implemented in hardware.) It got rather complex, especially for the implementation technologies available at the time, and I don't know whether this strategy ultimately succeeded.

In case you're wondering, the "N-address" terminology refers to the number of operands that can be specified in a single instruction. All operations on a stack machine implicitly apply to the top one or two locations on the stack, so the instructions carry zero operands. A machine with an accumulator that is used in every operation in conjunction with one other register or memory location is a one-address machine. A two-address machine can specify an arbitrary source and destination operand in one instruction, and a three-address machine can specify two source operands and an independent destination for the result.
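
As a worked illustration, here is the same assignment, x = a + b, in each addressing style. The C statements merely stand in for the instruction sequences named in the comments; none of this is a real instruction set:

    void n_address_demo(void) {
        int a = 1, b = 2, x, acc;
        int stack[2], top = -1;

        /* zero-address (stack machine): PUSH a; PUSH b; ADD; POP x */
        stack[++top] = a;
        stack[++top] = b;
        stack[top - 1] += stack[top]; top--;
        x = stack[top--];

        /* one-address (accumulator):    LOAD a; ADD b; STORE x */
        acc = a; acc += b; x = acc;

        /* two-address:                  MOV x, a; ADD x, b */
        x = a; x += b;

        /* three-address:                ADD x, a, b */
        x = a + b;

        (void)x;  /* x == 3 after every variant */
    }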