Electronic – What prevents the construction of a CPU with all necessary memory represented in registers?

cpu memory register

Modern CPUs employ a hierarchy of memory technologies. Registers, built into the chip, have the lowest access times but are expensive and volatile. Cache sits between RAM and registers, holding recently used data to reduce the latency of RAM accesses. RAM holds, for the scope of this question, active program code and its data structures. Non-volatile storage is used by programs to save their data and holds the OS and its programs.

The latency of accessing data in memory has been a major bottleneck to creating faster CPUs that do not sit idle, waiting for instructions and data. As a result, various methods have been devised to hide memory access overhead, such as parallelizing workloads and predicting branches. However, all this complexity seems to have ignored another possibility: a whole-memory register file.

Such a CPU would be built with 4, 8, 16, 32 GB or more of memory, all made of registers. No cache. No RAM. Just the CPU, the registers on the chip, and external non-volatile storage (SSD/Flash, HDD, etc.).

I understand that the demand for such a chip is unlikely to be sufficient to justify the cost, but I remain surprised that no one seems to have designed a simple device, such as a high-performance MCU or SoC with a small amount of register-only memory. Are there other (perhaps, engineering) challenges to the design and construction of such a chip?

EDIT to clarify: I am not referring to a CPU in which all memory (DRAM technology) is integrated onto the CPU die, nor am I referring to a cache that is expanded to multiple gigabytes. I am asking about a design in which the registers keep their existing technology… just expanded by a few orders of magnitude to be able to hold multiple gigabytes of data.

Best Answer

Two factors work against your idea:

  • the optimal chip production processes for (D)RAM and logic (CPU) are different. Combining both on the same chip leads to compromises, and the result is far less optimal than what can be achieved with separate chips, each built with its own optimal process.

  • fast memory (registers) takes more die area and draws more current (and so consumes more energy) than slow memory. Consequently, if the CPU die were filled with really fast memory (running at CPU speed), the size of that memory would be nowhere near the gigabytes you mention. It would be more like the current size of the fastest on-chip caches.
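
To put a rough number on the die-area point, here is a back-of-envelope sketch in Python. The per-bit transistor counts are illustrative assumptions, not measurements: one transistor per bit for a DRAM cell, six for a classic SRAM cell, and a hypothetical round figure of 20 for a multi-ported register cell (the real number depends heavily on the number of read and write ports). The names `TRANSISTORS_PER_BIT`, `transistors_needed`, and `budget_table` are made up for this example.

```python
# Back-of-envelope transistor budget for a register-only memory.
# All per-bit figures are illustrative assumptions, not vendor data.

GIB = 2**30  # bytes in one gibibyte

# Assumed transistors per stored bit (hypothetical round numbers).
TRANSISTORS_PER_BIT = {
    "dram_1t1c": 1,            # one access transistor plus a capacitor
    "sram_6t": 6,              # classic six-transistor SRAM cell
    "register_multiport": 20,  # assumed cost of a multi-ported register cell
}


def transistors_needed(capacity_bytes: int, per_bit: int) -> int:
    """Storage-cell transistors only; decoders, sense logic and wiring are ignored."""
    return capacity_bytes * 8 * per_bit


def budget_table(capacity_bytes: int) -> dict:
    """Transistor count per cell type for the given capacity."""
    return {
        name: transistors_needed(capacity_bytes, per_bit)
        for name, per_bit in TRANSISTORS_PER_BIT.items()
    }


if __name__ == "__main__":
    for name, count in budget_table(4 * GIB).items():
        print(f"{name:>20}: {count / 1e9:,.0f} billion transistors for 4 GiB")
```

Under these assumptions, 4 GiB needs roughly 34 billion transistors as DRAM cells, about 206 billion as 6T SRAM cells, and about 687 billion as the assumed register cells, before any decoding or wiring is counted. The exact figures matter less than the ratio: each step toward faster memory multiplies the area and power spent per bit.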