Modern CPUs employ a hierarchy of memory technologies. Registers, built into the chip have the lowest access times, but are expensive and volatile. Cache is a middle-man between RAM and registers to store data structures to reduce latency between RAM and registers. RAM holds, for the scope of this query, active program code and their data structures. Non-volatile storage is used by programs to save their data and hold the OS and its programs.
The latency of accessing data in memory has been a major bottleneck to creating faster CPUs that do not sit idle, awaiting further instruction. As such, various methods have been designed to parallelize workloads, CPUs to predict branching to hide memory access overhead, and more. However, the complexity of this has seemingly ignored another possibility: a whole-memory register file.
Such a CPU is built with 4, 8, 16, 32 GB or more, made of registers. No cache. No RAM. Just the CPU, the registers on the chip, and external non-volatile storage (SSD/Flash, HDD, etc.).
I understand that the demand for such a chip is unlikely to be sufficient to justify the cost, but I remain surprised that no one seems to have designed a simple device, such as a high-performance MCU or SoC with a small amount of register-only memory. Are there other (perhaps, engineering) challenges to the design and construction of such a chip?
EDIT to Clarify. I am not referring to a CPU in which all memory (DRAM technology) is integrated onto the CPU die, nor am I referring to a cache that is expanded to multiple Gigabytes. I am asking about a design in which the registers remain their existing technology… just expanded by a few orders of magnitude to be able to hold multiple gigabytes of data.