Why can I manually manage the GPU cache, but not the CPU cache

caching, memory

On the GPU, each thread has access to "shared" or "local" memory, which is analogous to cache on the CPU. So instead of just caching the most recent page, I can tell my program which pieces of memory will be accessed most frequently and manually keep those in cache. My question is: why do CPU designers not allow an analogous operation? I.e. why can't I say to the CPU, "OK, the nodes of this tree aren't on the same page in memory, but I need to access them a lot, so malloc them in the cache for me"?
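For concreteness, here is roughly the kind of control I mean on the GPU side (a minimal CUDA sketch; the names are illustrative, and it assumes a block size of 256):

```
__global__ void sum_tile(const float* in, float* out, int n) {
    // Programmer-managed on-chip memory: we explicitly stage the data
    // we know will be reused, instead of hoping a cache keeps it around.
    __shared__ float tile[256];   // assumes blockDim.x == 256

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();   // make the staged tile visible to the whole block

    // Every thread now re-reads the whole tile from fast shared memory.
    float s = 0.0f;
    for (int k = 0; k < 256; ++k) s += tile[k];
    if (threadIdx.x == 0) out[blockIdx.x] = s;
}
```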

Best Answer

This kind of memory management, telling the CPU in advance which data will be accessed frequently, is really hard to do for a wide range of programming problems, where data structures involve pointers and indirection.

Yet it is comparatively easy for certain parallel, sliced algorithms, such as those found in the graphics domain, where you are dealing with large chunks of contiguous numeric data and vastly fewer pointers.

So, modern CPUs opt to do their cache management automatically, using multi-level caches that ultimately end in disk-backed virtual memory. Each level of the cache tracks how often some cached portion of memory is used, and relies on that information when deciding what to evict from that level. Each level also works at a different granularity: a "page" at the virtual-memory level, a "line" (the cache line size) at the cache levels.
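To get a feel for those granularities on your own machine, here is a small sketch; `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension (not portable POSIX), and typical values on x86-64 are 64-byte lines and 4096-byte pages:

```
#include <unistd.h>
#include <cstdio>

int main() {
    // L1 data cache line size (glibc-specific sysconf name).
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    // Virtual-memory page size (standard POSIX).
    long page = sysconf(_SC_PAGESIZE);
    printf("L1 line: %ld bytes, page: %ld bytes\n", line, page);
    return 0;
}
```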

So, there is virtually no way for a programmer to tell the CPU what to keep and what to evict, because of the combination of multiple cache levels and varying line/page sizes at each level. That is bad enough on its own, but now throw in that the same program is expected to run on many different CPUs of differing performance (where much of that performance difference comes from larger caches, more cache levels, and so on), and this becomes an intractable problem for the programmer working with general-purpose algorithms and data structures.

What the programmer can do, then, instead of telling the CPU what to keep or evict, is attempt to co-locate related items (say, A and B) so that across all the possible variations of CPUs and multi-level caches, if A is in the cache, then so is B. (There are other things programmers can do to keep programs cache friendly; you can search for "cache-friendly" data structures and algorithms. One co-location trick is sketched below.)
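A minimal sketch of that co-location idea (the names are illustrative): allocate tree nodes out of one contiguous pool and link them by index rather than by pointer, so that nodes created together tend to share cache lines no matter what cache hierarchy sits underneath:

```
#include <cstdint>
#include <vector>

struct Node {
    int32_t value;
    int32_t left  = -1;   // index into the pool; -1 means no child
    int32_t right = -1;
};

struct TreePool {
    std::vector<Node> nodes;   // contiguous storage: good spatial locality

    // Allocate a node and return its index instead of a pointer.
    int32_t alloc(int32_t value) {
        nodes.push_back(Node{value, -1, -1});
        return static_cast<int32_t>(nodes.size()) - 1;
    }
};
```

Using 32-bit indices instead of 64-bit pointers also halves the size of the links, so more nodes fit in each cache line to begin with.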

Another difference is that GPU memory is separate from CPU memory, so programming the GPU necessarily involves moving data back and forth. Whereas the CPU has cache misses and page faults that automatically load memory that is not close to the CPU, the GPU (historically) has no such mechanisms, and GPU programmers have to constantly instruct the GPU to copy memory between GPU memory and CPU memory. This has been, and increasingly is, a problem as we use GPUs for more kinds of problem solving, so eventually we will see more and more hardware breaking down the barrier between CPU memory and GPU memory, resulting in unification at higher levels of the cache hierarchy.
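A minimal CUDA sketch of both styles: the explicit copy-back-and-forth model, and the unified model via `cudaMallocManaged` (available since CUDA 6), where the driver migrates pages on demand much like CPU page faults:

```
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Explicit model: the programmer moves data by hand, both ways.
    float* host = new float[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Unified model: one pointer valid on both sides; the runtime
    // migrates pages automatically instead of requiring explicit copies.
    float* unified = nullptr;
    cudaMallocManaged(&unified, n * sizeof(float));
    for (int i = 0; i < n; ++i) unified[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(unified, n, 2.0f);
    cudaDeviceSynchronize();   // wait before touching the data on the CPU
    printf("%f\n", unified[0]);
    cudaFree(unified);
    delete[] host;
    return 0;
}
```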
