Correspondence between cache size and access latency

Tags: cache, latency, microprocessor

Is there a correspondence between cache size and access latency? All other things being equal, does a larger cache operate more slowly? If so, why? How much slower?

Best Answer

Items in your hands are quicker to access than items in your pockets, which are quicker to access than items in your cupboard, which are quicker to access than items at Digikey. Each successive type of storage I have listed is larger but slower than the previous.

So, let's have the best of both worlds: let's make your hands as big as a Digikey warehouse! No, it doesn't work, because now they aren't really hands any more. They're a cannonball weighing you down.

The reason larger storage is slower to access is distance. Larger storage is further away from you on average. This is true for physical items, and for RAM.

Computer memory takes up physical space. For that reason, larger memories are physically larger, and some locations in that memory are going to be physically further away. Things that are far away take longer to access, due to whatever speed limits apply. In the case of your pockets and Digikey, the speed limits are the speed of your arms and the highway speed limit.

In the case of RAM, the speed limits are the propagation speed of electrical signals, the propagation delay of gates and drivers, and the common use of synchronous clocks. Even if money were no object, and you could buy as much as you wanted of the fastest RAM technology available today, you wouldn't be able to benefit from all of it. Lay out an A4-sized sheet of L1 cache if you like, and place your CPU right in the centre. When the CPU wants to access some memory in the far corner of the sheet, it'll literally take a nanosecond for the request to get there, and a nanosecond for it to get back. And that's not including all of the propagation delays through gates and drivers. That's going to seriously slow down your 3 GHz CPU.
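To put rough numbers on that, here's a back-of-the-envelope sketch. The signal speed and distance are assumptions, not measurements:

```c
#include <stdio.h>

int main(void) {
    const double v = 1.5e8;       /* assumed signal speed, ~0.5c, in m/s */
    const double dist = 0.18;     /* centre of an A4 sheet to its corner, m */
    const double clock_hz = 3e9;  /* the 3 GHz CPU from the example above */

    double round_trip = 2.0 * dist / v;   /* request out + data back */
    printf("round trip: %.2f ns = %.1f cycles at 3 GHz\n",
           round_trip * 1e9, round_trip * clock_hz);
    return 0;
}
```

That's roughly 2.4 ns round trip, or over seven clock cycles at 3 GHz, spent purely on wire delay before a single gate has switched.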

Since synchronous logic is a lot easier to design than asynchronous logic, one 'block' of RAM will be clocked with the same clock. If you want to make the whole memory an L1 cache, then you'd have to clock the whole lot with a slow clock to cope with the worst-case timing of the most distant location in memory. This means that distant memory locations are now holding back local ones, which could have been clocked faster. So, the best thing to do is to zone the memory: the closest and smallest section of the cache uses the fastest clock, the next closest and smallest section uses a slightly slower clock, and so on.
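Here's a rough sketch of how the worst-case path caps the clock. All the numbers (signal speed, gate delay, zone radii) are assumptions chosen for illustration:

```c
#include <stdio.h>

int main(void) {
    const double v = 1.5e8;         /* assumed signal speed, ~0.5c, m/s */
    const double t_logic = 0.2e-9;  /* assumed fixed gate/driver delay, s */
    /* zone radii: 2 mm (L1-ish), 1 cm (L2-ish), 10 cm (board-level) */
    const double radius_m[] = { 0.002, 0.01, 0.10 };

    for (int i = 0; i < 3; i++) {
        /* worst-case path: signal out to the farthest cell and back,
         * plus the fixed logic delay; this sets the minimum clock period */
        double t_worst = 2.0 * radius_m[i] / v + t_logic;
        printf("radius %5.1f mm -> max clock ~ %5.2f GHz\n",
               radius_m[i] * 1e3, 1e-9 / t_worst);
    }
    return 0;
}
```

The tiny zone can be clocked several times faster than the big one, which is exactly why you'd carve the memory up rather than clock it all together.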

And now you have L1 & L2 caches and RAM.
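You can even observe these zones from software with a pointer-chasing loop: make each load depend on the previous one, and the time per access steps up as the working set outgrows each cache level. A minimal sketch, using POSIX timing; the sizes are illustrative, and real measurements need care with prefetchers and frequency scaling:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define STEPS 10000000L

int main(void) {
    /* working-set sizes from 4 KiB (fits in L1) up to 64 MiB (main memory) */
    for (size_t bytes = 4096; bytes <= (64UL << 20); bytes *= 4) {
        size_t n = bytes / sizeof(size_t);
        size_t *chain = malloc(n * sizeof(size_t));
        if (!chain) return 1;

        /* Sattolo's algorithm: a random single-cycle permutation, so the
         * chase visits every element and the prefetcher can't predict it */
        for (size_t i = 0; i < n; i++) chain[i] = i;
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;  /* j < i guarantees one cycle */
            size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        size_t idx = 0;
        for (long s = 0; s < STEPS; s++)
            idx = chain[idx];               /* each load depends on the last */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        /* printing idx stops the compiler optimising the loop away */
        printf("%8zu KiB: %5.1f ns/access  (idx=%zu)\n",
               bytes / 1024, ns / STEPS, idx);
        free(chain);
    }
    return 0;
}
```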

Which brings us to the next reason: power consumption.

The cache actually consumes a significant amount of power: not only the memory itself, but all the logic surrounding it which handles the mapping between cache lines and main memory. Increasing the performance of this extra logic increases its power consumption too. So for certain applications (mobile, embedded) you have even more incentive to keep the cache small.
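To see why the mapping logic grows with the cache, here's a toy model of a set-associative lookup. In hardware, every access reads and compares all the tags of the indexed set in parallel, so more ways and more sets mean more comparators and wider arrays toggling on every access. The geometry below (8 ways, 64 sets, 64-byte lines) is an illustrative assumption:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS      8    /* associativity (assumed) */
#define SETS      64   /* number of sets (assumed) */
#define LINE_BITS 6    /* 64-byte lines */
#define SET_BITS  6    /* log2(SETS) */

struct line { bool valid; uint64_t tag; };
static struct line cache[SETS][WAYS];

static bool lookup(uint64_t addr) {
    uint64_t set = (addr >> LINE_BITS) & (SETS - 1);
    uint64_t tag = addr >> (LINE_BITS + SET_BITS);
    /* Hardware performs these WAYS comparisons simultaneously:
     * one tag read plus one comparator per way, on every access */
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;   /* hit */
    return false;          /* miss */
}

int main(void) {
    /* plant a line in set 1, way 3, then look up a matching address */
    cache[1][3] = (struct line){ true, 0x12345 };
    uint64_t addr = (0x12345ULL << (LINE_BITS + SET_BITS)) | (1 << LINE_BITS);
    printf("addr 0x%llx: %s\n", (unsigned long long)addr,
           lookup(addr) ? "hit" : "miss");
    return 0;
}
```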

See "Cache Design Trade-offs for Power and Performance Optimization: A Case Study" (Ching-Long Su and Alvin M. Despain, 1995).