What happens on a cache miss

Tags: computer-architecture, microprocessor, operating-system

Present-day processors use more than one level of memory in an attempt to approximate an ideal memory system, and to do more work per clock cycle they keep more than one instruction in the execution pipeline by exploiting ILP (Instruction-Level Parallelism).

My question is: what happens on a cache miss? That is, are both the instructions before and the instructions after the instruction causing the cache miss stalled, or only the instructions after it?

I know the answer may depend on whether the processor has speculative and out-of-order execution, and on whether it can exploit MLP (Memory-Level Parallelism).

I want to know about both cases: a processor that exploits MLP and one that does not.

I was not able to find helpful information.

Best Answer

Instructions before the data cache miss (in program order) will flow down the pipeline as normal. (An unusual exception would be a push-based pipeline, as used by some early VLIWs, where subsequent operations were required to push previous operations down the pipeline.)

For a cache miss on a store, the stored value can be placed into a buffer, allowing the store to complete despite the cache miss. (This is possible because the buffer does not require any data from memory; it is typically accomplished by keeping a valid bit for each storable unit [typically a byte].)
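
As a rough C sketch of that idea (the entry layout, the 64-byte line size, and names such as store_buffer_entry and merge_fill are illustrative assumptions, not any particular microarchitecture):

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define LINE_SIZE 64

    /* Hypothetical store-buffer entry covering one 64-byte cache line.
     * Each byte has its own valid bit, so a store can complete into the
     * buffer without waiting for the missing line to arrive from memory. */
    typedef struct {
        uint64_t line_addr;           /* address of the cache line         */
        uint8_t  data[LINE_SIZE];     /* bytes written by completed stores */
        uint8_t  valid[LINE_SIZE];    /* 1 = byte was supplied by a store  */
    } store_buffer_entry;

    /* Record a store into the buffer; the store itself completes now. */
    static void buffer_store(store_buffer_entry *e, unsigned offset,
                             const uint8_t *bytes, unsigned len)
    {
        for (unsigned i = 0; i < len; i++) {
            e->data[offset + i]  = bytes[i];
            e->valid[offset + i] = 1;
        }
    }

    /* When the line finally arrives, merge it under the buffered bytes. */
    static void merge_fill(store_buffer_entry *e, const uint8_t *line)
    {
        for (unsigned i = 0; i < LINE_SIZE; i++)
            if (!e->valid[i])
                e->data[i] = line[i];
    }

    int main(void)
    {
        store_buffer_entry e = { .line_addr = 0x1000 };
        uint32_t v = 0xDEADBEEF;
        buffer_store(&e, 8, (const uint8_t *)&v, sizeof v);  /* store completes  */

        uint8_t line_from_memory[LINE_SIZE];
        memset(line_from_memory, 0xAA, sizeof line_from_memory);
        merge_fill(&e, line_from_memory);                    /* miss is resolved */

        printf("byte 8 = 0x%02X (from store), byte 0 = 0x%02X (from memory)\n",
               e.data[8], e.data[0]);
        return 0;
    }

The point is only that buffer_store never has to wait for memory; the fill is merged under the already-valid bytes whenever it arrives.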

Many processors using in-order execution allow instructions after a load to execute and complete (even another load, which is how memory-level parallelism arises when that load also misses) as long as those instructions are not data dependent on the load and do not follow an instruction that is. This can be accomplished with a scoreboard that marks the availability of each register, as sketched below.
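
A minimal C sketch of such a scoreboard check, assuming a 32-register machine and hypothetical helper names (mark_pending, regs_ready):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical 32-entry register scoreboard: bit r set = the value of
     * register r is still pending (e.g. an outstanding load miss). */
    static uint32_t pending;

    static void mark_pending(int r) { pending |=  (1u << r); }
    static void mark_ready(int r)   { pending &= ~(1u << r); }
    static bool regs_ready(int a, int b)
    {
        return !(pending & ((1u << a) | (1u << b)));
    }

    int main(void)
    {
        /* lw r3, [r5] misses in the cache: r3 becomes pending. */
        mark_pending(3);

        /* add r7, r8, r9 does not read r3, so it may issue and complete. */
        printf("add r7,r8,r9 can issue: %s\n", regs_ready(8, 9) ? "yes" : "no");

        /* add r4, r3, r1 reads r3, so the in-order pipeline stalls here. */
        printf("add r4,r3,r1 can issue: %s\n", regs_ready(3, 1) ? "yes" : "no");

        /* The miss returns; r3 becomes ready and the stalled add may issue. */
        mark_ready(3);
        printf("after the fill, add r4,r3,r1 can issue: %s\n",
               regs_ready(3, 1) ? "yes" : "no");
        return 0;
    }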

For an out-of-order processor, instructions after an instruction that is dependent on the cache-missing load can be executed to completion, with their results stored in rename registers (or in a store queue, for stores to memory), but they are not committed.
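
A much-simplified C sketch of that commit rule (the rob_entry layout and the three-instruction example are hypothetical; real reorder buffers track far more state):

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical, much-simplified reorder buffer: results of executed
     * instructions wait here and are committed only in program order. */
    typedef struct {
        const char *insn;
        bool        executed;    /* result available in a rename register */
        bool        committed;
    } rob_entry;

    static void try_commit(rob_entry *rob, int n)
    {
        for (int i = 0; i < n; i++) {
            if (!rob[i].executed)   /* oldest unfinished instruction...      */
                break;              /* ...blocks commit of all younger ones  */
            rob[i].committed = true;
        }
    }

    int main(void)
    {
        rob_entry rob[] = {
            { "lw  r3, [r5]",   false, false },  /* cache miss, still waiting     */
            { "add r7, r8, r9", true,  false },  /* independent, already executed */
            { "mul r6, r7, r2", true,  false },  /* independent, already executed */
        };
        int n = (int)(sizeof rob / sizeof rob[0]);

        try_commit(rob, n);
        for (int i = 0; i < n; i++)
            printf("%-16s executed=%d committed=%d\n",
                   rob[i].insn, rob[i].executed, rob[i].committed);
        return 0;
    }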

Control flow instructions like branches and indirect jumps are special in that following instructions are dependent on the result, but often prediction can be effectively used to hide this dependency. Although value prediction for load misses has been studied, the benefit is relatively limited given the cost.

In theory it would sometimes also be possible to speculatively partially execute dependent instructions. E.g.:

  lw r3, [r5]; // load word
  add r3, r3, #50; // r3 = r3 + 50
  slt r6, r3, #1000; // (r3<1000)?r6=1:r6=0
  bez r6 LABEL; // if r6=0 goto LABEL
  addi r3, r3, #10; // r3 = r3 + 10
LABEL:

Theoretically, hardware could speculate that the branch is not taken and add 50 and 10 so that 60 would be added to the value when it becomes available. This kind of optimization has been proposed for (instruction) trace caches.
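
A toy C sketch of that idea, with assumed values, just to make the accumulated-offset arithmetic concrete:

    #include <stdio.h>

    /* Toy illustration: while the loaded value is unknown, the dependent
     * adds (+50, then speculatively +10 on the predicted not-taken path)
     * are collapsed into one pending offset, applied when the load returns. */
    int main(void)
    {
        int pending_offset = 0;
        pending_offset += 50;    /* add  r3, r3, #50                            */
        pending_offset += 10;    /* addi r3, r3, #10 (branch predicted not taken) */

        int loaded_value = 900;  /* value finally returned by the cache fill    */
        int r3 = loaded_value + pending_offset;

        /* The slt/bez must still be checked once the value arrives. */
        int r6 = (loaded_value + 50) < 1000;  /* if 0, the +10 must be squashed */
        printf("r3 = %d, branch speculation %s\n", r3,
               r6 ? "correct" : "must be squashed");
        return 0;
    }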

Some instructions may also be broken into component operations that do not depend on the not-yet-available value, allowing partial execution of the instruction. E.g., a division implemented with a Newton-Raphson mechanism can compute the reciprocal of the divisor while the dividend is still unavailable.
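
For concreteness, here is a hedged C sketch of Newton-Raphson reciprocal refinement; the 48/17 and 32/17 seed constants are the standard linear starting approximation, whereas real hardware would typically use a small lookup table:

    #include <math.h>
    #include <stdio.h>

    /* Newton-Raphson refinement of 1/d: x_{n+1} = x_n * (2 - d * x_n).
     * The reciprocal of the divisor can be computed while the dividend is
     * still outstanding (e.g. waiting on a cache miss); the division then
     * finishes with a single multiply once the dividend arrives.
     * Assumes d > 0 for the simple seed used here. */
    static double reciprocal(double d)
    {
        int e;
        double m = frexp(d, &e);                   /* d = m * 2^e, 0.5 <= m < 1    */
        double x = 48.0 / 17.0 - 32.0 / 17.0 * m;  /* linear seed for 1/m          */
        for (int i = 0; i < 4; i++)
            x = x * (2.0 - m * x);                 /* error roughly squares each step */
        return ldexp(x, -e);                       /* undo the exponent scaling    */
    }

    int main(void)
    {
        double divisor = 7.0;
        double r = reciprocal(divisor);  /* can proceed before the dividend is ready */

        double dividend = 123.0;         /* arrives later, after the miss resolves   */
        printf("123/7 = %.12f (direct: %.12f)\n", dividend * r, dividend / divisor);
        return 0;
    }

Once the dividend finally arrives from memory, only the final multiply remains.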