Categorizing Cache Misses
I was wondering if somebody could provide an example illustrating a capacity miss in contrast to a conflict miss for a 2-way cache with arbitrarily small line size and cache data array size. I was having trouble distinguishing them from each other.
My book tells me the following information on each miss type without providing a good example of each:
Capacity misses occur when the amount of data referenced by a program exceeds the capacity of the cache, requiring that some data be evicted to make room for new data. If the evicted data is referenced again by the program, a cache miss occurs, which is termed a capacity miss.
Conflict misses occur when a program references more lines of data that map to the same set in the cache than the associativity of the cache, forcing the cache to evict one of the lines to make room. If the evicted line is referenced again, the miss that results is a conflict miss.
Capacity and conflict misses both occur because data must be evicted from the cache to make room for new data, but the difference between them is that conflict misses can occur even when there is free space elsewhere in the cache. If a program references multiple lines that map to the same set in the cache, it may be necessary to evict some of them to make room for new data, even if every other set in the cache is empty.
Capacity misses can be reduced by increasing the size of the cache so that more of the data referenced by the program fits in the cache simultaneously.
Conflict misses can be reduced by either increasing associativity, so that more lines that map to the same set can be stored in the cache, or by increasing capacity, which can cause lines that mapped to the same set to map onto different sets.
Assuming you have a 2-way cache, and you receive 3 address read requests that all correspond to different address pages, but these address pages map to the same cache set, then it seem to me that you would exceed the capacity of your cache after the second read because you have already filled both of the lines for the given set for 2-way associativity…and therefore you need to evict one of them to load the 3rd address into the cache? I don't get it… a capacity miss is the same thing as a conflict miss…
or is it, evicting the 2nd line to make way for 3rd line is capacity miss, then rereading first address again after evicting it to make way for third address is a conflict miss?
My Contrived example:
address1, address2, address3, address4 => all map to same "set" of cache
read sequence for 2-way cache.
address1
address2
address3 <- capacity miss (evicting address1 and replacing with address3)
address1 <- conflict miss
address3 <- conflict miss
address4 <- capacity miss (evicting address2 and replacing with address4)
address2 <- conflict miss
address3 <- conflict miss
Am I close or just plain wrong? To me, it just doesn't seem worth keeping track of the difference…
Best Answer
Capacity misses
-- https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric#Capacity_misses
Assuming you have a 2-way cache, and the cache receives 3 address read requests that all map to the same cache set, then yes, the cache does need to evict one of them to load the 3rd address into the cache. That is (probably) a conflict miss, not a capacity miss.
Assuming you have a cache that holds 512 bytes of data, and the cache receives read requests to repeatedly scan a 600 byte data structure -- no matter how the cache is designed, it will miss at least 88 bytes on every scan -- it has 88 bytes of capacity misses.
You are right that it's not worth inspecting each individual cache miss and deciding whether it's a conflict miss or a capacity miss. But it's still useful to know the total number of conflict misses vs. capacity misses, and there are ways to find that total number without inspecting individual misses.
Example
address1, address2, address3, address4, address5 => all map to same "set" of cache in the 2-way cache
read sequence for 2-way cache (4 total lines, LRU) vs a 4-way cache (4 total lines, fully associative, clairvoyant):
address : 4-way : 2-way
address1 : cold miss : cold miss (1 x)
address2 : cold miss : cold miss (2 1)
address3 : cold miss : cold miss (evicting address1) (3 2)
address4 : cold miss : cold miss (evicting address3) (4 2)
address5 : cold miss : cold miss (evicting address2) (5 4)
address3 : hit : conflict miss (evicting address4) (3 5)
address2 : hit : conflict miss (evicting address5) (2 3)
address1 : hit : conflict miss (evicting 3) (1 2)
address4: capacity miss : capacity miss (evicting 2) (4 1)
details
When building a cache, a human designer needs to decide "How big should we make it?" and "what mapping / associativity should it use -- direct mapped, direct mapped + small victim cache, 2-way associative, skewed associative, 4-way associative, ..., or fully associative?" and "What cache replacement policy should it use?"
Typically the designer runs some simulations (or builds a prototype) and counts how many cache misses occur in a simulated benchmark workload instruction trace.
The designer typically does not inspect each individual cache miss and deciding whether it's a conflict miss or a capacity miss.
We typically make one (unrealistic) simulation with a fully-associative cache using the the (unrealistic) clairvoyant cache-replacement algorithm algorithm -- in that simulation, all the misses are capacity misses or compulsory misses (Mmin).
All other simulations (running exactly the same simulated workload instruction trace and with the same cache capacity and the same cache-line-size) with less-than-fully-associative caches will inevitably have just as many misses. So we subtract the capacity and compulsory misses (Mmin) from their counts and call the "extra" misses caused by lower associativity "conflict misses".
If we line up all the executed instructions in the simulated workload (the instruction trace) and compare where misses occur,
Alas, individual non-cold cache misses after CAP1 are difficult to categorize. They may not even line up -- there may be instructions in the trace that cause a capacity miss in the clairvoyant fully-associative simulation that have a cache hit in some other less-than-fully-associative simulation. (Inevitably that less-than-fully associative simulation will have at least as many cache misses, they've just been displaced to some other instruction(s).).
Fortunately, we don't need to categorize individual misses. We only care about the total number of conflict misses and the total number of capacity misses. And we only care about that in order to decide whether tweaking the mapping is going to make a noticeable improvement in speed, or whether the only way to make a noticeable speedup is to spend a lot of money -- perhaps on a larger cache capacity, or perhaps adding another level of cache, or perhaps adding more CPU cores, or etc.
At a detailed level, the CPU cache doesn't have enough information to categorize which kind of miss occurs. To the CPU cache, all misses (cold cache miss, capacity cache miss, conflict cache miss) look "the same" and the decision-making circuits that figure out whether we have a hit or a miss, and then decide which line of the cache to flush and replace, don't know and work the same with all the different categories of misses.
related: https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric