Cache miss types: capacity miss vs. conflict miss

cache, computer-architecture

Categorizing Cache Misses

I was wondering if somebody could provide an example illustrating a capacity miss in contrast to a conflict miss for a 2-way cache with an arbitrarily small line size and data array size. I'm having trouble distinguishing them from each other.

My book tells me the following information on each miss type without providing a good example of each:

Capacity misses occur when the amount of data referenced by a program exceeds the capacity of the cache, requiring that some data be evicted to make room for new data. If the evicted data is referenced again by the program, a cache miss occurs, which is termed a capacity miss.

Conflict misses occur when a program references more lines of data that map to the same set in the cache than the associativity of the cache, forcing the cache to evict one of the lines to make room. If the evicted line is referenced again, the miss that results is a conflict miss.

Capacity and conflict misses both occur because data must be evicted from the cache to make room for new data, but the difference between them is that conflict misses can occur even when there is free space elsewhere in the cache. If a program references multiple lines that map to the same set in the cache, it may be necessary to evict some of them to make room for new data, even if every other set in the cache is empty.

Capacity misses can be reduced by increasing the size of the cache so that more of the data referenced by the program fits in the cache simultaneously.

Conflict misses can be reduced by either increasing associativity, so that more lines that map to the same set can be stored in the cache, or by increasing capacity, which can cause lines that mapped to the same set to map onto different sets.

Assuming you have a 2-way cache, and you receive 3 address read requests that all correspond to different address pages, but these address pages map to the same cache set, then it seems to me that you would exceed the capacity of your cache after the second read, because you have already filled both lines of the given set for 2-way associativity… and therefore you need to evict one of them to load the 3rd address into the cache? I don't get it… a capacity miss is the same thing as a conflict miss…

Or is it that evicting the 2nd line to make way for the 3rd line is a capacity miss, and then re-reading the first address after it was evicted to make way for the third address is a conflict miss?

My Contrived example:

address1, address2, address3, address4 => all map to same "set" of cache

read sequence for 2-way cache.

address1
address2
address3 <- capacity miss (evicting address1 and replacing with address3)
address1 <- conflict miss
address3 <- conflict miss
address4 <- capacity miss (evicting address2 and replacing with address4)
address2 <- conflict miss
address3 <- conflict miss

Am I close or just plain wrong? To me, it just doesn't seem worth keeping track of the difference…

Best Answer

Capacity misses

A capacity miss occurs due to the limited size of a cache and not the cache's mapping function. When the working set, i.e. the data that is currently important to the program, is bigger than the cache, capacity misses will occur frequently. Out of the 3Cs capacity misses are the hardest to identify and can be thought of as non-compulsory misses in a fully associative cache. In a single processor system, the misses that exist after subtracting the number of compulsory misses and conflict misses can be categorized as capacity misses.

Since capacity misses can be attributed to the limited size of a cache, a simple way to reduce the number of such misses is to increase the cache size.

-- https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric#Capacity_misses

Assuming you have a 2-way cache, and the cache receives 3 address read requests that all map to the same cache set, then yes, the cache does need to evict one of them to load the 3rd address into the cache. That is (probably) a conflict miss, not a capacity miss.

Assuming you have a cache that holds 512 bytes of data, and the cache receives read requests that repeatedly scan a 600-byte data structure -- no matter how the cache is designed, it will miss on at least 88 of those bytes on every scan. Those 88 bytes per scan are capacity misses.
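As a rough illustration of that lower bound (a minimal sketch, not something from the quoted source): the toy simulator below models a fully associative cache of 512 one-byte lines with clairvoyant (Belady) replacement while it repeatedly scans a 600-byte structure. The 1-byte line size and the 5-scan trace length are assumptions chosen only to keep the sketch short.

    # Sketch: fully associative cache, clairvoyant (Belady) replacement,
    # 512 one-byte lines, repeatedly scanning a 600-byte structure.
    # (The 1-byte line size and the 5 scans are assumptions for illustration.)
    CACHE_LINES = 512
    STRUCT_BYTES = 600
    SCANS = 5

    trace = [addr for _ in range(SCANS) for addr in range(STRUCT_BYTES)]

    # For every position in the trace, find when the same address is used next.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}                       # address -> position of its next use
    misses_per_scan = [0] * SCANS
    for i, addr in enumerate(trace):
        if addr in cache:
            cache[addr] = next_use[i]            # hit: refresh the next-use time
        else:
            misses_per_scan[i // STRUCT_BYTES] += 1
            if len(cache) == CACHE_LINES:
                # Belady: evict the line whose next use is furthest in the future.
                del cache[max(cache, key=cache.get)]
            cache[addr] = next_use[i]

    print(misses_per_scan)   # first scan is all cold misses; every later scan misses 88 bytes

Even this best-possible cache cannot get the steady-state misses per scan below the 88 bytes that simply do not fit.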

You are right that it's not worth inspecting each individual cache miss and deciding whether it's a conflict miss or a capacity miss. But it's still useful to know the total number of conflict misses vs. capacity misses, and there are ways to find that total number without inspecting individual misses.

Example

address1, address2, address3, address4, address5 => all map to the same "set" of the 2-way cache

Read sequence for a 2-way cache (4 lines total, LRU -- since all five addresses fall in one set, only that set's 2 lines are usable) vs. a 4-way cache (4 lines total, i.e. fully associative, with a clairvoyant replacement policy). The pair in parentheses after each 2-way entry shows that set's contents, most recently used first:

address  : 4-way                          : 2-way
address1 : cold miss                      : cold miss (1 x)
address2 : cold miss                      : cold miss (2 1)
address3 : cold miss                      : cold miss (evicting address1) (3 2)
address4 : cold miss                      : cold miss (evicting address2) (4 3)
address5 : cold miss (evicting address4)  : cold miss (evicting address3) (5 4)
address3 : hit                            : conflict miss (evicting address4) (3 5)
address2 : hit                            : conflict miss (evicting address5) (2 3)
address1 : hit                            : conflict miss (evicting address3) (1 2)
address4 : capacity miss                  : capacity miss (evicting address2) (4 1)
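A minimal sketch that replays the trace above (with the addresses written as the small integers 1–5, purely an assumption for illustration) through a single 2-way LRU set and through a 4-line fully associative cache with clairvoyant replacement; it reproduces the hit/miss column of the table:

    from collections import OrderedDict

    trace = [1, 2, 3, 4, 5, 3, 2, 1, 4]      # address1..address5, all in one set

    def simulate_2way_lru(trace, ways=2):
        """One 2-way set with least-recently-used replacement."""
        lru, results = OrderedDict(), []
        for addr in trace:
            if addr in lru:
                lru.move_to_end(addr)
                results.append("hit")
            else:
                if len(lru) == ways:
                    lru.popitem(last=False)   # evict the least-recently-used line
                lru[addr] = True
                results.append("miss")
        return results

    def simulate_fa_belady(trace, lines=4):
        """4 lines, fully associative, clairvoyant replacement."""
        cache, results = set(), []
        for i, addr in enumerate(trace):
            if addr in cache:
                results.append("hit")
                continue
            results.append("miss")
            if len(cache) == lines:
                future = trace[i + 1:]
                # Evict whichever cached address is reused furthest in the
                # future (or never reused at all).
                victim = max(cache, key=lambda a: future.index(a) if a in future else len(future))
                cache.discard(victim)
            cache.add(addr)
        return results

    for addr, fa, two in zip(trace, simulate_fa_belady(trace), simulate_2way_lru(trace)):
        print(f"address{addr} : 4-way {fa:<4} : 2-way {two}")

The 2-way LRU set misses on all 9 accesses, while the clairvoyant fully-associative cache misses only 6 times (5 cold, 1 capacity).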

Details

When building a cache, a human designer needs to decide "How big should we make it?" and "What mapping / associativity should it use -- direct mapped, direct mapped + small victim cache, 2-way associative, skewed associative, 4-way associative, ..., or fully associative?" and "What cache replacement policy should it use?"

Typically the designer runs some simulations (or builds a prototype) and counts how many cache misses occur in a simulated benchmark workload instruction trace.

The designer typically does not inspect each individual cache miss and decide whether it's a conflict miss or a capacity miss.

We typically run one (unrealistic) simulation of a fully-associative cache using the (unrealistic) clairvoyant cache-replacement algorithm -- in that simulation, all the misses are either capacity misses or compulsory misses (call that total Mmin).

All other simulations (running exactly the same simulated workload instruction trace, with the same cache capacity and the same cache line size) with less-than-fully-associative caches will inevitably have at least as many misses. So we subtract the capacity and compulsory misses (Mmin) from their miss counts and call the "extra" misses caused by lower associativity "conflict misses".
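Plugging in the counts from the 9-access example trace above (these numbers come from that contrived example, not from any real measurement), the bookkeeping looks like:

    # Miss counts taken from the 9-access example trace above.
    cold_misses     = 5   # first touch of address1..address5 -- same in every simulation
    misses_fa_opt   = 6   # fully associative + clairvoyant: 5 cold + 1 capacity (this is Mmin)
    misses_2way_lru = 9   # the 2-way LRU cache misses on all 9 accesses

    capacity_misses = misses_fa_opt - cold_misses        # -> 1
    conflict_misses = misses_2way_lru - misses_fa_opt    # -> 3 "extra" misses from lower associativity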

If we line up all the executed instructions in the simulated workload (the instruction trace) and compare where misses occur,

  • The first time we read any of the bytes in a cache-line-size chunk of main memory, we have a cold miss (compulsory miss) on that exact instruction in every simulation.
  • In the fully associative cache using the clairvoyant cache-replacement algorithm, all the remaining (non-cold) misses are capacity misses -- in particular, call the first capacity miss of that fully-associative simulation CAP1.
  • Every non-cold cache miss that occurs before CAP1 is a conflict miss (see the small sketch below).

Alas, individual non-cold cache misses after CAP1 are difficult to categorize. They may not even line up -- there may be instructions in the trace that cause a capacity miss in the clairvoyant fully-associative simulation but are cache hits in some other, less-than-fully-associative simulation. (Inevitably that less-than-fully-associative simulation will have at least as many cache misses overall; they've just been displaced to other instructions.)
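A tiny sketch of that partial labeling, using per-access outcomes transcribed from the example table above; the rule applied is exactly the one from the bullet list (non-cold misses before CAP1 are conflict misses), and misses at or after CAP1 are deliberately left unlabeled:

    # Per-access outcomes transcribed from the example table above.
    fa_opt  = ["cold", "cold", "cold", "cold", "cold", "hit", "hit", "hit", "capacity"]
    two_way = ["cold", "cold", "cold", "cold", "cold", "miss", "miss", "miss", "miss"]

    cap1 = fa_opt.index("capacity")      # position of the first capacity miss (CAP1)
    labeled = ["conflict" if i < cap1 and m == "miss" else m
               for i, m in enumerate(two_way)]
    print(labeled)
    # The three non-cold misses before CAP1 are labeled conflict misses; the miss
    # at CAP1's position (the last access) stays plain "miss" -- individual misses
    # at or after CAP1 are exactly the ones that are hard to categorize.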

Fortunately, we don't need to categorize individual misses. We only care about the total number of conflict misses and the total number of capacity misses. And we only care about that in order to decide whether tweaking the mapping is going to make a noticeable improvement in speed, or whether the only way to get a noticeable speedup is to spend a lot of money -- perhaps on a larger cache, perhaps on another level of cache, perhaps on more CPU cores, and so on.

At a detailed level, the CPU cache doesn't have enough information to categorize which kind of miss occurs. To the CPU cache, all misses (cold miss, capacity miss, conflict miss) look the same: the decision-making circuits that figure out whether we have a hit or a miss, and then decide which line of the cache to evict and replace, don't know the category and work the same way for all of them.

related: https://en.wikipedia.org/wiki/Cache_performance_measurement_and_metric
