Electronic – Cache. MESI protocol for multilevel cache in Intel processors

cache, computer-architecture, processor

I'm trying to simulate the performance of an Intel Core 2 Duo processor (though I'd be very pleased with information about any other multi-core Intel processor) and how it works with main memory. As I understand it, the main problem with memory is maintaining cache coherence. But how does the MESI protocol work across different cache levels? For example, how does it apply to L2 and L3? I would also be glad to know anything about the implementation of a non-exclusive write policy, and whether the L2 replacement algorithm is connected to the block being replaced in L1. Does anyone know anything about this?

Best Answer

I guess your question is about how a coherence protocol extends to multi-level caches. This book is a good reference. Here's my understanding:

I'll take the example of the Core i7 (as I'm not very familiar with the Core 2 Duo architecture).

In the Core i7, every core has private L1 and L2 caches, and all cores share a single large on-chip L3 cache. One can in turn join multiple such processors using point-to-point links to form a NUMA system. So there are four levels in the memory hierarchy: L1, L2, L3, and main memory.

There is one coherence protocol between the multiple L2s on a chip and the L3. There is a separate protocol between the L3s on separate chips. The two are independent of each other: one may use snooping while the other uses a directory-based implementation. I think in the Core i7 both are directory-based MESIF protocols (F is a new Forward state).
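For a simulator, the per-line state machine is the natural starting point. Here is a minimal sketch of plain textbook MESI (not MESIF, and not a description of Intel's actual implementation). The function name `next_state`, the event names, and the `others_have_copy` flag are all assumptions made up for illustration.

```python
# Textbook MESI transitions for a single cache line (hypothetical names).
# States: "M" (Modified), "E" (Exclusive), "S" (Shared), "I" (Invalid).
# Events: "local_read", "local_write" come from the owning core;
#         "snoop_read", "snoop_write" are requests seen from other caches.

def next_state(state, event, others_have_copy=False):
    """Return the new MESI state of a line after `event`."""
    if event == "snoop_write":
        # Another cache wants ownership: our copy becomes invalid.
        return "I"
    if event == "snoop_read":
        # Another cache reads: a valid copy is downgraded to Shared
        # (a Modified line would also be written back at this point).
        return "I" if state == "I" else "S"
    if event == "local_write":
        # Writing always ends in Modified; from S or I this implies
        # invalidating the other copies first.
        return "M"
    if event == "local_read":
        if state == "I":
            # Read miss: Shared if someone else holds it, else Exclusive.
            return "S" if others_have_copy else "E"
        return state  # read hit leaves the state unchanged
    raise ValueError(f"unknown event: {event}")
```

A simulator would call this once per access for the requesting cache, and once with the matching snoop event for every other cache that holds the line.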

All caches in the Core i7 are inclusive. This simplifies the protocol somewhat. As L2 is inclusive of L1, a block that is evicted from L2 has to be evicted from L1 too. Similarly, a block evicted from L3 has to be evicted from all L2s. L3 maintains core_valid bits for each block. A core_valid bit is set if the L2 cache of that core has a copy of the block. This way, when a block is evicted from L3, invalidations need to be sent only to those L2s that have a copy of the block. I guess the core_valid bits also act as a kind of directory. If you have inclusion, only the coherence messages for blocks present in a lower-level cache (the one closer to memory, e.g. L3) need to be forwarded to the higher-level caches above it. So the lower-level cache acts like a snoop filter.
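The core_valid idea can be sketched as a bitmask per block. This is only a toy model under the assumptions stated in the paragraph above. The class `InclusiveL3` and its method names are hypothetical, and capacity, replacement policy and data storage are left out.

```python
class InclusiveL3:
    """Toy shared L3 that tracks, per block, which cores' L2s hold a copy."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.core_valid = {}  # block address -> bitmask, bit c set for core c

    def _cores(self, mask):
        return [c for c in range(self.num_cores) if mask & (1 << c)]

    def fill(self, addr, core):
        """Core `core` brings block `addr` into its L2 (and hence into L3)."""
        self.core_valid[addr] = self.core_valid.get(addr, 0) | (1 << core)

    def l2_evict(self, addr, core):
        """Core `core` drops its L2 copy; the block itself stays in L3."""
        if addr in self.core_valid:
            self.core_valid[addr] &= ~(1 << core)

    def evict(self, addr):
        """Evict `addr` from L3; return the cores whose L2 must invalidate it."""
        return self._cores(self.core_valid.pop(addr, 0))

    def snoop_targets(self, addr):
        """Cores that must see an external coherence message for `addr`.

        A block absent from L3 cannot be in any L2 (inclusion), so the
        result is empty and the message is filtered out.
        """
        return self._cores(self.core_valid.get(addr, 0))


l3 = InclusiveL3(num_cores=4)
l3.fill(0x1000, core=0)
l3.fill(0x1000, core=2)
l3.fill(0x2000, core=1)
l3.l2_evict(0x1000, core=0)
```

After these calls, a snoop for 0x1000 only needs to reach core 2, and a snoop for an address that was never filled reaches no core at all, which is the snoop-filter effect.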

I'm not sure I understand your question about the non-exclusive policy. Maybe this link will help.
