Pointer Access and Cache Coherency in Programming


To my understanding, when you access a variable, that variable and the surrounding area of memory are pulled into the L1 cache. If I'm wrong here, please tell me.

Now my question is: say I have an array of pointers, and I want to iterate through them all, performing operation X on each. If I first have to access the pointer to get the address of the actual data, does this mean that the memory near the pointer will be put into cache, then the memory near the data, then back to the pointer, then to the data, and so on? i.e. cache thrashing?
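For concreteness, here is roughly the pattern I mean (the `Widget` type and `doX` operation are just placeholders):

```cpp
#include <cstddef>

struct Widget { int value; };       // placeholder data type
void doX(Widget& w) { w.value++; }  // placeholder for "operation X"

void iterate(Widget* widgets[], std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        // First touch: the pointer itself, in the widgets array.
        // Second touch: the pointed-to Widget, which may sit on a
        // completely different cache line.
        doX(*widgets[i]);
    }
}
```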

If this is the case, how might one go about maintaining cache coherency?

Best Answer

Whether or not the data will evict the pointer from cache depends on how much memory you actually touch, how memory addresses are mapped to cache lines, and on the replacement policy. The most common replacement policy is Least Recently Used (LRU), or often just an approximation of it, where new data replaces whichever cache line in the set was used least recently. Memory addresses are commonly mapped in N-way sets: multiple memory addresses map to the same set, and each address can occupy any of the N lines within its set (see here). N will probably be 2 or 4.
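As a rough sketch of how that mapping works (assuming 64-byte lines and a power-of-two number of sets, which is typical but not guaranteed; the geometry numbers here are made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical geometry: 32 KiB cache, 4-way set associative,
    // 64-byte lines -> 32768 / (4 * 64) = 128 sets.
    const std::uintptr_t line_size = 64;
    const std::uintptr_t num_sets  = 128;

    int value = 42;
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&value);

    // Low bits pick the byte within the line, the next bits pick the
    // set; an address can then occupy any of the 4 ways in its set.
    std::uintptr_t set = (addr / line_size) % num_sets;

    std::printf("%p maps to set %zu\n", (void*)addr, (std::size_t)set);
}
```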

Under these conditions, you will almost assuredly not have cache thrashing when you dereference a pointer and then use the data, provided the data you touch after dereferencing fits in a single cache line. For both accesses, the least recently used line is definitely not the line holding the pointer (in the case of the data access) or the data (in the case of the pointer dereference), since those were the most recently touched lines.

If you touch more memory than one cache line's worth after dereferencing the pointer, it is possible you will evict the pointer's line from cache, but that depends on where the data sits in memory and how much of it you touch. Further data accesses may still leave the pointers alone, either because those addresses map to different sets or because the pointer line is not the least recently used line in its set. Since the addresses your allocator hands out are usually not known until the program runs, it is hard to say either way just from source code.
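A toy simulation of one 2-way LRU set shows why the alternating pointer/data pattern doesn't thrash (this is a model for illustration, not any real processor's policy):

```cpp
#include <cstdio>

// One 2-way set with LRU replacement: alternating accesses to two
// lines (pointer line P, data line D) never miss after the first
// touch, because the eviction victim is always the least recently
// used line -- which is never the one we need next.
int main() {
    int way[2] = {-1, -1};  // which line each way holds (-1 = empty)
    int lru = 0;            // index of the least recently used way

    const int P = 100, D = 200;  // line IDs for pointer and data
    const int accesses[] = {P, D, P, D, P, D};

    for (int line : accesses) {
        if (way[0] == line)      { lru = 1; std::printf("hit  %d\n", line); }
        else if (way[1] == line) { lru = 0; std::printf("hit  %d\n", line); }
        else {
            way[lru] = line;     // evict the LRU way
            std::printf("miss %d (filled way %d)\n", line, lru);
            lru = 1 - lru;       // the other way is now least recent
        }
    }
}
```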

It is probably possible to have your data and pointers allocated at specific memory addresses to keep hitting cache as much as possible, but that would require knowing the cache policies of the processor running your code, and that the compiler doesn't make optimizations that change the data access patterns without you knowing. It would also be processor specific: different cache policies on different processors make any micro-optimization at this scale worthless. As I note below, it is very difficult to find information about the specific policies of common processors, so it is probably not worth pursuing this kind of optimization at all. In general, consecutive memory accesses are the best way of hitting cache if you are only touching things once (as in your loop over the pointers). No other optimization is needed.
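A minimal sketch of that advice, with illustrative names: storing the objects themselves contiguously gives you those consecutive addresses, while an array of pointers to separately allocated objects generally does not.

```cpp
#include <vector>

struct Item { int value; };  // illustrative type

// Pointer-chasing: each dereference may land on a far-away cache
// line, wherever the allocator happened to place that Item.
long sum_indirect(const std::vector<Item*>& items) {
    long total = 0;
    for (const Item* p : items)
        total += p->value;
    return total;
}

// Contiguous: neighbouring elements share cache lines, so spatial
// locality (and the hardware prefetcher) work in your favour.
long sum_contiguous(const std::vector<Item>& items) {
    long total = 0;
    for (const Item& it : items)
        total += it.value;
    return total;
}
```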

This is all assuming a single processor with a single thread. Multiple threads and multiple processors throw wrenches into reasoning about cache. Multiple threads on a single processor mean that another thread can start running in the middle of your loop, and by the time your thread runs again the cache is in a completely different state and the whole process starts over; otherwise, not much is different.

If you have a multi-threaded loop (think one thread per pointer) running on multiple processors, then the cache becomes quite difficult to reason about. The order of execution can be rearranged, and consecutive pointers can be dereferenced on different processors. Either way, the cache has probably not been primed by previous pointer dereferences or data accesses. In this case you should consider chunking the work so each thread accesses a range of pointers instead of a single one, so the cache can actually be used.
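A minimal sketch of that chunking, assuming C++ threads and an illustrative `process` function standing in for operation X:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Item { int value; };
void process(Item& item) { item.value *= 2; }  // stand-in for operation X

// Each thread walks a contiguous range of the pointer array, so its
// consecutive pointer reads share cache lines instead of competing
// with other processors for them.
void process_all(std::vector<Item*>& items, unsigned num_threads) {
    std::vector<std::thread> workers;
    const std::size_t chunk = (items.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(items.size(), t * chunk);
        const std::size_t end   = std::min(items.size(), begin + chunk);
        workers.emplace_back([&items, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                process(*items[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```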

Remember that the memory you touch after dereferencing has to be SMALL. A few ints fit comfortably in a cache line (a common line size is 64 bytes), but an object with many members might not. If you access data stored more than a cache line's size from the beginning of the object, you pull more lines into cache and thus have more chance of evicting the pointers.
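For example, under the common assumption of 64-byte lines, a quick check of struct sizes tells you whether a single object can even fit in one line:

```cpp
#include <cstdio>

struct Small { int a, b; };          // 8 bytes: fits easily in a 64-byte line
struct Big   { double values[16]; }; // 128 bytes: spans at least two lines

int main() {
    std::printf("sizeof(Small) = %zu\n", sizeof(Small));
    std::printf("sizeof(Big)   = %zu\n", sizeof(Big));
}
```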

Finding data about the caches in Intel and AMD processors is kind of difficult, but I found this link from 2010 saying that Intel uses a pseudo-LRU policy on a 4-way set-associative L2 cache, and the same policy for the L1 cache (no set associativity is mentioned), on at least one of their processor architectures. It should be noted that the replacement policy is an integral part of what makes caches speed up programs. I would not be surprised in the least if real policies are more complicated or harder to reason about than what I presented above, since unmodified LRU is not universally the best replacement policy. Actual replacement policies probably have much more complicated rules in the name of keeping cache hit rates high, but exact details will be scarce, since they are a competitive edge in the performance-benchmark wars.