Suppose we have two different doubly-linked list structures:
-
One has content of the node embedded directly in the node:
struct Content { // some stuff }; struct Node { struct Node *next; struct Node *prev; struct Content content; }
-
The other has a pointer to content:
struct Content; struct Node { struct Node *next; struct Node *prev; struct Content *content; }
What are considerations to prefer one option to the other? The points can be about everything: performance in different conditions, architecture, ease of modification, ease of manipulating the list, etc.
I'd like to gather knowledge about the trade-offs present.
Shown code is C
, but the question also applies to C++
, or any other language which has explicit pointers.
Best Answer
Critical Path: Element Traversal
Let's say your most critical loops sequentially iterate through a list and access
Content
. In that case, typically in descending order of efficiency and ascending order of flexibility:1. Intrusive C-Style List Node
The highest performance and least amount of flexibility is often going to go to the "classic" C-style intrusive linked list backed by a fixed allocator. The list pointers and contents are inseparably tied together, yielding a very inflexible (but often most compact) solution, like so:
This is only sometimes more efficient than
2
below while always being far less flexible. It only provides a benefit over the second solution if you can reduce the amount of padding by directly embedding the contents into the list node itself.The level of flexibility lost here is enormous as the notion of element and list node fuse into one inseparable, tightly-bundled concept. And it's also quite dumb to use (in my strong opinion) unless you are first using a fixed allocator to give back a lot of the spatial locality that linked structures generally lack.
Yet the most performance-critical areas of a codebase will often be looking at elements that are aggregated into one primary data structure (possibly with auxiliary data structures pointing to them, but still one main data structure). In that case, your profiling/tuning sessions might have you ultimately reaching for this kind of very-inflexible solution in some very isolated, performance-critical section of your codebase.
2. Non-Intrusive, Contiguous List Node
This would be like your first version, and is dramatically more flexible as
Content
andNode
do not have to be inseparable (Content
could be stored in other data structures that own its memory, e.g.).This only costs more than
1
in the rare event that it results in more padding, yet is always more flexible as it decouplesContent
from the linked list representation.3. Disjointed List Node
This describes your second solution and tends to be the least efficient as a result of additional cache misses, page faults, and the need for an additional pointer to be stored (more memory usage overall).
Yet this is the easiest way to achieve flexibility, especially in C if you turn
content
into a void pointer, as you can generalize your linked list very easily this way. Another way to generalize is to use variable-length structs.Fixed Allocator
In these cases, where performance is a top priority and the critical paths of the list iterate through list elements and access them, it's insane not to use a fixed allocator.
The biggest immediate performance concern of linked lists in cases where the nodes/elements are smallish in size is the loss of contiguity ("disjointed memory") which translates to at least more compulsory cache misses and page faults and possibly even more non-compulsory ones as well.
The fixed allocator is an easy "reach-around" solution when you are inevitably stuck with a linked structure to dramatically reduce cache misses/page faults.
"Efficient"
It's worth noting that the differences between these diminish if the size of the
Content
data fields are really large. They show up the most ifContent
is really small (ex: 16 bytes, maybe 64 bytes, not 400 bytes). IfContent
is huge (hundreds of bytes or more), then the differences start to become increasingly moot as well as the need for a fixed allocator.Critical Path: List Manipulation
Let's say, instead, that your most critical paths of execution don't even inspect
Content
data fields and just manipulate list pointers.In that case, the tables turn. Now
3
becomes the most efficient solution as a result of hot/cold field splitting (Content
fields becomes cold data, rarely accessed, and the list pointers become hot data, frequently accessed).In this case, you still want to use a fixed allocator pooling chunks of data from contiguous pooled memory for the
Node
(don't have to forContent
quite as much here given that it's cold) to reduce page faults and cache misses (at least compulsory ones even if you end up totally rearranging the list).When combined with the fixed allocator and the hot/cold field splitting, now more nodes can be allocated and accessed without triggering further page faults/cache misses as a result of the smaller size per node (with the cold
Content
data being stored elsewhere).All the dynamics here are going to be related to memory efficiency with respect to the memory hierarchy. If you want to get the most out of your linked structures (trees and linked lists), then this is the place to study.