Data Structures Performance – Is Storing Data Directly in a List Node Better Than Storing a Pointer to Data?

architecture, data structures, linked list, list, performance

Suppose we have two different doubly-linked list structures:

  1. One has content of the node embedded directly in the node:

    struct Content {
        // some stuff
    };
    struct Node {
        struct Node *next;
        struct Node *prev;
        struct Content content;
    };
    
  2. The other has a pointer to content:

    struct Content;
    struct Node {
        struct Node *next;
        struct Node *prev;
        struct Content *content;
    };
    

What are considerations to prefer one option to the other? The points can be about everything: performance in different conditions, architecture, ease of modification, ease of manipulating the list, etc.

I'd like to gather knowledge about the trade-offs present.

Shown code is C, but the question also applies to C++, or any other language which has explicit pointers.

Best Answer

Critical Path: Element Traversal

Let's say your most critical loops sequentially iterate through a list and access Content. In that case, typically in descending order of efficiency and ascending order of flexibility:

1. Intrusive C-Style List Node

The highest performance and least amount of flexibility is often going to go to the "classic" C-style intrusive linked list backed by a fixed allocator. The list pointers and contents are inseparably tied together, yielding a very inflexible (but often most compact) solution, like so:

struct ContentNode {
    // The content data is *directly* embedded here (either below or above
    // these pointers for optimal alignment at the cost of minimal padding).
    struct ContentNode *prev;
    struct ContentNode *next;
};

This is only sometimes more efficient than 2 below while always being far less flexible. It only provides a benefit over the second solution if you can reduce the amount of padding by directly embedding the contents into the list node itself.

The level of flexibility lost here is enormous, as the notions of element and list node fuse into one inseparable, tightly-bundled concept. And it's also quite dumb to use (in my strong opinion) unless you pair it with a fixed allocator to recover much of the spatial locality that linked structures generally lack.

Yet the most performance-critical areas of a codebase will often be looking at elements that are aggregated into one primary data structure (possibly with auxiliary data structures pointing to them, but still one main data structure). In that case, your profiling/tuning sessions might have you ultimately reaching for this kind of very-inflexible solution in some very isolated, performance-critical section of your codebase.

2. Non-Intrusive, Contiguous List Node

This would be like your first version, and is dramatically more flexible since Content and Node no longer have to be inseparable (e.g., Content could also be stored in other data structures that own its memory).

struct Node {
    struct Node *next;
    struct Node *prev;
    struct Content content;
};

This only costs more than 1 in the rare event that it results in more padding, yet is always more flexible as it decouples Content from the linked list representation.

3. Disjointed List Node

This describes your second solution and tends to be the least efficient as a result of additional cache misses, page faults, and the need for an additional pointer to be stored (more memory usage overall).

struct Node {
    struct Node *next;
    struct Node *prev;
    struct Content *content;
};

Yet this is the easiest way to achieve flexibility, especially in C if you turn content into a void pointer, as you can generalize your linked list very easily this way. Another way to generalize is to use variable-length structs.

Fixed Allocator

In these cases, where performance is a top priority and the critical paths of the list iterate through list elements and access them, it's insane not to use a fixed allocator.

The biggest immediate performance concern of linked lists in cases where the nodes/elements are smallish in size is the loss of contiguity ("disjointed memory") which translates to at least more compulsory cache misses and page faults and possibly even more non-compulsory ones as well.

The fixed allocator is an easy workaround when you are inevitably stuck with a linked structure, and it dramatically reduces cache misses/page faults.

"Efficient"

It's worth noting that the differences between these diminish if the size of the Content data fields is really large. They show up the most if Content is really small (ex: 16 bytes, maybe 64 bytes, not 400 bytes). If Content is huge (hundreds of bytes or more), then the differences become increasingly moot, as does the need for a fixed allocator.

Critical Path: List Manipulation

Let's say, instead, that your most critical paths of execution don't even inspect Content data fields and just manipulate list pointers.

In that case, the tables turn. Now 3 becomes the most efficient solution as a result of hot/cold field splitting (the Content fields become cold data, rarely accessed, while the list pointers become hot data, frequently accessed).

In this case, you still want a fixed allocator pooling the Nodes from contiguous memory (it matters less for Content here, given that it's cold) to reduce page faults and cache misses (at least compulsory ones, even if you end up totally rearranging the list).

When the fixed allocator is combined with hot/cold field splitting, more nodes can be allocated and accessed without triggering further page faults/cache misses, as a result of the smaller size per node (the cold Content data being stored elsewhere).

All the dynamics here are going to be related to memory efficiency with respect to the memory hierarchy. If you want to get the most out of your linked structures (trees and linked lists), then this is the place to study.