C++ – Data Oriented Design – impractical with more than 1-2 structure “members”

Tags: c, data structures, optimization, performance

The usual example of Data Oriented Design is with the Ball structure:

struct Ball
{
  float Radius;
  float XYZ[3];
};

and then they show some algorithm that iterates over a std::vector<Ball>.

Then they give you the same thing, but implemented in Data Oriented Design:

struct Balls
{
  std::vector<float> Radiuses;
  std::vector<std::array<float, 3>> XYZs;
};

That's all well and good if you're going to iterate through all the radiuses first, then all the positions, and so on. However, how do you move the balls within the vector? In the original version, if you have a std::vector<Ball> BallsAll, you can just move any BallsAll[x] to any BallsAll[y].

However, to do that in the Data Oriented version, you must repeat the move for every property (twice in the case of Ball: radius and position). It gets worse if you have many more properties. You'll have to keep an index for each "ball", and when you try to move it around, you have to perform the move in every property vector.
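For illustration, assuming a Balls instance named AllBalls for the Data Oriented version, the two kinds of move look roughly like this:

// AoS version: moving/swapping one ball is a single operation.
std::swap(BallsAll[x], BallsAll[y]);

// SoA version: the same move must be repeated for every property vector
// (std::swap lives in <utility>).
std::swap(AllBalls.Radiuses[x], AllBalls.Radiuses[y]);
std::swap(AllBalls.XYZs[x], AllBalls.XYZs[y]);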

Doesn't that kill any performance benefit of Data Oriented Design?

Best Answer

Data-Oriented Mindset

Data-oriented design doesn't mean applying SoA (structure of arrays) representations everywhere. It simply means designing architectures with a predominant focus on data representation, specifically with a focus on efficient memory layout and memory access.

That could lead to an SoA representation where appropriate, like so:

struct BallSoa
{
   vector<float> x;        // size n
   vector<float> y;        // size n
   vector<float> z;        // size n
   vector<float> r;        // size n
};

... this is often suitable for vertical, loopy logic that doesn't process a sphere's center components and radius simultaneously (the four fields aren't hot at the same time), but instead one field at a time (one loop through the radii, another three loops through the individual components of the sphere centers).
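As a minimal sketch (scale_radii is a hypothetical routine, not from any library), such a vertical loop touches only one tightly packed array:

// Only the radius array is streamed through the cache; x, y and z
// are never touched, and the contiguous floats vectorize easily.
void scale_radii(BallSoa& balls, float factor)
{
    for (size_t i = 0, n = balls.r.size(); i < n; ++i)
        balls.r[i] *= factor;
}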

In other cases it might be more appropriate to use an AoS (array of structures) if the fields are frequently accessed together (if your loopy logic iterates through all the fields of each ball rather than through them individually) and/or if random access to an individual ball is needed:

struct BallAoS
{
    float x;
    float y;
    float z;
    float r;
};
vector<BallAoS> balls;        // size n
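A rough sketch of that kind of loop (point_in_any_ball is just an illustrative name) reads every field of each ball in the same iteration, so packing them together keeps each ball within a single cache line:

// All four fields are hot in the same iteration, and each ball is one
// contiguous 16-byte record.
bool point_in_any_ball(const vector<BallAoS>& balls, float px, float py, float pz)
{
    for (const BallAoS& b : balls)
    {
        const float dx = px - b.x, dy = py - b.y, dz = pz - b.z;
        if (dx*dx + dy*dy + dz*dz <= b.r * b.r)
            return true;
    }
    return false;
}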

... in other cases it might be suitable to use a hybrid which balances both benefits:

struct BallAoSoA
{
    float x[8];
    float y[8];
    float z[8];
    float r[8];
};
vector<BallAoSoA> balls;      // size n/8
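Random access into the hybrid layout only needs the index split into a block and a lane; a small sketch (assuming, for brevity, that n is padded to a multiple of 8):

// Ball i lives in block i / 8, lane i % 8; within a block the fields
// are laid out SoA-style, which keeps SIMD-friendly contiguity.
float radius_of(const vector<BallAoSoA>& balls, size_t i)
{
    return balls[i / 8].r[i % 8];
}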

... you might even compress the size of a ball to half using half-floats to fit more ball fields into a cache line/page.

struct BallAoSoA16
{
    Float16 x2[16];
    Float16 y2[16];
    Float16 z2[16];
    Float16 r2[16];
};
vector<BallAoSoA16> balls;    // size n/16
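Float16 above isn't a built-in C++ type; a minimal sketch of such a storage type, assuming x86 hardware with F16C support (e.g. compiling with -mf16c), might look like this:

#include <immintrin.h>  // _cvtss_sh / _cvtsh_ss (F16C)

// 16 bits in memory, converted to and from 32-bit float at the
// boundaries; purely illustrative, not production-ready.
struct Float16
{
    unsigned short bits = 0;

    Float16() = default;
    Float16(float f) : bits(_cvtss_sh(f, _MM_FROUND_TO_NEAREST_INT)) {}
    operator float() const { return _cvtsh_ss(bits); }
};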

... perhaps even the radius is not accessed nearly as often as the sphere center (your codebase might often treat the balls like points and only rarely as spheres, for example). In that case, you might further apply a hot/cold field splitting technique.

struct BallAoSoA16Hot
{
    Float16 x2[16];
    Float16 y2[16];
    Float16 z2[16];
};
vector<BallAoSoA16Hot> balls;     // size n/16: hot fields
vector<Float16> ball_radiuses;    // size n: cold fields
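Access then goes through two containers keyed by the same ball index; a sketch (translate_all is a hypothetical routine, and it assumes a Float16 type convertible to and from float, like the one sketched above):

// Hot path: translating every ball streams only the packed centers;
// the cold radius array never enters the cache.
void translate_all(vector<BallAoSoA16Hot>& balls, float dx, float dy, float dz)
{
    for (BallAoSoA16Hot& block : balls)
        for (int lane = 0; lane < 16; ++lane)
        {
            block.x2[lane] = float(block.x2[lane]) + dx;
            block.y2[lane] = float(block.y2[lane]) + dy;
            block.z2[lane] = float(block.z2[lane]) + dz;
        }
}

// Cold path: the radius of ball i is fetched on demand from the
// separate array, e.g. float r = float(ball_radiuses[i]);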

The key to a data-oriented design is to consider all of these kinds of representations early on when making your design decisions, so that you don't trap yourself in a sub-optimal representation hidden behind a public interface.

It puts a spotlight on memory access patterns and the accompanying layouts, making them a significantly stronger concern than usual. In a sense it may even somewhat tear down abstractions. I've found that by applying this mindset more, I no longer look at std::deque, for example, in terms of its algorithmic requirements so much as in terms of the aggregated contiguous blocks of its representation and how random access to it works at the memory level. That is somewhat a focus on implementation details, but implementation details which tend to have just as much or more of an impact on performance as the algorithmic complexity describing scalability.

Premature Optimization

Much of the predominant focus of data-oriented design will appear, at least at a glance, dangerously close to premature optimization. Experience often teaches us that such micro-optimizations are best applied in hindsight, and with a profiler in hand.

Yet perhaps a strong message to take from data-oriented design is to leave room for such optimizations. That's what a data-oriented mindset can help allow:

Data-oriented design can leave breathing room to explore more effective representations. It's not necessarily about achieving memory layout perfection in one go, but more about making the appropriate considerations in advance to allow increasingly-optimal representations.

Granular Object-Oriented Design

A lot of data-oriented design discussions will pit themselves against classical notions of object-oriented programming. Yet I would offer a way of looking at this which is not so hardcore as to dismiss OOP entirely.

The difficulty with object-oriented design is that it will often tempt us to model interfaces at a very granular level, leaving us trapped with a scalar, one-at-a-time mindset instead of a parallel bulk mindset.

As an exaggerated example, imagine an object-oriented design mindset applied to a single pixel of an image.

class Pixel
{
public:
    // Pixel operations to blend, multiply, add, blur, etc.

private:
    Image* image;          // back pointer to access adjacent pixels
    unsigned char rgba[4];
};

Hopefully no one actually does this. To make the example really gross, I stored a back pointer to the image containing the pixel so that it can access neighboring pixels for image processing algorithms like blur.

The image back pointer immediately adds a glaring overhead, but even if we excluded it (having the public interface of Pixel provide only operations that apply to a single pixel), we still end up with a class just to represent a single pixel.

Now, aside from that back pointer, there's nothing wrong with a class here in terms of immediate overhead in a C++ context. Optimizing C++ compilers are great at taking all the structure we built and obliterating it down to smithereens.

The difficulty here is that we're modeling an encapsulated interface at too granular a level: the individual pixel. That leaves us trapped with this kind of granular design and data, with a potentially vast number of client dependencies coupled to this Pixel interface.

Solution: obliterate away the object-oriented structure of a granular pixel, and start modeling your interfaces at a coarser level dealing with a bulk number of pixels (at the image level).

By modeling at the bulk image level, we have significantly more room to optimize. As a hardcore data-oriented example, we can represent large images as coalesced 16x16-pixel tiles, where each row of 16 four-byte RGBA pixels fits exactly into a 64-byte cache line; that allows efficient access to vertically neighboring pixels with a typically small stride (useful if a number of our image processing algorithms need to access neighboring pixels vertically).
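A rough sketch of that kind of coarse, tile-based representation (the names Tile16 and TiledImage are illustrative, not from any real library):

// 16x16 tile of RGBA pixels: each 16-pixel row is exactly one 64-byte
// cache line, so vertical neighbors within a tile are only 64 bytes
// apart instead of a full image row apart.
struct Tile16
{
    unsigned char rgba[16][16][4];
};

// Coarse interface: operations are expressed over the whole image (or
// whole tiles), never over individual Pixel objects.
class TiledImage
{
public:
    TiledImage(int width, int height)
        : tiles_x((width + 15) / 16), tiles_y((height + 15) / 16),
          tiles(size_t(tiles_x) * size_t(tiles_y)) {}

    // Example bulk operation: the loop structure and the memory layout
    // can change freely without touching any client code.
    void fill(unsigned char r, unsigned char g, unsigned char b, unsigned char a)
    {
        for (Tile16& t : tiles)
            for (auto& row : t.rgba)
                for (auto& px : row)
                {
                    px[0] = r; px[1] = g; px[2] = b; px[3] = a;
                }
    }

private:
    int tiles_x, tiles_y;
    vector<Tile16> tiles;
};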

Designing at a Coarser Level

The above example of modeling interfaces at the image level is kind of a no-brainer, since image processing is a very mature field that's been studied and optimized to death. Yet less obvious might be a particle vs. a particle emitter, a sprite vs. a collection of sprites, an edge vs. a graph of edges, or even a person vs. a collection of people.

The key to allowing data-oriented optimizations (in foresight or hindsight) is often going to boil down to designing interfaces at a much coarser level, in bulk. The idea of designing interfaces for single entities is replaced by designing for collections of entities, with big operations that process them in bulk. This especially and immediately targets sequential access loops that need to touch everything and cannot help but have linear complexity.
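For example, instead of an interface on a single particle, the interface lives on the whole emitter; a small sketch (ParticleEmitter and its members are hypothetical names):

// Coarse, bulk interface: one call updates every particle, and the
// emitter is free to store its data in whatever layout turns out to be
// fastest (SoA here) without clients ever seeing a Particle object.
class ParticleEmitter
{
public:
    void update(float dt)
    {
        for (size_t i = 0, n = x.size(); i < n; ++i)
        {
            x[i] += vx[i] * dt;
            y[i] += vy[i] * dt;
        }
    }

private:
    vector<float> x, y;    // positions
    vector<float> vx, vy;  // velocities
};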

Data-oriented design often begins with the idea of coalescing data to form aggregates that model data in bulk. A similar mindset echoes through the interface designs that accompany it.

This is the most valuable lesson I've taken from data-oriented design, since I'm not computer architecture-savvy enough to find the optimal memory layout for something on my first try very often. It becomes something I iterate towards with a profiler in hand (and sometimes with a few misses along the way where I failed to speed things up). Yet the interface design aspect of data-oriented design is what leaves me room to seek more and more efficient data representations.

The key is to design interfaces at a coarser level than we're usually tempted to. This also often has side benefits, like mitigating the dynamic dispatch overhead associated with virtual functions, function pointer calls and dylib calls, and the inability of those calls to be inlined. The main idea to take away from all of this is to look at processing in a bulk fashion (when applicable).
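As a sketch of that last point (Entity and EntitySystem are hypothetical names), one virtual call per collection instead of one per element leaves the inner loop free to be inlined and vectorized:

// Per-element dispatch: a virtual call (and likely a cache miss)
// for every single entity, per frame.
struct Entity
{
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

// Per-collection dispatch: one virtual call amortized over the whole
// batch; the loop inside update_all can be inlined and vectorized.
struct EntitySystem
{
    virtual ~EntitySystem() = default;
    virtual void update_all(float dt) = 0;
};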