Design Patterns – OOP ECS vs Pure ECS in Game Development

design-patterns, entity-component-system, game development, object-oriented

Firstly, I am aware that this question links with the topic of game development, but I have decided to ask it here since it really comes down to a more general software engineering problem.

During the past month, I have read a lot about Entity-Component-Systems and am now quite comfortable with the concept. However, there is one aspect that seems to be missing a clear 'definition', and different articles have suggested radically different solutions:

This is the question of whether an ECS should break encapsulation or not. In other words, it's the OOP-style ECS (components are objects with both state and behaviour that encapsulate the data specific to them) vs. the pure ECS (components are C-style structs that only have public data, and systems provide the functionality).
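To make the two styles concrete, here is a minimal sketch in Python. All class and function names are hypothetical and not taken from any particular engine. The first variant bundles state and behaviour inside the components; the second keeps components as plain data and moves the behaviour into a system function.

```python
from dataclasses import dataclass


# --- OOP style: the component owns both state and behaviour ---
class OopTransform:
    def __init__(self, x: float, y: float) -> None:
        self._x, self._y = x, y  # hidden state

    def position(self) -> tuple[float, float]:
        return (self._x, self._y)


class OopSpriteRenderer:
    def __init__(self, transform: OopTransform, image: str) -> None:
        self._transform = transform  # direct dependency on another component
        self._image = image

    def draw(self) -> str:
        x, y = self._transform.position()
        return f"draw {self._image} at ({x}, {y})"


# --- Pure style: components are plain data, a system holds the logic ---
@dataclass
class Transform:
    x: float
    y: float


@dataclass
class Sprite:
    image: str


def render_system(entities: list[dict]) -> list[str]:
    """Visit every entity that has both a Transform and a Sprite."""
    out = []
    for components in entities:
        if Transform in components and Sprite in components:
            t, s = components[Transform], components[Sprite]
            out.append(f"draw {s.image} at ({t.x}, {t.y})")
    return out
```

In this sketch an entity is just a dict mapping a component type to its instance, which is an assumption made to keep the example short; a real ECS would normally use a dedicated store.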

Note that I am developing a framework / API / engine, so the goal is that it can easily be extended by whoever is using it. This includes things like adding a new type of render or collision component.

Problems with the OOP approach

Components must access data of other components. E.g. the render component's draw method must access the transform component's position. This creates dependencies in code.
Components can be polymorphic which further introduces some complexity. E.g. There might be a sprite render component that overrides the render component's virtual draw method.

Problems with the pure approach

Since the polymorphic behaviour (e.g. for rendering) has to be implemented somewhere, it is just outsourced into the systems (e.g. the sprite render system creates a sprite render node that inherits from render node and adds it to the render engine).
The communication between systems can be difficult to avoid. E.g. the collision system might need the bounding box which is calculated from whatever concrete render component there is. This can be solved by letting them communicate via data. However, this removes instant updates since the render system would update the bounding box component and the collision system would then use it. This may lead to problems if the order of calling the systems' update functions is not defined. There is an event system in place that allows systems to raise events that other systems can subscribe their handlers to. However, this only works for telling systems what to do, i.e. void functions.
There are additional flags needed. Take a tile map component for example. It would have a size, tile size and index list field. The tile map system would handle the respective vertex array and assign the texture coordinates based on the component's data. However, recalculating the entire tilemap every frame is expensive. Therefore, a list would be needed to keep track of all the changes made to then update them in the system. In the OOP way this could be encapsulated by the tile map component. E.g. the SetTile() method would update the vertex array whenever it's called.

Although I see the beauty of the pure approach, I don't really understand what kind of concrete benefits it would have over a more traditional OOP approach. The dependencies between components still exist, although they are hidden away in the systems. Also, I would need a lot more classes to accomplish the same goal. This seems to me like a somewhat over-engineered solution, which is never a good thing.

Furthermore, I am not that interested in performance, so this whole idea of data-oriented design and cache misses doesn't really matter to me. I just want a nice architecture ^^

Still, most of the articles and discussion I read suggest the second approach. WHY?

Animation

Lastly, I want to ask the question of how I would handle animation in a pure ECS. Currently I have defined an animation as a functor that manipulates an entity based on some progress between 0 and 1. The animation component has a list of animators which has a list of animations. In its update function it then applies whatever animations are currently active to the entity.

Note:

I have just read the post "Is the Entity Component System architecture object oriented by definition?", which explains the problem a bit better than I do. Although it is basically on the same topic, it still doesn't give any answers as to why the pure data approach is better.

Best Answer

This is a tough one. I'll just try to tackle some of the questions based on my particular experiences (YMMV):

Components must access data of other components. E.g. the render component's draw method must access the transform component's position. This creates dependencies in code.

Don't underestimate the amount and complexity (not degree) of coupling/dependencies here. You could be looking at the difference between this (and this diagram is already ridiculously simplified to toy-like levels, and the real-world example would have interfaces in between to loosen the coupling):

... and this:

... or this:

Components can be polymorphic which further introduces some complexity. E.g. There might be a sprite render component that overrides the render component's virtual draw method.

So? The analogical (or literal) equivalent of a vtable and virtual dispatch can be invoked via the system rather than the object hiding its underlying state/data. Polymorphism is still very practical and feasible with the "pure" ECS implementation when the analogical vtable or function pointer(s) turns into "data" of sorts for the system to invoke.
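As a rough illustration of dispatch living in the system rather than in the component, the sketch below keeps a table of draw functions keyed by a plain `kind` field. The names `Renderable`, `DRAW_TABLE`, and the two draw functions are assumptions invented for this example, not part of any real engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Renderable:
    kind: str   # selects the draw routine; plain data, no methods
    asset: str


def draw_sprite(pos: Position, r: Renderable) -> str:
    return f"sprite {r.asset} at ({pos.x}, {pos.y})"


def draw_text(pos: Position, r: Renderable) -> str:
    return f"text '{r.asset}' at ({pos.x}, {pos.y})"


# The "vtable" is ordinary data owned by the system; users of the
# framework could register additional kinds here.
DRAW_TABLE: dict[str, Callable[[Position, Renderable], str]] = {
    "sprite": draw_sprite,
    "text": draw_text,
}


def render_system(entities: list[tuple[Position, Renderable]]) -> list[str]:
    """Look up the draw routine for each renderable and invoke it."""
    return [DRAW_TABLE[r.kind](pos, r) for pos, r in entities]
```

Extending the renderer then means adding one entry to the table instead of subclassing a component.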

Since the polymorphic behaviour (e.g. for rendering) has to be implemented somewhere, it is just outsourced into the systems. (e.g. the sprite render system creates a sprite render node that inherits render node and adds it to the render engine)

So? I hope this is not coming off as sarcasm (not my intent though I've been accused of it often but I wish I could communicate emotions better through text), but "outsourcing" polymorphic behavior in this case doesn't necessarily incur an additional cost to productivity.

The communication between systems can be difficult to avoid. E.g. the collision system might need the bounding box which is calculated from whatever concrete render component there is.

This example seems particularly weird to me. I don't know why a renderer would be outputting data back to the scene (I generally consider renderers read-only in this context), or for a renderer to be figuring out AABBs instead of some other system to do this for both renderer and collision/physics (I might be getting hung up on the "render component" name here). Yet I don't want to get too hung up on this example since I realize that's not the point you're trying to make. Still the communication between systems (even in the indirect form of read/writes to the central ECS database with systems depending rather directly on transformations made by others) shouldn't need to be frequent, if at all necessary. That's contradicting some of what I wrote immediately below about the importance of determining order of evaluation upfront but that's with practical needs for user response rather than "correctness" (it's not necessarily a temporal coupling issue but a user-end design issue of ensuring frames output the latest results without lagging behind).

This may lead to problems if the order of calling the systems' update functions is not defined.

This absolutely should be defined. The ECS is not the end-all solution to rearrange system processing evaluation order of every possible system in the codebase and get back exactly the same kind of results to the end user dealing with frames and FPS. This is one of the things, when designing an ECS, that I'd at least strongly suggest should be anticipated somewhat upfront (though with a lot of forgiving breathing room to change minds later provided it's not altering the most critical aspects of the ordering of system invocation/evaluation).
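One simple way to pin the order down, sketched here under the assumption that systems are plain functions over a shared world object, is to declare the pipeline once in a single list. `World`, `PIPELINE`, and the three system names are hypothetical.

```python
from typing import Callable

World = dict  # hypothetical stand-in for the ECS database


def input_system(world: World) -> None:
    world["velocity"] = world["input"]


def physics_system(world: World) -> None:
    world["position"] += world["velocity"]


def render_system(world: World) -> None:
    world["drawn_at"] = world["position"]


# The evaluation order is declared once, in one place.
PIPELINE: list[Callable[[World], None]] = [
    input_system,
    physics_system,
    render_system,
]


def run_frame(world: World) -> None:
    for system in PIPELINE:
        system(world)
```

Because rendering runs after physics, the frame shows the position computed in that same frame rather than lagging one frame behind.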

However, recalculating the entire tilemap every frame is expensive. Therefore, a list would be needed to keep track of all the changes made to then update them in the system. In the OOP way this could be encapsulated by the tile map component. E.g. the SetTile() method would update the vertex array whenever it's called.

I didn't quite understand this one except that it's a data-oriented concern. And there are no pitfalls as to representing and storing data in an ECS, including memoization, to avoid such performance pitfalls (the biggest ones with an ECS tend to relate to things like systems querying for available instances of particular component types which is one of the most challenging aspects of optimizing a generalized ECS). The fact that logic and data are separated in a "pure" ECS doesn't mean you suddenly have to recompute things you could have otherwise cached/memoized in an OOP representation. That's a moot/irrelevant point unless I glossed over something very important.

With the "pure" ECS you can still store this data in the tile map component. The only major difference is that the logic to update this vertex array would move to a system somewhere.

You can even lean on the ECS to simplify the invalidation and removal of this cache from the entity if you create a separate component like TileMapCache. At that point when the cache is desired but not available in an entity with a TileMap component, you can compute it and add it. When it's invalidated or no longer needed, you can remove it through the ECS without having to write more code specifically for such invalidation and removal.
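The cache-as-component idea could look roughly like this. It is a sketch under stated assumptions: entities are dicts keyed by component type, `TileMapCache` holds only per-tile corner positions instead of a real vertex array, and `set_tile` is a hypothetical helper rather than anything from the question's codebase.

```python
from dataclasses import dataclass


@dataclass
class TileMap:
    width: int
    height: int
    tile_size: int
    indices: list[int]


@dataclass
class TileMapCache:
    vertices: list[tuple[int, int]]  # top-left corner of each tile, in pixels


def set_tile(entity: dict, cell: int, index: int) -> None:
    """Edit the map and invalidate the cache by removing the component."""
    entity[TileMap].indices[cell] = index
    entity.pop(TileMapCache, None)


def tile_map_system(entities: list[dict]) -> int:
    """Rebuild caches only where they are missing; return how many were rebuilt."""
    rebuilt = 0
    for e in entities:
        if TileMap in e and TileMapCache not in e:
            m = e[TileMap]
            verts = [((i % m.width) * m.tile_size, (i // m.width) * m.tile_size)
                     for i in range(m.width * m.height)]
            e[TileMapCache] = TileMapCache(verts)
            rebuilt += 1
    return rebuilt
```

This version rebuilds the whole cache after any edit. Tracking individual changed tiles, as the question describes, would mean storing a list of dirty cells in a component instead of dropping the cache entirely.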

The dependencies between components still exist although being hidden away in the systems

There's no dependency between components in a "pure" rep (I don't think it's quite right to say that dependencies are being hidden here by the systems). Data doesn't depend on data, so to speak. Logic depends on logic. And a "pure" ECS tends to promote the logic to be written in a way so as to depend on the absolute minimal subset of data and logic (often none) a system requires to work, which is unlike many alternatives which often encourage depending on far more functionality than required for the actual task. If you're using the pure ECS right, one of the first things you should appreciate is the decoupling benefits while simultaneously questioning everything you ever learned to appreciate in OOP about encapsulation and specifically information hiding.

By decoupling I specifically mean how little information your systems need to work. Your motion system doesn't even need to know about something far more complex like a Particle or Character (the developer of the system doesn't necessarily even need to know such entity ideas exist in the system). It just needs to know about the bare minimum data like a position component, which could be as simple as a few floats in a struct. That's even less information and fewer external dependencies than what a pure interface like IMotion tends to carry along with it. It's primarily this minimal knowledge each system requires to work that makes the ECS so often forgiving of very unanticipated design changes in hindsight, without facing cascading interface breakages all over the place.
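A minimal sketch of such a motion system follows, assuming entities are dicts keyed by component type. `Position`, `Velocity`, and the extra "lifetime" and "name" entries are illustrative names only. The system touches two small data types and nothing else.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


def motion_system(entities: list[dict], dt: float) -> None:
    """Advance every entity that has both Position and Velocity."""
    for e in entities:
        if Position in e and Velocity in e:
            e[Position].x += e[Velocity].dx * dt
            e[Position].y += e[Velocity].dy * dt


# The system never sees these "kinds" of entity, only the components.
particle = {Position: Position(0.0, 0.0), Velocity: Velocity(1.0, 2.0), "lifetime": 3.0}
character = {Position: Position(5.0, 5.0), Velocity: Velocity(-1.0, 0.0), "name": "hero"}
scenery = {Position: Position(9.0, 9.0)}  # no Velocity, so it is skipped
```

Adding a new kind of entity later requires no change to `motion_system`, as long as it carries the same two components.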

The "impure" approach you suggest somewhat diminishes that benefit since now your logic isn't localized strictly to systems where changes don't cause cascading breakages. The logic would now be centralized to some degree in the components accessed by multiple systems which now have to fulfill interface requirements of all the various systems that could use it, and now it's like every system then needs to have knowledge of (depend on) more information than it strictly needs to work with that component.

Dependencies to Data

One of the things that's controversial about the ECS is that it tends to replace what might otherwise be dependencies to abstract interfaces with just raw data, and that's generally considered a less desirable and tighter form of coupling. But in the kinds of domains like games where ECS can be very beneficial, it's often easier to design the data representation upfront and keep it stable than it is to design what you can do with that data at some central level of the system. That's something I've painfully observed even among seasoned veterans in codebases that utilize more of a COM-style pure interface approach with things like IMotion.

The developers kept finding reasons to add, remove, or change functions to this central interface, and each change was ghastly and costly because it would tend to break every single class that implemented IMotion along with every single place in the system that used IMotion. Meanwhile the entire time with so many painful and cascading changes, the objects that implemented IMotion were all just storing a 4x4 matrix of floats and the whole interface was just concerned with how to transform and access those floats; the data representation was stable all the way from the beginning, and a lot of pain could have been avoided if this centralized interface, so prone to change with unanticipated design needs, didn't even exist in the first place.

This could all sound almost as disgusting as global variables, but the nature of how the ECS organizes this data into components retrieved explicitly by type through systems makes it so, while compilers can't enforce anything like information hiding, the places that access and mutate the data are generally very explicit and obvious enough to still effectively maintain invariants and predict what sort of transformations and side effects go on from one system to the next (actually in ways that can arguably be simpler and more predictable than OOP in certain domains given how the system turns into a flat sort of pipeline).

Lastly, I want to ask the question of how I would handle animation in a pure ECS. Currently I have defined an animation as a functor that manipulates an entity based on some progress between 0 and 1. The animation component has a list of animators which has a list of animations. In its update function it then applies whatever animations are currently active to the entity.

We're all pragmatists here. Even in gamedev you'll probably get conflicting ideas/answers. Even the purest ECS is a relatively new phenomenon, pioneering territory, for which people haven't necessarily formulated the strongest opinions on how to skin cats. My gut reaction is an animation system which increments this sort of animation progress in animated components for the rendering system to display, but that's ignoring so much nuance for the particular application and context.

With the ECS it's not a silver bullet and I do still find myself with tendencies to go in and add new systems, remove some, add new components, change an existing system to pick up that new component type, etc. I don't get things right at all the first time around still. But the difference in my case is that I'm not changing anything central when I fail to anticipate certain design needs upfront. I'm not getting the rippling effect of cascading breakages that require me to go all over the place and change so much code to handle some new need that crops up, and that's quite the time saver. I'm also finding it easier on my brain because when I sit down with a particular system, I don't need to know/remember that much about anything else besides the relevant components (which are just data) to work on it.
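One possible reading of that gut reaction, sketched with hypothetical names: the animation is described as data (`MoveAnimation`) and a system advances its progress between 0 and 1. This replaces the question's functor-based design with a single hard-coded interpolation purely to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float


@dataclass
class MoveAnimation:
    start_x: float
    end_x: float
    duration: float        # seconds
    progress: float = 0.0  # normalized, 0..1


def animation_system(entities: list[dict], dt: float) -> None:
    """Advance progress and write the interpolated value into Position."""
    for e in entities:
        if MoveAnimation in e and Position in e:
            a = e[MoveAnimation]
            a.progress = min(1.0, a.progress + dt / a.duration)
            e[Position].x = a.start_x + (a.end_x - a.start_x) * a.progress
            if a.progress >= 1.0:
                del e[MoveAnimation]  # finished animations are simply detached
```

Other kinds of animation could be added as further component types with their own small systems, or as a `kind` field that the system dispatches on.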

Related Solutions

Entity-Component-System – Interaction Between Systems

So the question is - how should I implement the interaction between the various systems?

Ideally they don't interact, not in any direct sense. The systems in an ECS all have access to the central ECS database where they can fetch entities and components attached to them. They don't talk to each other directly. They talk to the database and all run independently of each other.

Dependencies Flow Towards Raw Data, Not Abstractions

The dependencies in an ECS do not flow towards functions, not even abstract functionality. They all flow towards raw data which might sound like an epic violation of many accepted software engineering principles, and in my opinion it is, but yields something easier to maintain for some cases. Maybe some software engineering principles are wrong or at least not applicable for all scenarios. There are many situations where it's easier to achieve data stability than interface/design stability. As a basic example, it's much easier to reason about what data fields a raw matrix component should have once and for all and keep that stable (unchanging) for years to come. It's much harder to figure out all the functions an abstract IMatrix interface should provide once and for all and keep that perfectly stable (unchanging) for years to come without facing temptations, if not outright needs, to add and remove and change functions.

So in appropriate cases, when your dependencies flow towards data instead of abstract functionality, your codebase will find fewer and fewer and fewer reasons to have to face central design changes with cascading effects and potentially big parts having to be rewritten. To direct dependencies towards data in that case is directing them towards stability. It's worth asking yourself as a developer whether the tendency in your system is for developers to add, change, and remove functions or to add, change, and remove data from components. If it's the former case, you might benefit greatly from an ECS engine.

If systems start to depend on each other a lot, that's directing dependencies away from data and towards functionality, and many of the maintenance benefits and the ability to reason about the correctness of your engine and easily keep it stable at the design level will be lost. Of course a pragmatic solution might sometimes call for a system calling a function in another every once in a while, but you should generally seek to keep that to a bare minimum. Instead of talking directly to each other, you can have systems modify and attach components to entities in a way such that other systems can then pick up those changes and react accordingly.
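As a small sketch of systems coordinating through attached components instead of direct calls, the example below has one system attach a `Hit` component and another react to it. The component names and the "colliding" flag, which stands in for real collision detection, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Health:
    points: int


@dataclass
class Hit:
    damage: int


def collision_system(entities: list[dict]) -> None:
    """Stand-in for real collision detection: tags entities flagged as colliding."""
    for e in entities:
        if e.get("colliding"):
            e[Hit] = Hit(damage=10)


def damage_system(entities: list[dict]) -> None:
    """Reacts to Hit components without knowing which system attached them."""
    for e in entities:
        if Hit in e and Health in e:
            e[Health].points -= e[Hit].damage
            del e[Hit]  # consume the marker so it is applied only once
```

Neither function references the other; removing `damage_system` leaves `collision_system` working unchanged.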

System Interaction

[...] the rendering system must know the data from the positional component of an entity in order to draw it in a correct position. And so on.

That it can grab from the ECS database, looping through entities with renderable and position components, just as the physics system before it might loop through entities with position components and modify their position. Generally each system fits into a basic loop model:

for each entity with the components I'm interested in:
    do something with the components

... and you have to start thinking about doing things in passes, often multiple passes even if the intuitive solution is to do everything in one pass. For example, it might come more intuitively to loop through all your game entities and apply physics and respond to input and process AI and render them all in one go. That can minimize the amount of loops you have and also require less state. However, the ECS tackles this typically with multiple simpler passes and sometimes slightly more intermediary state to use from one pass to the next, but as a trade-off, it leads to a much easier system to maintain and one which is easier to change and potentially parallelize* and vectorize.

* As yuri mentioned, it could also make things harder to parallelize, at least across systems in an inter-system way, but could make things easier to parallelize in an intra-system way because it's easier to reason about the correctness of a parallel loop without locking if it's, say, making fewer state changes on the way and the code involved in the pass is much simpler. In my blunt opinion, it's often not worth multithreading the systems themselves so much as the loops they are performing inside for the most performance-critical systems.
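The basic loop model and the idea of separate passes might be sketched as follows. `query` is a hypothetical helper that yields the requested components for every entity that has all of them; entities are again assumed to be dicts keyed by component type.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float


@dataclass
class Velocity:
    dx: float


@dataclass
class Renderable:
    glyph: str


def query(entities, *types):
    """Yield the requested components for each entity that has all of them."""
    for e in entities:
        if all(t in e for t in types):
            yield tuple(e[t] for t in types)


def physics_pass(entities, dt):
    # First pass: only movement, nothing about drawing.
    for pos, vel in query(entities, Position, Velocity):
        pos.x += vel.dx * dt


def render_pass(entities):
    # Second pass: only drawing, reading what the physics pass wrote.
    return [f"{r.glyph}@{pos.x}" for pos, r in query(entities, Position, Renderable)]
```

Each pass expresses one thought and can be read, tested, or replaced on its own.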

Multiple, Simpler Passes

It's somewhat similar to GPU programming since GPUs aren't so good at doing complex things with each iteration, so they often excel instead at doing simple things per iteration that add up to a complex task after repeatedly running through the same data with multiple, simpler passes.

Unlike GPU programming, you can still potentially do much more complex things in a single pass, but each pass will represent like one logical thought: "for each of these components, apply physics", not both physics and rendering. The physics system performs its own pass just as the rendering system, living in its own isolated world, performs its own completely separate and detachable rendering pass. Each system lives in its own little world, seeing only the ECS database and being able to grab components and entities inside. They shouldn't have to bother with what other systems are doing.

In fact, in a well-designed ECS, you can remove any system from the engine and not have the codebase collapse horribly on itself because systems don't depend on each other to function. All they care about is the central database and the components (which are raw data) that they are interested in processing. They all live in their own isolated world. As a result you should be able to remove the physics system from your game, at which point motion components will cease to have physics applied, but everything else should keep on working just as before. It's extremely orthogonal in that respect.

Event-Driven Programming

Event-driven programming can be a bit awkward with ECS, but one straightforward way to solve that is to have event queue components. A system can push events to these queue components for another system to pop and process in a deferred fashion without the first system directly calling functions in the second. Again the bulk of your interactions should not be system->system, but system->ECS, and system->component.
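An event queue component could be as simple as the sketch below. `EventQueue`, `Score`, and the "coin_collected" event are invented for the example; the producer here pushes an event unconditionally as a stand-in for real detection logic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EventQueue:
    events: deque = field(default_factory=deque)


@dataclass
class Score:
    value: int = 0


def collision_system(entities: list[dict]) -> None:
    """Producer: pushes events instead of calling the scoring code."""
    for e in entities:
        if EventQueue in e:
            e[EventQueue].events.append(("coin_collected", 5))


def scoring_system(entities: list[dict]) -> None:
    """Consumer: drains the queue later, during its own pass."""
    for e in entities:
        if EventQueue in e and Score in e:
            q = e[EventQueue].events
            while q:
                name, amount = q.popleft()
                if name == "coin_collected":
                    e[Score].value += amount
```

The events are processed in a deferred fashion: nothing happens to the score until the scoring system runs its pass.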

Object-oriented – Is the Entity Component System architecture object oriented by definition

Introduction

Entity–component systems are an object-oriented architectural technique.

There is no universal consensus on what the term means, the same as with object-oriented programming. However, it is clear that entity–component systems are specifically intended as an architectural alternative to inheritance. Inheritance hierarchies are natural for expressing what an object is, but in certain kinds of software (such as games), you would rather express what an object does.

It is a different object model than the “classes and inheritance” one to which you’re most likely accustomed from working in C++ or Java. Entities are as expressive as classes, just like prototypes as in JavaScript or Self—all of these systems can be implemented in terms of one another.

Examples

Let’s say that Player is an entity with Position, Velocity, and KeyboardControlled components, which do the obvious things.

entity Player:
  Position
  Velocity
  KeyboardControlled

We know Position must be affected by Velocity, and Velocity by KeyboardControlled. The question is how we would like to model those effects.

Entities, Components, and Systems

Suppose that components have no references to one another; an external Physics system traverses all Velocity components and updates the Position of the corresponding entity; an Input system traverses all KeyboardControlled components and updates the Velocity.

          Player
         +--------------------+
         | Position           | \
         |                    |  Physics
       / | Velocity           | /
  Input  |                    |
       \ | KeyboardControlled |
         +--------------------+

This satisfies the criteria:

No game/business logic is expressed by the entity.
Components store data describing behaviour.

The systems are now responsible for handling events and enacting the behaviour described by the components. They are also responsible for handling interactions between entities, such as collisions.
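A runnable sketch of this first variant, using Python for illustration: the field names (`x`, `dx`, `speed`) and the key strings are assumptions, since the text above only names the components and systems.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float = 0.0


@dataclass
class Velocity:
    dx: float = 0.0


@dataclass
class KeyboardControlled:
    speed: float = 1.0


def input_system(entities: list[dict], pressed: set[str]) -> None:
    """Traverse KeyboardControlled components and update Velocity."""
    for e in entities:
        if KeyboardControlled in e and Velocity in e:
            direction = ("right" in pressed) - ("left" in pressed)
            e[Velocity].dx = direction * e[KeyboardControlled].speed


def physics_system(entities: list[dict], dt: float) -> None:
    """Traverse Velocity components and update Position."""
    for e in entities:
        if Velocity in e and Position in e:
            e[Position].x += e[Velocity].dx * dt


# The entity holds no logic; it is only a set of components.
player = {
    Position: Position(),
    Velocity: Velocity(),
    KeyboardControlled: KeyboardControlled(speed=3.0),
}
```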

Entities and Components

However, suppose that components do have references to one another. Now the entity is simply a constructor which creates some components, binds them together, and manages their lifetimes:

class Player:
  construct():
    this.p = Position()
    this.v = Velocity(this.p)
    this.c = KeyboardControlled(this.v)

The entity might now dispatch input and update events directly to its components. Velocity would respond to updates, and KeyboardControlled would respond to input. This still satisfies our criteria:

The entity is a “dumb” container which only forwards events to components.
Each component enacts its own behaviour.

Here component interactions are explicit, not imposed from outside by a system. The data describing a behaviour (what is the amount of velocity?) and the code that enacts it (what is velocity?) are coupled, but in a natural fashion. The data can be viewed as parameters to the behaviour. And some components don’t act at all—a Position is the behaviour of being in a place.

Interactions can be handled at the level of the entity (“when a Player collides with an Enemy…”) or at the level of individual components (“when an entity with Life collides with an entity with Strength…”).
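For comparison, the second variant could be written out like this. The event method names (`on_input`, `on_update`) and the `speed` parameter are hypothetical additions to the constructor pseudocode above.

```python
class Position:
    def __init__(self) -> None:
        self.x = 0.0


class Velocity:
    def __init__(self, position: Position) -> None:
        self.position = position  # explicit reference to another component
        self.dx = 0.0

    def on_update(self, dt: float) -> None:
        self.position.x += self.dx * dt


class KeyboardControlled:
    def __init__(self, velocity: Velocity, speed: float = 1.0) -> None:
        self.velocity = velocity
        self.speed = speed

    def on_input(self, key: str) -> None:
        self.velocity.dx = {"left": -self.speed, "right": self.speed}.get(key, 0.0)


class Player:
    """A 'dumb' container: builds the components and forwards events."""

    def __init__(self) -> None:
        self.p = Position()
        self.v = Velocity(self.p)
        self.c = KeyboardControlled(self.v, speed=2.0)

    def input(self, key: str) -> None:
        self.c.on_input(key)

    def update(self, dt: float) -> None:
        self.v.on_update(dt)
```

Here each component enacts its own behaviour, and the wiring between components is fixed at construction time rather than discovered by a system.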

Components

What is the reason for the entity to exist? If it is merely a constructor, then we can replace it with a function returning a set of components. If we later want to query entities by their type, we can just as well have a Tag component which lets us do just that:

function Player():
  t = Tag("Player")
  p = Position()
  v = Velocity(p)
  c = KeyboardControlled(v)
  return {t, p, v, c}

Entities are as dumb as can be—they’re just sets of components.
Components respond directly to events as before.

Interactions must now be handled by abstract queries, completely decoupling events from entity types. There are no more entity types to query—arbitrary Tag data is probably better used for debugging than game logic.

Conclusion

Entities are not functions, rules, actors, or dataflow combinators. They are nouns which model concrete phenomena—in other words, they are objects. It is as Wikipedia says—entity–component systems are a software architecture pattern for modeling general objects.