C# – Why Avoid Using Structs for Passive Data Structures?

c# object-oriented

Context

I recently read about the object-oriented technique of making a distinction between objects and passive data structures, best summarized in Clean Code:

"Objects hide their data […] and expose functions […]. Data
structures expose their data and have no meaningful functions."

I am considering using C# structs for passive data structures.
To clarify this: IF part of my code needs to function as a passive data structure, THEN I want to use a struct for that.

Advantages

It would provide a language-given distinction between objects and passive data structures.

Also, if an object holds a class instance in a private field but returns it from a method, callers can change that instance somewhere else. The object's data is then changed from the outside, which is not good. I know you should, for example, expose an internal List as an IReadOnlyList or a ReadOnlyCollection, but that is a good practice that even good programmers don't always follow. Using structs instead would enforce this automatically.
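A minimal sketch of the protection described above, and of its limit. All type and member names here (MutableScore, ScoreValue, Team, Registry, ExposureDemo) are hypothetical and exist only for illustration. It suggests that returning a struct hands out a copy, while a struct that itself holds a reference type (such as a List) still shares that inner object with the caller.

```csharp
using System.Collections.Generic;

public class MutableScore { public int Points; }      // reference type
public struct ScoreValue { public int Points; }       // value type
public struct Team { public List<string> Members; }   // struct holding a reference

public class Registry
{
    private readonly MutableScore _classScore = new MutableScore { Points = 1 };
    private ScoreValue _structScore = new ScoreValue { Points = 1 };
    private Team _team = new Team { Members = new List<string> { "Ann" } };

    public MutableScore GetClassScore() => _classScore;  // hands out the same object
    public ScoreValue GetStructScore() => _structScore;  // hands out a copy
    public Team GetTeam() => _team;                      // copies the struct, not the list

    public int ClassPoints => _classScore.Points;
    public int StructPoints => _structScore.Points;
    public int MemberCount => _team.Members.Count;
}

public static class ExposureDemo
{
    public static (int classPoints, int structPoints, int memberCount) Run()
    {
        var registry = new Registry();

        registry.GetClassScore().Points = 99;   // mutates the registry's own object

        ScoreValue copy = registry.GetStructScore();
        copy.Points = 99;                       // mutates only the local copy

        registry.GetTeam().Members.Add("Bob");  // still mutates the registry's list

        return (registry.ClassPoints, registry.StructPoints, registry.MemberCount);
    }
}
```

Calling ExposureDemo.Run() should give 99 for the class-based score, 1 for the struct-based score, and 2 members in the list, so the copy is only as deep as the struct's own fields.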

What I found out so far

I know the question "When to use struct" has already been answered several times. The answers always boil down to the advice from the official docs:

AVOID defining a struct unless the type has all of the following
characteristics:

  • It logically represents a single value, similar to primitive types (int, double, etc.).
  • It has an instance size under 16 bytes.
  • It is immutable.
  • It will not have to be boxed frequently.

I think the first two points are about performance on the stack. However, as far as I understand, structs are better on the stack but no worse on the heap.

Point 3 I can still adhere to. Might make the code cleaner, might make it more awkward, but I don't know yet.

Point 4 is also about performance, but I don't actually need a lot of performance. Even if I did, optimizing at this point would be premature. I'm not working with big data here.
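For readers unsure what the boxing in point 4 refers to, here is a small sketch. The BoxingDemo class is hypothetical, and Performance mirrors the struct from the example further down. Assigning a struct to an object (or interface) variable allocates a heap object that holds a copy of the value.

```csharp
public struct Performance { public int Successes; public int Failures; }

public static class BoxingDemo
{
    // Each conversion to object allocates a separate box.
    public static bool BoxesAreDistinct()
    {
        var p = new Performance { Successes = 3, Failures = 1 };
        object first = p;    // boxing: heap allocation holding a copy of p
        object second = p;   // a second, independent allocation
        return !object.ReferenceEquals(first, second);
    }

    // The box holds a snapshot; later changes to p do not reach it.
    public static int UnboxedSuccesses()
    {
        var p = new Performance { Successes = 3 };
        object boxed = p;
        p.Successes = 10;
        return ((Performance)boxed).Successes;
    }
}
```

BoxesAreDistinct() should return true and UnboxedSuccesses() should return 3. Whether the extra allocations matter depends on how often they happen.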

With a name like that, I want to think structs are exactly the thing to use for object-oriented passive data structures. The guidance from the official docs makes me doubt that though, especially the size limitation. Even 2 strings for an address with 2 rows would already be too much.
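To make the size concern concrete, here is a sketch that measures two of the structs from the example below. It assumes .NET 5 or later, where System.Runtime.CompilerServices.Unsafe.SizeOf<T>() is available (older targets may need the System.Runtime.CompilerServices.Unsafe package). SizeDemo is a hypothetical helper. Note that a string field stores only a reference, so the struct's size counts pointers, not characters.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Address { public string Line1; public string Line2; }
public struct Performance { public int Successes; public int Failures; }

public static class SizeDemo
{
    // Two string references: expected to be 2 * IntPtr.Size.
    public static int AddressSize() => Unsafe.SizeOf<Address>();

    // Two 32-bit integers.
    public static int PerformanceSize() => Unsafe.SizeOf<Performance>();
}
```

In a 64-bit process the Address struct should come out at 16 bytes, which is at the guideline's limit rather than under it, while the string contents themselves live on the heap either way.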

The question

Are there other arguments against using structs for these passive data structures? Or did I understand something wrong?

Example

public struct EmployeeId // data structure (exposed data, no functions)
{
    public string Value;
}

public struct Address // data structure
{
    public string Line1;
    public string Line2;
}

public struct Performance // data structure
{
    public int Successes;
    public int Failures;
}

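public struct Employee : IEquatable<Employee> // data structure
{
    public EmployeeId Id;
    public Address Address;
    public Performance Performance;

    // Two employees are considered equal when their IDs match.
    public bool Equals(Employee employee)
    {
        // EmployeeId defines no == operator, so compare the wrapped strings.
        return Id.Value == employee.Id.Value;
    }
}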

public class OfficialEmployeeRegistry // object (hidden data, exposed functions)
{
    private Dictionary<EmployeeId, Employee> _employees;

    public void Add(Employee employee)
    {
        _employees.Add(employee.Id, employee);
    }

    public List<Employee> GetPositivePerformers() {...}
}

public class SantaClause // object
{
    private OfficialEmployeeRegistry _employeeRegistry;
    private PresentSender _presentSender;

    public void SendChristmasPresents()
    {
        List<Employee> goodEmployees = _employeeRegistry.GetPositivePerformers();
        foreach(Employee employee in goodEmployees)
        {
            _presentSender.SendPresent(employee.Address);
        }
    }
}

All structs in this code are examples of what I want to do. For example, we can now get the performance of an employee from the OfficialEmployeeRegistry. We can send that data to a printer class, and if that class changes it in the process, the entries in the OfficialEmployeeRegistry are protected. OfficialEmployeeRegistry data will only be manipulated by the registry itself. Oh, and the structs are supposed to be immutable of course, but I feel adding a constructor to each would bloat this post too much.
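For reference, one way the omitted immutable version could look, sketched for Performance only. This assumes C# 7.2 or later for the readonly struct modifier. The WithSuccesses helper is a hypothetical addition and not part of the original design.

```csharp
public readonly struct Performance
{
    public int Successes { get; }
    public int Failures { get; }

    public Performance(int successes, int failures)
    {
        Successes = successes;
        Failures = failures;
    }

    // "Changing" an immutable value means building a new one.
    public Performance WithSuccesses(int successes) =>
        new Performance(successes, Failures);
}
```

One caveat of this sketch: default(Performance) is still a valid value with both counters at zero, because a struct can always be created without running the constructor.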

Reaction to comments

Do you require data serialization?
No.

Will this need to be passed into and from functions/methods?
Yes.

Will it be iterated and modified on a fairly significant basis?
No. I guess this is about performance, but performance is definitely not an issue.

Best Answer

What speaks against using structs for passive data structures?

Technically, nothing.

(If you can live with some restrictions mentioned by Erik Eidt, such as no inheritance.)

I am pretty sure that if you use C# structs for all of your data objects, you can still write correct programs following this convention. The 16-byte size limit is only a performance recommendation and can be safely ignored.

However, the real question you should ask yourself is whether this convention produces more maintainable, evolvable, and readable programs than the more usual C# convention of using classes in most cases, even for most kinds of DTOs or "passive data structures", and structs only in exceptional cases.

And that is, IMHO, questionable. I usually prefer classes first and foremost in my programs (even for pure data objects, when I only need a list of public member variables) and use structs only for optimizations like the ones mentioned by @Ivan. The reason is that, in my experience, I often start with a small data structure for which a simple struct would be sufficient, and over time new requirements cause it to evolve until a full class makes more sense. If I have used that struct in various places in the program, changing it into a class afterwards can introduce a lot of subtle errors because of the switch from value semantics to reference semantics.
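A minimal sketch of the kind of error meant here, using hypothetical types PointStruct and PointClass. The two method bodies are textually identical apart from the type name, yet they return different results.

```csharp
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class SemanticsDemo
{
    public static int WithStruct()
    {
        var original = new PointStruct { X = 1 };
        var other = original;   // copies the value
        other.X = 2;            // changes only the copy
        return original.X;
    }

    public static int WithClass()
    {
        var original = new PointClass { X = 1 };
        var other = original;   // copies the reference
        other.X = 2;            // changes the one shared object
        return original.X;
    }
}
```

WithStruct() should return 1 and WithClass() should return 2. Code that relied on assignment making an independent copy keeps compiling after the type becomes a class, but it no longer does what it did before.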

Moreover, whether a type is defined as a struct or as a class is not immediately apparent when reading code that uses the type. However, there are subtle semantic differences between the two (see the comment from @BerinLoritsch below the question). Using structs only for exceptional optimization cases makes errors caused by those differences less likely in my code.
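The referenced comment is not reproduced here, so the following is only a sketch of two differences that I assume are representative. CounterStruct, CounterClass, and ReadSiteDemo are hypothetical names. In both methods, nothing at the point of use reveals which kind of type is involved.

```csharp
using System.Collections.Generic;

public struct CounterStruct { public int Count; }
public class CounterClass { public int Count; }

public static class ReadSiteDemo
{
    public static (int structCount, int classCount) IncrementFirst()
    {
        var structs = new List<CounterStruct> { new CounterStruct() };
        var classes = new List<CounterClass> { new CounterClass() };

        // structs[0].Count++;  // does not compile (CS1612): the indexer returns a copy
        CounterStruct s = structs[0];
        s.Count++;              // changes the local copy; the list element stays 0
        classes[0].Count++;     // changes the object stored in the list

        return (structs[0].Count, classes[0].Count);
    }

    public static (bool structIsZeroed, bool classIsNull) Defaults()
    {
        CounterStruct s = default;  // a usable, zeroed value
        CounterClass c = default;   // null
        return (s.Count == 0, c == null);
    }
}
```

IncrementFirst() should return (0, 1) and Defaults() should return (true, true).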

So my recommendation is: use structs for passive data structures only when you know for sure that the particular data structure will never need to evolve into a class, and only if you have a useful optimization scenario. If in doubt, use a class.
