Domain Objects – Primitive vs Class to Represent Simple Domain Object

c, domain-driven-design, domain-objects, java, object-oriented

What are general guidelines or rules of thumb for when to use a domain-specific object vs a plain String or number?

Examples:

  • Age class vs Integer?
  • FirstName class vs String?
  • UniqueID class vs String?
  • PhoneNumber class vs String vs Long?
  • DomainName class vs String?

I think most OOP practitioners would say to use specific classes for PhoneNumber and DomainName: the more rules there are about what makes a value valid and how two values compare, the easier and safer a small class is to work with. For the first three there is more debate.

I have never come across an "Age" class, but one could argue it makes sense given that an age must be non-negative (okay, you can argue for negative ages, but it's a good example of a type that is almost equivalent to a primitive integer).

String is commonly used to represent "First Name", but it's not a perfect fit: an empty String is a valid String but not a valid name, and comparison would usually be done ignoring case. Sure, there are methods to check for empty, do a case-insensitive compare, etc., but every consumer has to remember to call them.
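As a rough illustration of the trade-off described above, here is a minimal sketch of what such a class might look like. The class name FirstName, the factory method of, and the exact rules (trim, reject blank, compare ignoring case) are assumptions made for the example, not an established convention.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical value type: the name and the rules below are illustrative assumptions.
final class FirstName {
    private final String value; // the name as entered, trimmed
    private final String key;   // lower-cased form used for comparison

    private FirstName(String value) {
        this.value = value;
        this.key = value.toLowerCase(Locale.ROOT);
    }

    // Rejects null and blank input, so an empty name can never be constructed.
    static FirstName of(String raw) {
        Objects.requireNonNull(raw, "raw");
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("first name must not be blank");
        }
        return new FirstName(trimmed);
    }

    String value() {
        return value;
    }

    // Equality ignores case, so consumers do not have to remember to do it.
    @Override
    public boolean equals(Object other) {
        return other instanceof FirstName && key.equals(((FirstName) other).key);
    }

    // Hash on the same normalized key so equals and hashCode stay consistent.
    @Override
    public int hashCode() {
        return key.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

With this in place, FirstName.of("Alice") and FirstName.of("ALICE") compare equal, and FirstName.of("") fails at construction time instead of somewhere downstream.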

Does the answer depend on the environment? I am primarily concerned with enterprise/high-value software that will live and be maintained for possibly more than a decade.

Perhaps I'm overthinking this but I would really like to know if anyone has rules on when to choose class vs primitive.

Best Answer

What are general guidelines or rules of thumb for when to use a domain-specific object vs a plain String or number?

The general guideline is that you want to model your domain in a domain-specific language.

Consider: why do we use integer? We can represent all of the integers with strings just as easily. Or with bytes.

If we were programming in a domain-agnostic language that included primitive types for both integer and age, which would you choose?

What it really comes down to is that the "primitive" nature of certain types is an accident of the language we chose for our implementation.

Numbers, in particular, usually require additional context. An age isn't just a number: it also has a dimension (time), units (years?), and rounding rules! Adding ages together makes sense in a way that adding an age to an amount of money does not.
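A small sketch of that point, under assumptions: the types Age and Money, the choice of whole years and cents, and the non-negative rule are all hypothetical and only serve to show the compiler enforcing the distinction.

```java
// Hypothetical value types; units (whole years, cents) are assumptions for the example.
final class Age {
    private final int years;

    private Age(int years) {
        this.years = years;
    }

    static Age ofYears(int years) {
        if (years < 0) {
            throw new IllegalArgumentException("age must not be negative: " + years);
        }
        return new Age(years);
    }

    int years() {
        return years;
    }

    // Ages can only be added to other ages; addExact guards against overflow.
    Age plus(Age other) {
        return new Age(Math.addExact(years, other.years));
    }
}

final class Money {
    private final long cents;

    private Money(long cents) {
        this.cents = cents;
    }

    static Money ofCents(long cents) {
        return new Money(cents);
    }

    long cents() {
        return cents;
    }

    Money plus(Money other) {
        return new Money(Math.addExact(cents, other.cents));
    }
}
```

Age.ofYears(30).plus(Age.ofYears(5)) compiles, whereas Age.ofYears(30).plus(Money.ofCents(500)) is rejected by the compiler. With two bare ints, the same mistake would compile silently.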

Making the types distinct allows us to model the differences between an unverified email address and a verified email address.
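One way this could look, as a sketch only: the names UnverifiedEmail, VerifiedEmail and Mailer, and the token-comparison step standing in for "verification", are assumptions for illustration rather than a prescribed design.

```java
import java.util.Objects;

// Hypothetical types: an address we have been given but not yet confirmed.
final class UnverifiedEmail {
    private final String address;

    UnverifiedEmail(String address) {
        this.address = Objects.requireNonNull(address, "address");
    }

    String address() {
        return address;
    }

    // Assumption: verification means presenting the token that was mailed out.
    // This is the only path in the sketch that produces a VerifiedEmail.
    VerifiedEmail confirm(String expectedToken, String presentedToken) {
        Objects.requireNonNull(expectedToken, "expectedToken");
        if (!expectedToken.equals(presentedToken)) {
            throw new IllegalArgumentException("verification token does not match");
        }
        return new VerifiedEmail(address);
    }
}

final class VerifiedEmail {
    private final String address;

    // Package-private: real code would restrict who may call this even further.
    VerifiedEmail(String address) {
        this.address = address;
    }

    String address() {
        return address;
    }
}

final class Mailer {
    // The signature documents and enforces that only verified addresses are accepted.
    static String welcome(VerifiedEmail recipient) {
        return "Welcome sent to " + recipient.address();
    }
}
```

Passing an UnverifiedEmail to Mailer.welcome is a compile-time error, so the rule "only mail verified addresses" no longer depends on every caller checking a boolean flag.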

The accident of how these values are represented in memory is one of the least interesting parts. The domain model doesn't care if a CustomerId is an int, or a String, or a UUID/GUID, or a JSON node. It just wants the affordances.

Do we really care whether integers are big endian or little endian? Do we care whether the List we have been passed is an abstraction over an array or a graph? When we discover that decimal arithmetic is inefficient, and that we need to change to a floating point representation, should the domain model care?

Parnas, in 1972, wrote:

"We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others."

In a sense, the domain specific value types we introduce are modules that isolate our decision of what underlying representation of the data should be used.
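To make the "value type as a module" idea concrete, here is a minimal sketch of a CustomerId that keeps its representation private. The choice of java.util.UUID as the backing field and the method names parse and random are assumptions for the example; the point is that callers only see the affordances.

```java
import java.util.UUID;

// Hypothetical identifier type. The UUID field is a hidden design decision:
// swapping it for a long or a String would not change any caller.
final class CustomerId {
    private final UUID value;

    private CustomerId(UUID value) {
        this.value = value;
    }

    // UUID.fromString throws IllegalArgumentException for malformed input.
    static CustomerId parse(String text) {
        return new CustomerId(UUID.fromString(text));
    }

    static CustomerId random() {
        return new CustomerId(UUID.randomUUID());
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CustomerId && value.equals(((CustomerId) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    // Text form for logs and persistence; callers never touch the UUID itself.
    @Override
    public String toString() {
        return value.toString();
    }
}
```

Because nothing outside the class mentions UUID, the scope of a later change in representation is limited to this one file, which is the modularity benefit Parnas describes.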

So the upside is modularity: we get a design where it is easier to manage the scope of a change. The downside is cost: it's more work to create the bespoke types that you need, and choosing the correct types requires acquiring a deeper understanding of the domain. The amount of work required to create the value module will depend on your local dialect of Blub.

Other terms in the equation might include the expected lifetime of the solution (careful modeling for script ware that will be run once has a lousy return on investment) and how close the domain is to the core competency of the business.

One special case that we might consider is communication across a boundary. We don't want to be in a situation where changes to one deployable unit require coordinated changes to other deployable units. So messages tend to focus on representations, without consideration of invariants or domain-specific behaviors. We're not going to try to communicate "this value must be strictly positive" in the message format; instead we communicate the value's representation on the wire, and apply validation to that representation at the domain boundary.
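A sketch of that boundary, with invented names: OrderLineMessage stands for the wire format, Quantity for the domain type, and OrderLineTranslator for the code at the boundary. None of these come from a particular framework.

```java
// Wire-level message: plain representation only, no invariants enforced here.
final class OrderLineMessage {
    final String sku;
    final int quantity;

    OrderLineMessage(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Domain type: carries the "strictly positive" rule that the message format does not.
final class Quantity {
    private final int value;

    private Quantity(int value) {
        this.value = value;
    }

    static Quantity of(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("quantity must be strictly positive: " + value);
        }
        return new Quantity(value);
    }

    int value() {
        return value;
    }
}

// Boundary code: the one place where raw representations become domain values.
final class OrderLineTranslator {
    static Quantity quantityOf(OrderLineMessage message) {
        return Quantity.of(message.quantity);
    }
}
```

The message class can be shared or regenerated from a schema without dragging domain rules along with it, while everything past the translator works with a Quantity that is known to be valid.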
