Domain Objects – Primitive vs Class to Represent Simple Domain Object


What are general guidelines or rules of thumb for when to use a domain-specific object vs a plain String or number?

Examples:

Age class vs Integer?
FirstName class vs String?
UniqueID vs String
PhoneNumber class vs String vs Long?
DomainName class vs String?

I think most OOP practitioners would definitely say specific classes for PhoneNumber and DomainName. The more rules there are around what makes a value valid and how to compare it, the easier and safer a simple class is to deal with. But for the first three there is more debate.

I have never come across an "Age" class but one could argue it makes sense given it must be non-negative (okay I know you can argue for negative ages but it's a good example that it's almost equivalent to a primitive integer).

String is commonly used to represent "First Name", but it's not perfect: an empty String is a valid String but not a valid name, and comparison would usually be done ignoring case. Sure, there are methods to check for emptiness, compare case-insensitively, etc., but that leaves the work to every consumer.

Does the answer depend on the environment? I am primarily concerned with enterprise/high-value software that will live and be maintained for possibly more than a decade.

Perhaps I'm overthinking this but I would really like to know if anyone has rules on when to choose class vs primitive.

Best Answer

What are general guidelines or rules of thumb for when to use a domain-specific object vs a plain String or number?

The general guideline is that you want to be modeling your domain in a domain specific language.

Consider: why do we use integer? We can represent all of the integers with strings just as easily. Or with bytes.

If we were programming in a domain agnostic language that included primitive types for integer and age, which would you choose?

What it really comes down to is the "primitive" nature of certain types is an accident of the choice of language for our implementation.

Numbers, in particular, usually require additional context. Age isn't just a number; it also has a dimension (time), units (years?), and rounding rules! Adding ages together makes sense in a way that adding an age to an amount of money does not.
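As a rough sketch of that point, assuming Java 16+ for records: the names Age, Money, and plus are my own and not from the question, and whole years and cents are assumed units. The compiler accepts adding two ages and rejects adding money to an age.

```java
/** Hypothetical value type: an age in whole years (the unit is an assumption). */
record Age(int years) {
    Age {
        if (years < 0) {
            throw new IllegalArgumentException("age must be non-negative: " + years);
        }
    }

    /** Adding two ages is meaningful, so the type offers it. */
    Age plus(Age other) {
        return new Age(Math.addExact(years, other.years));
    }
}

/** Hypothetical value type: an amount of money in cents. */
record Money(long cents) {
    Money plus(Money other) {
        return new Money(Math.addExact(cents, other.cents));
    }
}

class AgeDemo {
    public static void main(String[] args) {
        Age combined = new Age(30).plus(new Age(12));
        System.out.println(combined.years()); // prints 42
        // new Age(30).plus(new Money(500)); // does not compile: Money is not an Age
    }
}
```

With plain ints, the commented-out line would have compiled and quietly produced a meaningless number.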

Making the types distinct allows us to model the differences between an unverified email address and a verified email address.
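One way this could look, again assuming Java 16+ records. UnverifiedEmail, VerifiedEmail, Mailer, and the token check are all hypothetical stand-ins; a real verification flow would be more involved.

```java
/** Hypothetical type: an address nobody has confirmed yet. */
record UnverifiedEmail(String address) {
    UnverifiedEmail {
        if (address == null || !address.contains("@")) {
            throw new IllegalArgumentException("not an email address: " + address);
        }
    }

    /**
     * Intended route to a VerifiedEmail in this sketch. The token comparison is a
     * stand-in; a real design would also restrict who can construct VerifiedEmail.
     */
    VerifiedEmail verify(String submittedToken, String expectedToken) {
        if (!expectedToken.equals(submittedToken)) {
            throw new IllegalStateException("verification token does not match");
        }
        return new VerifiedEmail(address);
    }
}

/** Hypothetical type: an address whose owner has confirmed it. */
record VerifiedEmail(String address) {}

class Mailer {
    /** Accepts only verified addresses; passing an UnverifiedEmail is a compile error. */
    static String sendInvoice(VerifiedEmail to) {
        return "invoice sent to " + to.address();
    }
}
```

If both were plain Strings, nothing would stop an unconfirmed address from reaching sendInvoice.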

The accident of how these values are represented in memory is one of the least interesting parts. The domain model doesn't care if a CustomerId is an int, or a String, or a UUID/GUID, or a JSON node. It just wants the affordances.

Do we really care whether integers are big endian or little endian? Do we care if the List we have been passed is an abstraction over an array, or a graph? When we discover that double precision arithmetic is inefficient, and that we need to change to a different numeric representation, should the domain model care?

Parnas, in 1972, wrote

We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.

In a sense, the domain specific value types we introduce are modules that isolate our decision of what underlying representation of the data should be used.
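A minimal sketch of such a module, assuming a UUID is today's choice of representation. The class shape and the parse factory are my own illustration, not something the question prescribes.

```java
import java.util.UUID;

/** Hypothetical identifier type; callers never see how the value is stored. */
final class CustomerId {
    // The hidden decision: this could later become a long or a String
    // without touching any code that only passes CustomerId around.
    private final UUID value;

    private CustomerId(UUID value) {
        this.value = value;
    }

    /** Builds an id from its text form; throws IllegalArgumentException if malformed. */
    static CustomerId parse(String text) {
        return new CustomerId(UUID.fromString(text));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CustomerId other && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value.toString();
    }
}
```

The affordances the rest of the model relies on are equality, hashing, and a text form; the UUID itself stays private.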

So the upside is modularity: we get a design where it is easier to manage the scope of a change. The downside is cost: it's more work to create the bespoke types you need, and choosing the correct types requires a deeper understanding of the domain. The amount of work required to create the value module will depend on your local dialect of Blub.

Other terms in the equation might include the expected lifetime of the solution (careful modeling for scriptware that will be run once has a lousy return on investment) and how close the domain is to the core competency of the business.

One special case that we might consider is that of the communication across a boundary. We don't want to be in a situation where changes to one deployable unit require coordinated changes with other deployable units. So messages tend to be focused more on representations, without consideration of invariants or domain specific behaviors. We're not going to try to communicate "this value must be strictly positive" in the message format, but rather communicate its representation on the wire, and apply validation to that representation at the domain boundary.
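To make the boundary idea concrete, here is a small sketch assuming Java 16+ records. OrderLineMessage, Quantity, and OrderBoundary are hypothetical names: the message carries only a representation, and the invariant lives in the domain type that the boundary constructs.

```java
/** Hypothetical wire format: plain representation, no invariants enforced. */
record OrderLineMessage(String sku, int quantity) {}

/** Hypothetical domain type: carries the "strictly positive" rule. */
record Quantity(int value) {
    Quantity {
        if (value <= 0) {
            throw new IllegalArgumentException("quantity must be strictly positive: " + value);
        }
    }
}

class OrderBoundary {
    /** Translates the wire representation into the domain type, validating on the way in. */
    static Quantity toDomain(OrderLineMessage message) {
        return new Quantity(message.quantity());
    }
}
```

The message format can stay stable across deployable units while the rule inside Quantity evolves independently.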

Type safety

Let's compare two bits of code

String foo(Integer a, Integer b, String s) {
  if(a.compareTo(b) == 0) {
    return s;
  } else {
    return "" + (a - b); // just as an example
  }
}

String foo(Map<String, Object> m) {
  // Every value comes back as Object, so each use needs a cast.
  if(((Integer)m.get("a")).compareTo((Integer)m.get("b")) == 0) {
    return (String)m.get("s");
  } else {
    return "" + ((Integer)m.get("a") - (Integer)m.get("b"));
  }
}

In the second example, we have to use Object as the value type because it's the only superclass that String and Integer share. This means that every value further down needs to be cast. That has the distinct possibility of throwing a ClassCastException anywhere or everywhere in your code... unless you add a ton of boilerplate code to prevent that (instanceof guard conditions all over).

What's worse is that those exceptions (if you don't check everything, and the paths you follow if you do) are runtime errors. Type errors won't be found until you run the program rather than when you compile it. Errors found at compile time are easier to fix than ones you discover at runtime. For that matter, the static analysis tools and inspections that most IDEs give you will catch them for you as you type them (and throw up horrendous warnings about trying to do it with the abstract data type).
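A small runnable sketch of that failure mode. CastDemo is a hypothetical wrapper class around the map-based foo; the only mistake is a String stored under "a", and the compiler has nothing to say about it.

```java
import java.util.HashMap;
import java.util.Map;

class CastDemo {
    /** Map-based variant: every read needs a cast the compiler cannot check. */
    static String foo(Map<String, Object> m) {
        Integer a = (Integer) m.get("a");
        Integer b = (Integer) m.get("b");
        return a.compareTo(b) == 0 ? (String) m.get("s") : "" + (a - b);
    }

    public static void main(String[] args) {
        Map<String, Object> m = new HashMap<>();
        m.put("a", "1"); // a String where an Integer is expected: compiles fine
        m.put("b", 2);
        m.put("s", "bar");
        try {
            foo(m);
        } catch (ClassCastException e) {
            // prints "failed only at runtime: ClassCastException"
            System.out.println("failed only at runtime: " + e.getClass().getSimpleName());
        }
    }
}
```

With the typed signature foo(Integer, Integer, String), the equivalent call foo("1", 2, "bar") is rejected before the program ever runs.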

Overloads

Passing all of the arguments as a single ADT means you have a single method. In the example above, what if you had:

String foo(Integer a, Integer b) { ... }
String foo(Integer a, Integer b, String s) { ... }
String foo(Integer a, Integer b, String s, String t) { ... }

as different ways to call the function. With passing one ADT, you've got just

String foo(Map<String, Object> m) { ... }

There is no way to differentiate the overloads. This leads to code that will look like:

String foo(Map<String, Object> m) {
  if(m.containsKey("t")) { ... }
  else if(m.containsKey("s")) { ... }
  else { ... }
}

The complexity of the function will go through the roof, unless you start extracting those branches into other methods whose names somehow indicate what types they're dealing with (my head is starting to hurt thinking about the Hungarian notation you'd be sticking into the method names).

This further gets problematic when you have different types that are valid arguments.

String foo(Integer a, Integer b, String s) { ... }
String foo(BigInteger a, BigInteger b, String s) { ... }
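For comparison, here is a sketch of those two typed overloads with bodies filled in (the bodies and the OverloadDemo wrapper are assumptions for illustration). The compiler picks the right one from the argument types, so no instanceof dispatch is needed.

```java
import java.math.BigInteger;

class OverloadDemo {
    static String foo(Integer a, Integer b, String s) {
        return a.equals(b) ? s : String.valueOf(a - b);
    }

    static String foo(BigInteger a, BigInteger b, String s) {
        return a.equals(b) ? s : a.subtract(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(foo(5, 2, "bar"));                           // prints 3
        System.out.println(foo(BigInteger.TEN, BigInteger.ONE, "bar")); // prints 9
    }
}
```

A Map<String, Object> version would have to inspect the runtime type of "a" and "b" on every call to get the same behavior.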

Typo safety

Long ago, I was a Perl programmer, back in the days before OOP found its way into Perl (bless its reference). The only way to pass around complex objects was with %hashes and @arrays, and you had to pull out the keys of the hash. There were numerous bugs that could only be caught at runtime (and the joys of autovivification made that challenging at times).

A simple typo in a key meant that a parameter wasn't found (when it was there), or was created in the wrong spot and passed on to another method.

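int plusOne(Map<String, Integer> m) {
  // Deliberate typos: the keys checked ("type", "1") are not the keys read ("typo", "I").
  if(m.containsKey("type") && m.containsKey("1")) {
    return m.get("typo") + m.get("I"); // compiles; the mistyped keys only surface at runtime
  }
  return 0;
}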

These are not fun bugs to have to find. No bugs are really fun, but these bugs just smack you in the face because they would have been completely preventable if you had just used a proper parameter list and worked with the language and the compiler.
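A sketch of the contrast, assuming Java 16+ records. PlusArgs, TypoDemo, and the misspelled key "valeu" are hypothetical: the typed version turns a misspelling into a compile error, while the map version just finds nothing.

```java
import java.util.Map;

/** Hypothetical parameter object standing in for the Map<String, Integer>. */
record PlusArgs(int value, int increment) {}

class TypoDemo {
    /** Typed version: a misspelled accessor such as args.valeu() is a compile error. */
    static int plus(PlusArgs args) {
        return args.value() + args.increment();
    }

    /** Map version: the misspelled key compiles and simply finds nothing. */
    static Integer lookup(Map<String, Integer> m) {
        return m.get("valeu"); // typo for "value"
    }

    public static void main(String[] args) {
        System.out.println(plus(new PlusArgs(41, 1)));   // prints 42
        System.out.println(lookup(Map.of("value", 41))); // prints null
    }
}
```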

Calling Packaging

So far, I've talked about the dangers in the method itself. What about the work that the caller has to do?

System.out.println(foo(1,2,"bar"));

And we're done with the simple parameter list.

Map<String, Object> m = new HashMap<String, Object>();
m.put("a", 1);
m.put("b", 2);
m.put("s", "bar");
System.out.println(foo(m));

Now, you want to debug this? What are you passing into foo? You've got to look back through the code and track it. Besides being a significantly larger block of code for every method call of this type, it's also obfuscating what is going into the call.

Refactors

foo now takes Doubles rather than Integers. You change the method signature and fix all the compile time errors. You've found them all and it compiles correctly (see type safety above).

However, the map version compiles just as well with Integer as it does with Double. The compiler can't tell you whether you've finished the refactoring or not.
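A sketch of how that plays out. RefactorDemo and diff are hypothetical names: the method body was changed to expect Doubles, one caller was missed, and everything still compiles.

```java
import java.util.HashMap;
import java.util.Map;

class RefactorDemo {
    /** After the refactor the body expects Doubles, but the signature did not change. */
    static double diff(Map<String, Object> m) {
        Double a = (Double) m.get("a");
        Double b = (Double) m.get("b");
        return a - b;
    }

    public static void main(String[] args) {
        Map<String, Object> stale = new HashMap<>();
        stale.put("a", 1); // a caller that was missed: still passes Integers
        stale.put("b", 2);
        try {
            diff(stale);
        } catch (ClassCastException e) {
            // prints "missed caller found at runtime"
            System.out.println("missed caller found at runtime");
        }
    }
}
```

Had the signature been diff(Double a, Double b), the stale call diff(1, 2) would have been flagged by the compiler during the refactor.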

Why are they in a `Map` together?

This is more a philosophical question. Why are these objects in a map together? What common attribute do they share? If they really are part of some other data structure

class Cell {
  int x, y;
  String value;
}

make them into a proper data structure that sticks together. If they aren't things that are honestly related to each other, well... don't.

Putting things together in some structure makes them related to each other in our minds, even if they aren't. If they aren't, keeping track of which things are actually related and which are not takes significant mental gymnastics.
