DDD – Global Unique Identities vs Surrogate Keys


Let's start with an example: we have an entity, Book. Its unique identity is Isbn, a value object that wraps a String. The value is a UUID.

The Book entity also needs a surrogate id from the repository (an SQL database). We need it so we can, e.g., find books faster, since databases look up rows by number faster than by string.

From what I have read, the surrogate key should be hidden from Book's interface. But we want to use it to locate books much faster.

How should we deal with this properly?

[A] We can simply add a getSurrogateId() method (or something better named) to Book. This pollutes the entity, but it's KISS.

[B] We can make the repository responsible for mapping natural keys to surrogate keys. For example, BookRepository may have the following method:

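long toSurrogateKey(Isbn isbn) {
     // look up the local cache; on a miss, look up the db and cache the result
     // (cache and findSurrogateKeyInDb are hypothetical BookRepository members)
     return cache.computeIfAbsent(isbn, this::findSurrogateKeyInDb);
}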

It returns the surrogate key. Of course, these values can be cached locally, so we do not need to query the db every time. This method should not be public, right?
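A minimal Java sketch of option [B], assuming the repository keeps an in-memory cache from natural key to surrogate key. `FakeBookTable`, `lookUpInDb` and the `lookups` counter are hypothetical stand-ins for the SQL table and its query; `toSurrogateKey` is package-private so it stays off the public API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Natural identity: a value object wrapping a String (a UUID in the question).
record Isbn(String value) {
    Isbn {
        Objects.requireNonNull(value, "value");
    }
}

// Hypothetical stand-in for the SQL table that holds the surrogate keys.
final class FakeBookTable {
    private final Map<Isbn, Long> idsByIsbn = new HashMap<>();
    private long nextId = 1; // mimics an auto-increment column
    int lookups = 0;         // counts simulated database round trips

    long insert(Isbn isbn) {
        long id = nextId++;
        idsByIsbn.put(isbn, id);
        return id;
    }

    Long findIdByIsbn(Isbn isbn) {
        lookups++;
        return idsByIsbn.get(isbn);
    }
}

final class BookRepository {
    private final FakeBookTable table;
    // Local cache: natural key -> surrogate key, so repeat lookups skip the db.
    private final Map<Isbn, Long> surrogateKeyCache = new ConcurrentHashMap<>();

    BookRepository(FakeBookTable table) {
        this.table = table;
    }

    // Package-private: callers outside the persistence package never see it.
    long toSurrogateKey(Isbn isbn) {
        return surrogateKeyCache.computeIfAbsent(isbn, this::lookUpInDb);
    }

    private Long lookUpInDb(Isbn isbn) {
        Long id = table.findIdByIsbn(isbn);
        if (id == null) {
            throw new IllegalArgumentException("No book with ISBN " + isbn.value());
        }
        return id;
    }
}
```

`computeIfAbsent` only records a mapping when the lookup succeeds, so an unknown Isbn is not cached and will be queried again on the next call.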

[C] We can go even further and treat the surrogate key as repository-specific. Book may be an interface, and SqlBook a repository-made implementation. This SqlBook implementation may then store any additional information the repository needs. In this case, the surrogate key would be one of SqlBook's properties, and we do not care that it is visible, because users of SqlBook only see it as a Book, i.e. without knowing about the surrogate id.

So the method above becomes (defined in the SqlBook class):

long toSurrogateKey(Book book) {
     return ((SqlBook)book).getSurrogateId();
}

The only drawback here is that Book (and other entities) must be created by a repository-aware factory. In other words, we would need an SqlFactory implementation of some factory interface that creates SqlBooks for us.
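Option [C] can be sketched in Java as follows, assuming a hypothetical `BookFactory` interface and a simple counter standing in for the database's auto-increment column. Domain code only ever holds a `Book`; the surrogate id lives on `SqlBook` and is reached only through the repository's cast.

```java
import java.util.Objects;

record Isbn(String value) {
    Isbn {
        Objects.requireNonNull(value, "value");
    }
}

// What the rest of the application sees: no surrogate id anywhere.
interface Book {
    Isbn isbn();
    String title();
}

// Hypothetical factory abstraction, so domain code never calls new SqlBook(...).
interface BookFactory {
    Book create(Isbn isbn, String title);
}

// Repository-made implementation carrying persistence-only state.
final class SqlBook implements Book {
    private final Isbn isbn;
    private final String title;
    private Long surrogateId; // null until the row is inserted

    SqlBook(Isbn isbn, String title) {
        this.isbn = Objects.requireNonNull(isbn);
        this.title = Objects.requireNonNull(title);
    }

    @Override public Isbn isbn() { return isbn; }
    @Override public String title() { return title; }

    Long getSurrogateId() { return surrogateId; }
    void assignSurrogateId(long id) { this.surrogateId = id; }
}

final class SqlBookFactory implements BookFactory {
    @Override
    public Book create(Isbn isbn, String title) {
        return new SqlBook(isbn, title);
    }
}

final class SqlBookRepository {
    private long nextId = 1; // stands in for an auto-increment column

    void add(Book book) {
        ((SqlBook) book).assignSurrogateId(nextId++);
    }

    // The cast from the question; throws ClassCastException for a non-SQL Book.
    long toSurrogateKey(Book book) {
        Long id = ((SqlBook) book).getSurrogateId();
        if (id == null) {
            throw new IllegalStateException("Book has not been persisted yet");
        }
        return id;
    }
}
```

The cast makes the drawback from the question concrete: any `Book` not produced by `SqlBookFactory` fails at runtime with a `ClassCastException`.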

Any wisdom on this?

Best Answer

"We need it so we can e.g. find the books in a faster way, since databases find by number faster than by string."

Surrogate keys should primarily be added because they give you a uniform way of building your primary keys, not because of hypothetical performance issues. They also help you avoid having business data such as an Isbn spread over half of your model, in separate places, because you misuse it as a foreign key.

"From what I read, the surrogate key should be hidden from the Books interface".

Maybe you misunderstand the purpose of this advice? Surrogate keys are technical details that should be hidden when discussing the model with your domain experts, but it is perfectly OK to see them when you switch your viewpoint to the implementation of the model. So go with [A], but make sure the getSurrogateId() accessor does not appear in the graphical representation of your domain model.
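A sketch of option [A] along those lines, assuming entity equality should follow the domain identity: `Book` compares by `Isbn`, while `getSurrogateId()` is a plain accessor the repository fills in. The `setSurrogateId` name is hypothetical.

```java
import java.util.Objects;

record Isbn(String value) {
    Isbn {
        Objects.requireNonNull(value, "value");
    }
}

final class Book {
    private final Isbn isbn;   // domain identity, discussed with domain experts
    private final String title;
    private Long surrogateId;  // technical detail, assigned by the repository

    Book(Isbn isbn, String title) {
        this.isbn = Objects.requireNonNull(isbn);
        this.title = Objects.requireNonNull(title);
    }

    public Isbn getIsbn() { return isbn; }
    public String getTitle() { return title; }

    // Visible in code, but left out of the domain model diagram.
    public Long getSurrogateId() { return surrogateId; }
    void setSurrogateId(long id) { this.surrogateId = id; }

    // Equality follows the Isbn, not the database key, so equals/hashCode
    // do not change when the repository assigns the surrogate id.
    @Override
    public boolean equals(Object o) {
        return o instanceof Book other && isbn.equals(other.isbn);
    }

    @Override
    public int hashCode() {
        return isbn.hashCode();
    }
}
```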

Related Solutions

Architecture – “Implementing DDD” by Vernon: value object or not

The reason is that SQL databases store objects relationally: items are stored in a different table from the aggregate root and reference back to it by ID. So using MySQL requires modelling the items as entities that are persisted in a separate table, where they obtain a primary id (with auto increment).

In document stores such as MongoDB, one can store the whole collection of items as part of the aggregate root. The items do not become separate entities (in their own table) and can therefore be modelled as value objects.

E.g. a Blog document with its Comments stored as an embedded collection:

{
  _id: 1,
  title: 'Some title',
  body: 'Blog body',
  comments: [{
     person: 'John Smith',
     comment: 'First comment of the blog',
     created_at: new Date()
  },
  {
     person: 'Peter Jackson',
     comment: 'Second comment of the blog',
     created_at: new Date()
  }],
}
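On the Java side the same aggregate might look like this, assuming the whole Blog is loaded and saved as one document (the mapping code is not shown). `Comment` is a record, so it has value equality and no id of its own; only `Blog` carries an identity, mirroring the single `_id` field.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Value object: no id; two comments with the same data are equal.
record Comment(String person, String text, Instant createdAt) {
    Comment {
        Objects.requireNonNull(person);
        Objects.requireNonNull(text);
        Objects.requireNonNull(createdAt);
    }
}

// Aggregate root: the only object with an identity.
final class Blog {
    private final long id;
    private final String title;
    private final String body;
    private final List<Comment> comments = new ArrayList<>();

    Blog(long id, String title, String body) {
        this.id = id;
        this.title = title;
        this.body = body;
    }

    // Comments are only created through the root, keeping it in control.
    void addComment(String person, String text, Instant createdAt) {
        comments.add(new Comment(person, text, createdAt));
    }

    long id() { return id; }
    String title() { return title; }
    String body() { return body; }

    // Read-only view so callers cannot bypass addComment.
    List<Comment> comments() {
        return Collections.unmodifiableList(comments);
    }
}
```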

C# – DDD – Factory or Service

I suggest you actually attempt to model the actions you are alluding to above in your application layer, namely RegisterAccount and Login, because I think it can help bring clarity to your situation. Working backwards from the public surface you would like your domain to expose to your application layer can help show where responsibilities should (and should not) be given.

For example, you may have a RegisterAccount command handler that boils down to the following:

accountRepository.Add(new Account(cmd.UserName, cmd.Password));
accountRepository.Save(); // throws DuplicateUserName

and a Login command handler that boils down to:

var account = accountRepository.FindByUserName(cmd.UserName); // throws NotFound
account.Login(cmd.Password); // throws InvalidPassword

The above is just about the most declarative way to model each process. This is a good thing. It's important that your application can present a clear and concise "view" of each business process in terms of behavior. This is precisely what the application layer is for: coordinating your domain, with as little logic (i.e. rules) as possible, to provide this view. With the above in mind, let us move on to your specific questions.

Set validation is tricky. If a rule mandates that a UserName must be unique within a set of Accounts, it should be enforced by that set of Accounts. Is there a piece of your application that represents a collection of Accounts? Often, set validation belongs in your Repository. This is especially convenient because a Repository "knows" the single critical piece of infrastructure needed to enforce this kind of invariant: the data store. In your case (and many others like it), I would recommend enforcing this as a constraint in your data store and letting your AccountRepository throw an exception on Save if an Account with a duplicate UserName is added. Simple. Declarative.

I cannot recommend creating a separate service to check that adding an Account will succeed before it is added, because this separates data from behavior. Validating that some process will succeed before attempting it is a fundamentally procedural approach, not OOP, and certainly not DDD. Applied liberally across a system, this practice can also lead to all sorts of duplication and gotchas down the road. Let the rules exist with the data, not around the data.
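A Java sketch of the RegisterAccount path with the uniqueness rule enforced by the repository at save time, assuming an in-memory repository that mimics a UNIQUE constraint on the user name. Passwords are left out to keep the focus on set validation; the class and exception names are hypothetical translations of the C# names above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateUserNameException extends RuntimeException {
    DuplicateUserNameException(String userName) {
        super("User name already taken: " + userName);
    }
}

final class NotFoundException extends RuntimeException {
    NotFoundException(String userName) {
        super("No account named " + userName);
    }
}

final class Account {
    private final String userName;

    Account(String userName) {
        this.userName = userName;
    }

    String userName() {
        return userName;
    }
}

// In-memory stand-in for a table with a UNIQUE index on user_name. A real
// implementation would translate the driver's constraint-violation error
// into DuplicateUserNameException.
final class AccountRepository {
    private final Map<String, Account> stored = new HashMap<>();
    private final List<Account> pending = new ArrayList<>();

    void add(Account account) {
        pending.add(account);
    }

    // All-or-nothing, like a transaction: every pending account is stored, or none.
    void save() {
        try {
            Map<String, Account> staged = new HashMap<>(stored);
            for (Account a : pending) {
                if (staged.putIfAbsent(a.userName(), a) != null) {
                    throw new DuplicateUserNameException(a.userName());
                }
            }
            stored.putAll(staged);
        } finally {
            pending.clear();
        }
    }

    Account findByUserName(String userName) {
        Account account = stored.get(userName);
        if (account == null) {
            throw new NotFoundException(userName);
        }
        return account;
    }
}

record RegisterAccount(String userName) {}

// The handler stays declarative: no separate "can I add this?" service.
final class RegisterAccountHandler {
    private final AccountRepository accounts;

    RegisterAccountHandler(AccountRepository accounts) {
        this.accounts = accounts;
    }

    void handle(RegisterAccount cmd) {
        accounts.add(new Account(cmd.userName()));
        accounts.save(); // throws DuplicateUserNameException
    }
}
```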

Moving on, we can see that your Account entity certainly requires a dependency on a cohesive hashing mechanism, as it must be able to hash and check hashes internally to support both the account registration and login processes. The question is whether this should exist as a domain service or something else. Truthfully, it doesn't really matter. The inner workings of an Entity are not important (hence the abstraction). What is important is that we focus on the behavior we would like to achieve, and let the data that supports this behavior be an implementation detail. This is the perspective that DDD seeks to provide.

That said, I understand hashing to be a cross-cutting concern. Unless your algorithm is specific to your domain, it would seem to me that hashing belongs as part of infrastructure. Don't let your domain become concerned with things that aren't truly business rules (hashing is more of a technical concern in most industries).
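A sketch of keeping hashing in infrastructure, assuming a hypothetical `PasswordHasher` interface that the domain depends on and a PBKDF2 implementation built on the JDK's `PBKDF2WithHmacSHA256` algorithm. The iteration count and salt size are illustrative, not a recommendation. Passing the hasher into `Account` as a method argument is one choice; holding it in a field is another.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Abstraction the domain sees; the algorithm is an infrastructure detail.
interface PasswordHasher {
    String hash(String password);
    boolean matches(String password, String storedHash);
}

final class Pbkdf2PasswordHasher implements PasswordHasher {
    private static final int ITERATIONS = 100_000; // illustrative value
    private static final int KEY_BITS = 256;
    private final SecureRandom random = new SecureRandom();

    @Override
    public String hash(String password) {
        byte[] salt = new byte[16];
        random.nextBytes(salt);
        Base64.Encoder b64 = Base64.getEncoder();
        // Stored as "salt:key"; Base64 never contains ':'.
        return b64.encodeToString(salt) + ":" + b64.encodeToString(derive(password, salt));
    }

    @Override
    public boolean matches(String password, String storedHash) {
        String[] parts = storedHash.split(":");
        Base64.Decoder b64 = Base64.getDecoder();
        byte[] salt = b64.decode(parts[0]);
        byte[] expected = b64.decode(parts[1]);
        // Timing-safe comparison of the derived keys.
        return MessageDigest.isEqual(expected, derive(password, salt));
    }

    private static byte[] derive(String password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password.toCharArray(), salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }
}

final class InvalidPasswordException extends RuntimeException {
    InvalidPasswordException(String userName) {
        super("Invalid password for " + userName);
    }
}

final class Account {
    private final String userName;
    private final String passwordHash; // only the hash is kept

    Account(String userName, String password, PasswordHasher hasher) {
        this.userName = userName;
        this.passwordHash = hasher.hash(password);
    }

    void login(String password, PasswordHasher hasher) {
        if (!hasher.matches(password, passwordHash)) {
            throw new InvalidPasswordException(userName);
        }
    }
}
```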

With regard to factories: A factory should be used when the creation of an Entity is so verbose that your model starts to lose focus. Importantly, NOT when you want to encapsulate rules. Whether or not a factory should be employed is up to you to decide.

EDIT: I'd like to touch on another idea that I danced around in my answer above but didn't state explicitly: delaying implementation until absolutely necessary. This can be an incredibly useful tactic when designing and implementing a system. In the book Clean Architecture, Robert Martin brings this idea up specifically in relation to high-level decisions such as data storage, but it is useful even in microcosms (such as modeling a workflow).

At its core, it means pushing rules (implementation) down the road as far as possible. In terms of the system above, it means enforcing a unique constraint at the lowest level where it can be enforced, and hashing only where absolutely necessary. Simply being cognizant of this practice will lead you to naturally implement rules as close as possible to the behavior they govern. Of course, this is the core principle of OOP.
