What should a repository really do

design-patterns object-oriented-design repository

I've heard a lot about the repository pattern, but I never quite understood what a repository should really do. By "what a repository should really do" I mainly mean which methods it should provide. For instance, should a repository really provide CRUD methods, or should it provide some different kind of method?

I mean, should the repositories contain business logic, or should they simply contain the logic to communicate with the data store and manage the entities to be saved or loaded?

I've also heard that repositories are units of persistence for aggregates. But how is that? I fail to understand how this works in practice. I thought that we should have just one interface, IRepository, which contains the CRUD methods, and then for each entity the implementation would simply contain the logic to save and retrieve that type from the data store.

Best Answer

Well, you can see a good example in the Spring Data framework, which is built around the concept of repositories.

There you will see that repositories deal only with the data store and rarely contain any business logic (that is reserved for the service layer). For instance, if you take a look at their design, you will see a CrudRepository interface that exposes methods to save, delete, and retrieve entities (among other things). There is also a PagingAndSortingRepository that adds exactly what its name suggests: sorting and paging of results.
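To make the split between repository and service layer concrete, here is a minimal sketch in plain Java. It is not Spring Data's actual API: `SimpleCrudRepository`, `InMemoryCustomerRepository`, `Customer`, and `CustomerService` are hypothetical names invented for illustration, the "data store" is just a map, and the unique e-mail rule is an assumed business rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity used only for this illustration.
final class Customer {
    final long id;
    final String email;
    Customer(long id, String email) { this.id = id; this.email = email; }
}

// Hypothetical generic contract, loosely modeled on the idea of a CRUD
// repository. It is not Spring Data's real interface.
interface SimpleCrudRepository<T, ID> {
    T save(T entity);
    Optional<T> findById(ID id);
    List<T> findAll();
    void deleteById(ID id);
    long count();
}

// In-memory stand-in for a real data store; only persistence concerns live here.
final class InMemoryCustomerRepository implements SimpleCrudRepository<Customer, Long> {
    private final Map<Long, Customer> store = new LinkedHashMap<>();

    @Override public Customer save(Customer c) { store.put(c.id, c); return c; }
    @Override public Optional<Customer> findById(Long id) { return Optional.ofNullable(store.get(id)); }
    @Override public List<Customer> findAll() { return new ArrayList<>(store.values()); }
    @Override public void deleteById(Long id) { store.remove(id); }
    @Override public long count() { return store.size(); }
}

// Business rules belong in the service layer, which delegates storage to the repository.
final class CustomerService {
    private final SimpleCrudRepository<Customer, Long> repository;

    CustomerService(SimpleCrudRepository<Customer, Long> repository) {
        this.repository = repository;
    }

    Customer register(long id, String email) {
        // Assumed business rule for the example: e-mail addresses must be unique.
        for (Customer existing : repository.findAll()) {
            if (existing.email.equalsIgnoreCase(email)) {
                throw new IllegalArgumentException("E-mail already registered: " + email);
            }
        }
        return repository.save(new Customer(id, email));
    }
}

class CrudRepositoryDemo {
    public static void main(String[] args) {
        InMemoryCustomerRepository repository = new InMemoryCustomerRepository();
        CustomerService service = new CustomerService(repository);

        service.register(1L, "ann@example.com");
        System.out.println("Stored customers: " + repository.count());

        try {
            service.register(2L, "ann@example.com");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected by the service layer: " + e.getMessage());
        }
    }
}
```

Note that the repository never decides whether a customer may be registered; it only stores and retrieves. Swapping the map for a real database would change `InMemoryCustomerRepository` and nothing else.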

So, this framework is perhaps a good place to study a good repository design.

As far as I know, many of the concepts implemented by Spring Data come from a great book, Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans, which has an entire section dedicated to repository design.

You may consider getting a copy of it.

A small excerpt from the book explains:

The REPOSITORY pattern is a simple conceptual framework to encapsulate those solutions and bring back our model focus.

A REPOSITORY represents all objects of a certain type as a conceptual set (usually emulated). It acts like a collection, except with more elaborate querying capability. Objects of the appropriate type are added and removed, and the machinery behind the REPOSITORY inserts them or deletes them from the database. This definition gathers a cohesive set of responsibilities for providing access to the roots of AGGREGATES from early life cycle through the end.

Clients request objects from the REPOSITORY using query methods that select objects based on criteria specified by the client, typically the value of certain attributes. The REPOSITORY retrieves the requested object, encapsulating the machinery of database queries and metadata mapping. REPOSITORIES can implement a variety of queries that select objects based on whatever criteria the client requires. They can also return summary information, such as a count of how many instances meet some criteria. They can even return summary calculations, such as the total across all matching objects of some numerical attribute.

A REPOSITORY lifts a huge burden from the client, which can now talk to a simple, intention-revealing interface, and ask for what it needs in terms of the model. To support all this requires a lot of complex technical infrastructure, but the interface is simple and conceptually connected to the domain model.

Therefore:

For each type of object that needs global access, create an object that can provide the illusion of an in-memory collection of all objects of that type. Set up access through a well-known global interface.

Provide methods to add and remove objects, which will encapsulate the actual insertion or removal of data in the data store. Provide methods that select objects based on some criteria and return fully instantiated objects or collections of objects whose attribute values meet the criteria, thereby encapsulating the actual storage and query technology. Provide REPOSITORIES only for AGGREGATE roots that actually need direct access. Keep the client focused on the model, delegating all object storage and access to the REPOSITORIES.
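As a rough illustration of what the excerpt describes, here is a small Java sketch of a collection-like repository for a single aggregate root. Everything in it is an assumption made for the example: `Order` is a hypothetical aggregate root, `OrderLine` is a part of that aggregate and deliberately gets no repository of its own, and `InMemoryOrderRepository` uses a map in place of a real database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Part of the Order aggregate; it is only reachable through its root.
final class OrderLine {
    final String sku;
    final int quantity;
    final long unitPriceCents;
    OrderLine(String sku, int quantity, long unitPriceCents) {
        this.sku = sku;
        this.quantity = quantity;
        this.unitPriceCents = unitPriceCents;
    }
}

// Hypothetical aggregate root: the only type in this example that has a repository.
final class Order {
    final String id;
    final String customerId;
    private final List<OrderLine> lines = new ArrayList<>();

    Order(String id, String customerId) { this.id = id; this.customerId = customerId; }

    void addLine(String sku, int quantity, long unitPriceCents) {
        lines.add(new OrderLine(sku, quantity, unitPriceCents));
    }

    List<OrderLine> lines() { return Collections.unmodifiableList(lines); }

    long totalCents() {
        return lines.stream().mapToLong(l -> l.quantity * l.unitPriceCents).sum();
    }
}

// Intention-revealing interface: add/remove, criteria queries, and summaries.
interface OrderRepository {
    void add(Order order);
    void remove(Order order);
    Optional<Order> findById(String id);
    List<Order> findByCustomer(String customerId);  // query by attribute value
    long countByCustomer(String customerId);        // summary information
    long totalCentsByCustomer(String customerId);   // summary calculation
}

// In-memory stand-in; a real implementation would hide SQL or an ORM behind the same interface.
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, Order> orders = new LinkedHashMap<>();

    @Override public void add(Order order) { orders.put(order.id, order); }
    @Override public void remove(Order order) { orders.remove(order.id); }
    @Override public Optional<Order> findById(String id) { return Optional.ofNullable(orders.get(id)); }

    @Override public List<Order> findByCustomer(String customerId) {
        return orders.values().stream()
                .filter(o -> o.customerId.equals(customerId))
                .collect(Collectors.toList());
    }

    @Override public long countByCustomer(String customerId) {
        return findByCustomer(customerId).size();
    }

    @Override public long totalCentsByCustomer(String customerId) {
        return findByCustomer(customerId).stream().mapToLong(Order::totalCents).sum();
    }
}

class OrderRepositoryDemo {
    public static void main(String[] args) {
        OrderRepository repository = new InMemoryOrderRepository();

        Order first = new Order("o1", "alice");
        first.addLine("book", 2, 1500);
        repository.add(first);

        Order second = new Order("o2", "alice");
        second.addLine("pen", 1, 250);
        repository.add(second);

        System.out.println("Orders for alice: " + repository.countByCustomer("alice"));
        System.out.println("Total in cents: " + repository.totalCentsByCustomer("alice"));
    }
}
```

The client only ever talks to `OrderRepository` in terms of the model (orders and customers), and order lines are saved and loaded together with their order. That is one way to read the "unit of persistence for aggregates" idea from the question.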