Entity Framework 6 – Should It Be Used with Repository Pattern?

Tags: c#, design-patterns, entity-framework, orm, repository-pattern

So I am asking this after reading the following question: Why shouldn't I use the repository pattern with Entity Framework?

There seems to be a large split between people who say yea and those who say nay. What seems to be missing from some of the answers are concrete examples, whether that be code or good reasoning or whatever.

The issue is that I keep reading people responding with "well, EF is already an abstraction". Well, that's great, and that's probably true, but then how would you use it without the repository pattern?

For those who say otherwise, why would you say otherwise? What have you personally run into that made it necessary?

Best Answer

To get this out of the way, I am a big proponent of Entity Framework, but it does come with some drawbacks that you need to be aware of.

I also apologize for the long answer, but this is a very hot topic with many opinions and many required considerations. For small applications, a lot of these considerations don't matter, but for enterprise-grade applications they matter a lot.

Part of what makes the EF discussion such a hot topic is that it leads to a chain of events, where each solution introduces a new problem (which sometimes only applies in more advanced cases). If I just gave you the final (or should I say current) answer, you'd think that I was omitting several other solutions, so I think it's relevant to walk you through those solutions and explain why each of them is not the final solution to the problem.


Repositories

how would you use it without the repository pattern?

The short answer to that is that (simple) repositories are an anti-pattern* to Entity Framework.

EF provides a context, which essentially provides access to the whole database. You can, for example, fire a query that returns all Country entities with their Province entities already filled in, and each province's City entities already filled in. In short, it enables you to execute multiple-entity-type queries (a phrase I coined myself in order to explain the difference with repositories).

Repositories, at least the basic implementation thereof, tend to take a "one entity type per repository" approach. If you want to get a list of all countries with all their provinces and all of the provinces' cities, you'll have to talk separately to the CountryRepository, ProvinceRepository and CityRepository. In short, repositories limit you to executing single-entity-type queries. For the same example, you would have to launch 3 separate database queries in order to get all countries, their provinces and their cities.

And don't get me wrong, I like repositories. I like having the neat little boxes so you can separate your storage of different domain objects, which would e.g. allow you to get the countries from your database, the provinces from a remote API and the cities from a second remote API.

But this separation of entity types into their own private boxes very much clashes with the benefit of having relational databases, where part of the benefit is that you can launch a single query that can take related entities into account (for filtering, sorting or returning).

You might rightly respond that "a repository can still return more than one entity type". And you would be correct. But if you have a query that returns both Foo and Bar entities, where do you place it? In the FooRepository? In the BarRepository? In some cases the choice is easy, but in others it is hard, and different developers may categorize such queries differently. The codebase then becomes inconsistent, and the whole point of the "one entity type per repository" approach goes out the window.


*When I say repositories are an anti-pattern, that is not a global statement, but rather that they specifically counteract the purpose of EF. Without EF or a similar solution, repositories are not an anti-pattern.


Query objects

Query objects are the only real way to get around the "one entity type per repository" approach. The shortest way I can describe a query object is that you should think of it as a "one method repository".

Repositories suffer from having to deal with multiple types of entities, and the more methods a repository has, the more distinct entity types it's likely going to be handling. By separating each repository method into a query object of its own, you've simply removed the contradictory suggestion that "this repository only handles one type", and instead are suggesting that "this query object runs this particular query, regardless of which entity types it needs to use".

You can still use repositories at the same time, and you are then able to enforce that repositories will never handle more than their designated entity type.

  • If a query makes use of more than one entity type (e.g. Country and Province), then it belongs in its own private query object (e.g. CountriesAndTheirProvincesQuery).
  • If a query only focuses on one entity type (e.g. Country), then it belongs to that entity type's repository (e.g. CountryRepository).

On a technical level, query objects work exactly like repositories do. The only difference is that you separate the logic differently by no longer trying to pretend that your multi-entity-type queries belong to a single-entity-type repository.
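A minimal sketch of that split, under stated assumptions: the Country and Province types and all members are hypothetical; only the names CountryRepository and CountriesAndTheirProvincesQuery come from the text above. In a real application both IQueryable sources would be DbSets on one shared EF context, so the join would run as a single SQL query. Here they are in-memory lists exposed through AsQueryable() so the sketch runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity types for the sketch.
public class Country
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Province
{
    public int Id { get; set; }
    public int CountryId { get; set; }
    public string Name { get; set; }
}

// Single-entity-type query: belongs in the Country repository.
public class CountryRepository
{
    private readonly IQueryable<Country> _countries;

    public CountryRepository(IQueryable<Country> countries) => _countries = countries;

    public Country GetByName(string name) =>
        _countries.FirstOrDefault(c => c.Name == name);
}

// Multi-entity-type query: gets its own "one method repository".
public class CountriesAndTheirProvincesQuery
{
    private readonly IQueryable<Country> _countries;
    private readonly IQueryable<Province> _provinces;

    public CountriesAndTheirProvincesQuery(IQueryable<Country> countries, IQueryable<Province> provinces)
    {
        _countries = countries;
        _provinces = provinces;
    }

    public List<(string Country, string Province)> Execute() =>
        (from c in _countries
         join p in _provinces on c.Id equals p.CountryId
         orderby c.Name, p.Name
         select new { CountryName = c.Name, ProvinceName = p.Name })
        .AsEnumerable()                                  // from here on, runs in memory
        .Select(x => (Country: x.CountryName, Province: x.ProvinceName))
        .ToList();
}
```

The query class does not pretend to belong to either entity type, while CountryRepository stays strictly about countries.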


Repositories 2

There is a second problem pertaining to repositories. As they are separate classes, they do not depend on each other. This usually also means that each repository will use its own EF context (I'm omitting dependency injection here as it sidetracks the focus of the answer).

Suppose you are doing an import that adds countries and cities to the database. However, you want transactional safety: if any failure is encountered, nothing should be saved to the database.
But when you have to deal with two repositories that each have their own context, how can you safely call SaveChanges() on one context without knowing whether the other context's SaveChanges() will succeed? You're going to have to guess, and you're going to be stuck manually undoing the first context's commit when the second context's commit ends up failing.

By separating the repositories, you've removed their ability to have a shared context, which you need in times where you're dealing with transactions that operate on more than one entity type at the same time.
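The problem can be simulated without a database. In this minimal sketch, FakeDatabase and FakeContext are hypothetical stand-ins, not EF types: FakeContext buffers changes and writes them all at once in SaveChanges(), loosely mirroring how a single EF6 SaveChanges() call runs inside one database transaction. FailOnSave simulates the second commit failing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database: the rows that were committed.
public class FakeDatabase
{
    public List<string> Rows { get; } = new List<string>();
}

// Hypothetical stand-in for a context: buffers changes and writes them
// all at once (or not at all) when SaveChanges() is called.
public class FakeContext
{
    private readonly FakeDatabase _db;
    private readonly List<string> _pending = new List<string>();

    public bool FailOnSave { get; set; }   // simulates e.g. a constraint violation

    public FakeContext(FakeDatabase db) => _db = db;

    public void Add(string row) => _pending.Add(row);

    public void SaveChanges()
    {
        if (FailOnSave) throw new InvalidOperationException("Save failed");
        _db.Rows.AddRange(_pending);
        _pending.Clear();
    }
}

public static class ImportDemo
{
    // Two repositories, two contexts: the first commit cannot be taken back.
    public static void ImportWithSeparateContexts(FakeDatabase db)
    {
        var countryContext = new FakeContext(db);
        var cityContext = new FakeContext(db) { FailOnSave = true };

        countryContext.Add("Country: Belgium");
        cityContext.Add("City: Brussels");

        countryContext.SaveChanges();   // succeeds: Belgium is now committed
        cityContext.SaveChanges();      // throws, but Belgium is already saved
    }

    // One shared context: a single SaveChanges() writes both rows or neither.
    public static void ImportWithSharedContext(FakeDatabase db)
    {
        var context = new FakeContext(db) { FailOnSave = true };

        context.Add("Country: Belgium");
        context.Add("City: Brussels");

        context.SaveChanges();          // throws: nothing was written
    }
}
```

After a failed import, the separate-contexts version leaves a country without its cities in the database, while the shared-context version leaves the database untouched.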


Unit of work

In any sufficiently large codebase or domain where I've used repositories and EF, I've ended up implementing a unit of work to at least somewhat counter the problem of transactional safety.

Very simply put, a unit of work is a collection of all repositories: it forces the repositories to share the same context, and it allows the developer to commit (or roll back) the changes for all repositories at the same time. A simple example:

public class UnitOfWork : IDisposable
{
    public readonly FooRepository FooRepository;
    public readonly BarRepository BarRepository;
    public readonly BazRepository BazRepository;

    private readonly MyContext _context;

    public UnitOfWork()
    {
        _context = new MyContext();

        this.FooRepository = new FooRepository(_context);
        this.BarRepository = new BarRepository(_context);
        this.BazRepository = new BazRepository(_context);
    }

    public void Commit()
    {
        _context.SaveChanges();
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}

And a simple usage example:

using (var uow = new UnitOfWork())
{
    uow.FooRepository.Add(myFoo);
    uow.BarRepository.Update(myBar);
    uow.BazRepository.Delete(myBaz);

    uow.Commit();
}

And now we have transactional safety. Either all three objects are handled in the database, or none of them are.


But Entity Framework is a framework! (personal note)

Maybe you've noticed, maybe you haven't, but you should see strong similarities between EF's DbContext and the UnitOfWork I just created. They are essentially the same thing: they represent a single transaction to the database, and offer access to collections of all available entity types:

public class UnitOfWork
{
    public readonly FooRepository FooRepository;
    public readonly BarRepository BarRepository;
    public readonly BazRepository BazRepository;

    public void Commit() { }
}

// Requires: using System.Data.Entity; (EF6)
public class MyContext : DbContext
{
    public DbSet<Foo> Foos { get; set; }
    public DbSet<Bar> Bars { get; set; }
    public DbSet<Baz> Bazs { get; set; }

    // SaveChanges() is inherited from DbContext:
    // public virtual int SaveChanges()
}

EF's DbContext satisfies the definition of a unit of work:

A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work.

So why do we do this? Well, simply put, because developers always try to abstract dependencies. We don't want the business layer to directly depend on EF. This is the exact same reason why you've been creating repositories in the first place: so that your business logic doesn't directly use EF.

But what's the point of it all? Why do we use EF, then anti-patterned repositories, and then an anti-anti-patterned unit of work to make it all workable? This costs so much effort. We have to manually write search filters instead of being able to rely on EF's innate ability to parse (pretty much) any lambda expression we throw at it. Why go through all this effort instead of just using EF the way it's already intended to work out of the box?
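To make the search-filter point concrete, here is a minimal sketch with a hypothetical Customer entity. The repository needs a hand-written method per criterion, while code that works with the IQueryable directly simply writes the lambda it needs. Against EF the source would be a DbSet such as db.Customers (a hypothetical name) and EF would translate the lambda to SQL; here an in-memory list exposed through AsQueryable() stands in for it so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for the sketch.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public int OrderCount { get; set; }
}

// Repository style: every search criterion needs its own hand-written method.
public class CustomerRepository
{
    private readonly IQueryable<Customer> _customers;

    public CustomerRepository(IQueryable<Customer> customers) => _customers = customers;

    public List<Customer> GetByCity(string city) =>
        _customers.Where(c => c.City == city).ToList();

    public List<Customer> GetWithMoreOrdersThan(int count) =>
        _customers.Where(c => c.OrderCount > count).ToList();

    // Combining both criteria would need yet another method,
    // e.g. GetByCityWithMoreOrdersThan(city, count).
}

// Direct style: stands in for business logic that composes
// whatever filter it needs onto the IQueryable.
public static class CustomerReport
{
    public static List<string> NamesInCityWithMoreOrdersThan(
        IQueryable<Customer> customers, string city, int count) =>
        customers
            .Where(c => c.City == city && c.OrderCount > count)
            .Select(c => c.Name)
            .ToList();
}
```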

And I have to admit that I've had this question for a long time, but I find little support for my opinion. If you'll allow me to soapbox for a moment: my opinion on the matter is that this is why EF is called Entity Framework and not Entity Library.
The difference between frameworks and libraries is often semantic and up for debate, but I think a line most people can agree on can be drawn as explained here:

A library performs specific, well-defined operations.

A framework is a skeleton where the application defines the "meat" of the operation by filling out the skeleton. The skeleton still has code to link up the parts but the most important work is done by the application.

This description of a framework fits with EF to a tee. It pretty much does the whole DB interaction for us, but it requires us to extend DbContext with the entities (and model configuration) that we expect EF to use.

We abstract dependencies (libraries) because we can, and because the benefit of doing so (swappability) far outweighs the drawback (the effort required to implement the abstraction). But frameworks, the skeleton of a system, are not easily replaced because they cannot be easily abstracted. The effort of abstracting them far outweighs the small likelihood of ever needing to replace them, so it's no longer worth doing.

I think that in order to cut out a lot of boilerplate code, it would be beneficial to consider EF as a framework that we build the application around and cannot easily move away from (unlike a library). This means that we can do away with the repositories and the unit of work altogether, as their only purpose is to give access to features EF already has. Instead, we use EF directly and accept that its usage is an architectural choice, made without the intention of easily moving away from it.

This means we could cut out the repositories and unit of work, and instead have our business logic deal with the context directly. Notice how the business logic code hardly changes:

// OLD

using (var uow = new UnitOfWork())
{
    uow.FooRepository.Add(myFoo);
    uow.BarRepository.Update(myBar);
    uow.BazRepository.Delete(myBaz);

    uow.Commit();
}

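// NEW (EF6's DbSet has Add and Remove, but no Update or Delete methods)
// Requires: using System.Data.Entity; (for EntityState)

using (var db = new MyContext())
{
    db.Foos.Add(myFoo);
    db.Entry(myBar).State = EntityState.Modified; // attaches myBar and marks it as updated
    db.Entry(myBaz).State = EntityState.Deleted;  // attaches myBaz and marks it for deletion

    db.SaveChanges();
}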

The issue is that I keep reading people responding with "well, EF is already an abstraction". Well, that's great, and that's probably true, but then how would you use it without the repository pattern?

By using EF directly and no longer trying to abstract it behind a self-developed wall of repositories (and possibly a unit of work).

For those who say otherwise, why would you say otherwise? What have you personally run into that made it necessary?

The answer is sort of a recapitulation of my experience with EF over the last 6 to 7 years. Basic repositories by themselves introduce more problems than they solve. There are advanced solutions that solve the problems introduced by basic repositories, but you do eventually reach a point where you start wondering whether it's better to simply not use repositories, so you don't have to spend the effort to get them to play nicely with EF.

Can they be made to play nicely with EF? Sure thing. Is it worth the effort to create all that abstraction? That very much depends on the likelihood of you moving away from EF (or using a datastore that's incompatible with EF).