Personally, I've tried making one huge schema for all my entities on a fairly complex but small project (~300 tables). We had an extremely normalized database (fifth normal form, and I say that loosely) with many many-to-many relationships and extreme referential integrity enforcement.
We also used a "single instance per request" strategy, which I'm not convinced helped either.
When doing simple, reasonably flat, explicitly defined listings, lookups and saves, the performance was generally acceptable. But when we started digging into deep relationships, performance dipped drastically. Compared to a stored proc in this instance, there was no comparison (of course). I'm sure we could have tweaked the code base here and there to improve performance, but in this case we just needed a performance boost without analysis due to time constraints, so we fell back to the stored proc (still mapped through EF, because EF provided strongly typed results). We only needed that as a fallback in a few areas. When we had to traverse all over the database to create a collection (using .Include() unsparingly), performance degraded noticeably, but maybe we were asking too much.
So based on my experience, I would recommend creating a separate .edmx per intent. Only generate what you'll be using based on the scope of that need. You may have some smaller-scoped .edmx files for specific tasks, and then some large ones where you need to traverse complex relationships to build objects. I'm not sure where that magic spot is, but I'm sure there is one... lol.
Honestly though, aside from a few pitfalls which we kind of saw coming (complex traversing), the huge .edmx worked fine from a "working" perspective. But you'll have to watch out for the "fixup" magic that the context does behind the scenes if you don't explicitly disable it, and you'll have to keep the .edmx in sync when changes are made to the database. It was sometimes easier to wipe the entire surface and re-create the entities, which took about 3 minutes, so it wasn't a big deal.
This was all with Entity Framework 4.1. I'd be really interested in hearing about your end choice and experience as well.
And regarding your question on NHibernate, that's a can-of-worms question in my opinion; you'll get barking on both sides of the fence. I hear a lot of people bashing EF for the sake of bashing, without working through the challenges and understanding the nuances unique to EF itself. I've never used NHibernate in production, but generally, if you have to manually and explicitly create things like mappings, you're going to get finer control. However, if I can drag and drop, generate, and start CRUD'ing and querying using LINQ, I couldn't give a crap about granularity.
I hope this helps.
Design patterns like the Repository pattern aren't dependent on any particular technology, so the obvious answer to your question is "yes, of course you can". You can write repositories which are ultimately backed by any storage technology. The point is that the Repository pattern is a way of structuring your data access code and separating it from your other code. It's described in such a way as to be independent of programming language: provided the language in question has objects and interface inheritance, you can make it work.
What it isn't is a way of structuring your business logic and separating it from your view controllers, which is what your explanation actually says you need. For that you've got a number of other options available. Where I work we write a set of Service objects which provide business logic, and the controllers call those (they in turn have Repository objects injected by the DI framework to allow them to access data storage without having to know all the details). Controllers then only contain code which deals with processing incoming data from the UI into a form the services can handle, and turning service responses into something suitable for display to the user.
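As a minimal sketch of that layering (not the actual code from where I work): the names Customer, ICustomerRepository, InMemoryCustomerRepository and CustomerService are all made up for illustration, and the in-memory repository stands in for whatever storage technology you actually use. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain type, used only for illustration.
public record Customer(int Id, string Name, bool IsActive);

// Data access sits behind an interface; any storage technology can implement it.
public interface ICustomerRepository
{
    IReadOnlyList<Customer> GetAll();
}

// In-memory stand-in so the sketch runs without a database.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly List<Customer> _customers = new()
    {
        new Customer(1, "Ada", true),
        new Customer(2, "Bob", false),
    };

    public IReadOnlyList<Customer> GetAll() => _customers;
}

// Business logic lives in the service; the repository is injected (by hand here,
// by the DI framework in a real application).
public class CustomerService
{
    private readonly ICustomerRepository _repository;

    public CustomerService(ICustomerRepository repository) => _repository = repository;

    public IReadOnlyList<string> GetActiveCustomerNames() =>
        _repository.GetAll()
            .Where(c => c.IsActive)
            .Select(c => c.Name)
            .ToList();
}
```

A controller would then only call something like new CustomerService(repository).GetActiveCustomerNames() and turn the result into a view, without knowing where the data came from.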
Best Answer
To get this out of the way, I am a big proponent of Entity Framework, but it does come with some drawbacks that you need to be aware of.
I also apologize for the long answer, but this is a very hot topic with many opinions and many required considerations. For small applications, a lot of these considerations don't matter, but for enterprise-grade applications they do matter a lot.
Part of what makes the EF discussion such a hot topic is that it leads to a chain of events, where each solution introduces a new problem (which sometimes only applies in more advanced cases). If I just gave you the final (or should I say current) answer, you'd think that I was omitting several other solutions, so I think it's relevant to walk you through the solutions and how they are not the final solution to the problem.
Repositories
The short answer to that is that (simple) repositories are an anti-pattern* to Entity Framework.
EF provides a context, which essentially provides access to the whole database. You can e.g. fire a query that returns all Country entities with their Province entities already filled in, with each province's City entities already filled in. In short, it enables you to execute multiple-entity-type queries (this is a phrase I coined myself in order to explain the difference with repositories).
Repositories, at least the basic implementation thereof, tend to take a "one entity type per repository" approach. If you want to get a list of all countries with all their provinces and all of the provinces' cities, you'll have to separately talk to the CountryRepository, ProvinceRepository and CityRepository. In short, repositories limit you to only being able to execute single-entity-type queries. For the same example, you would have to launch 3 separate database queries in order to get all countries and their provinces and their cities.
And don't get me wrong, I like repositories. I like having the neat little boxes so you can separate your storage of different domain objects, which e.g. would allow you to get the countries from your database but the provinces from a remote API and the cities from a second remote API.
But this separation of entity types into their own private boxes very much clashes with the benefit of having relational databases, where part of the benefit is that you can launch a single query that can take related entities into account (for filtering, sorting or returning).
You might rightly respond that "a repository can still return more than one entity type". And you would be correct. But if you have a query which returns both Foo and Bar entities, where do you place it? In the FooRepository? In the BarRepository? There may be examples where the choice is easy, but there are also examples where the choice is hard, and multiple developers may have different categorization methods. The codebase then becomes inconsistent, and the true purpose of the "one entity type per repository" approach is thrown out the window.
*When I say repositories are an anti-pattern, that is not a global statement, but rather that they specifically counteract the purpose of EF. Without EF or a similar solution, repositories are not an anti-pattern.
Query objects
Query objects are the only real way to get around the "one entity type per repository" approach. The shortest way I can describe what a query object is, is that you should think of it as a "one method repository".
Repositories suffer from having to deal with multiple types of entities, and the more methods a repository has, the more distinct entity types it's likely going to be handling. By separating each repository method into a query object of its own, you've simply removed the contradictory suggestion that "this repository only handles one type", and instead are suggesting that "this query object runs this particular query, regardless of which entity types it needs to use".
You can still use repositories at the same time, and you are then able to enforce that repositories will never handle more than their designated entity type.
- If a query involves more than one entity type (e.g. Country and Province), then it belongs in its own private query object (e.g. CountriesAndTheirProvincesQuery).
- If a query involves only one entity type (e.g. Country), then it belongs to that entity type's repository (e.g. CountryRepository).
On a technical level, query objects work exactly like repositories do. The only difference is that you separate the logic differently by no longer trying to pretend that your multi-entity-type queries belong to a single-entity-type repository.
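To make the "one method repository" idea concrete, here is a minimal sketch of a query object. GeoContext is a hypothetical in-memory stand-in for an EF context (with EF it would be a DbContext exposing a DbSet<Country>), and the Execute() method name and the tuple shape of the result are my own assumptions for illustration. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Province
{
    public string Name { get; set; } = "";
}

public class Country
{
    public string Name { get; set; } = "";
    public List<Province> Provinces { get; } = new();
}

// Hypothetical stand-in for an EF context, so the sketch runs without a database.
public class GeoContext
{
    public List<Country> Countries { get; } = new();
}

// One class, one query, any number of entity types.
public class CountriesAndTheirProvincesQuery
{
    private readonly GeoContext _context;

    public CountriesAndTheirProvincesQuery(GeoContext context) => _context = context;

    // Flattens countries and provinces into (country, province) rows, sorted by both names.
    public IReadOnlyList<(string CountryName, string ProvinceName)> Execute() =>
        _context.Countries
            .SelectMany(c => c.Provinces, (c, p) => (CountryName: c.Name, ProvinceName: p.Name))
            .OrderBy(row => row.CountryName)
            .ThenBy(row => row.ProvinceName)
            .ToList();
}
```

The query touches both Country and Province, but nobody has to argue about whether it lives in a CountryRepository or a ProvinceRepository: it has a class of its own.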
Repositories 2
There is a second problem pertaining to repositories. As they are separate classes, they do not depend on each other. This usually also means that each repository will use its own EF context (I'm omitting dependency injection here as it sidetracks the focus of the answer).
Suppose you are doing an import, which adds countries and cities to the database. However, you want transactional safety, meaning that when any failure is encountered, then nothing should be saved to the database.
But when you have to deal with two repositories that each have their own context, how can you knowingly call SaveChanges() on one context before knowing that the other context's SaveChanges() succeeded? You're going to have to guess, and you're going to be stuck manually undoing the first context's commit when the second context's commit ends up failing.
By separating the repositories, you've removed their ability to have a shared context, which you need in times where you're dealing with transactions that operate on more than one entity type at the same time.
Unit of work
In any sufficiently large codebase or domain where I've used repositories and EF, I've ended up implementing a unit of work to at least somewhat counter the problem of transactional safety.
Very simply put, a unit of work is a collection of all repositories: it forces the repositories to share the same context, and it allows the developer to commit or roll back the context for all repositories at the same time. A simple example:
And a simple usage example:
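A minimal sketch of both the unit of work and its usage, under stated assumptions: FakeContext is a hypothetical in-memory stand-in for an EF context (it stages changes and commits them all together when SaveChanges() is called), and Repository<T>, Commit(), Rollback() and ImportDemo are names I made up for illustration. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an EF context: changes are staged until SaveChanges() is called.
public class FakeContext
{
    private readonly List<object> _pending = new();
    public List<object> Saved { get; } = new();

    public void Add(object entity) => _pending.Add(entity);

    // Everything staged is committed together; returns the number of entities written.
    public int SaveChanges()
    {
        int count = _pending.Count;
        Saved.AddRange(_pending);
        _pending.Clear();
        return count;
    }

    public void DiscardChanges() => _pending.Clear();
}

public record Country(string Name);
public record Province(string Name);
public record City(string Name);

// A repository only stages changes on the context it is given; it never saves by itself.
public class Repository<T> where T : class
{
    private readonly FakeContext _context;
    public Repository(FakeContext context) => _context = context;
    public void Add(T entity) => _context.Add(entity);
}

// The unit of work owns the single shared context and hands it to every repository.
public class UnitOfWork
{
    private readonly FakeContext _context = new();

    public UnitOfWork()
    {
        Countries = new Repository<Country>(_context);
        Provinces = new Repository<Province>(_context);
        Cities = new Repository<City>(_context);
    }

    public Repository<Country> Countries { get; }
    public Repository<Province> Provinces { get; }
    public Repository<City> Cities { get; }

    public int Commit() => _context.SaveChanges();
    public void Rollback() => _context.DiscardChanges();
}

public static class ImportDemo
{
    // Usage: stage three objects through three repositories, then commit once.
    public static int Run()
    {
        var unitOfWork = new UnitOfWork();
        unitOfWork.Countries.Add(new Country("Belgium"));
        unitOfWork.Provinces.Add(new Province("Antwerp"));
        unitOfWork.Cities.Add(new City("Mechelen"));

        // Nothing is written before this single call; Rollback() would drop all three.
        return unitOfWork.Commit();
    }
}
```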
And now we have transactional safety. Either all three objects are handled in the database, or none of them are.
But Entity Framework is a framework! (personal note)
Maybe you've noticed, maybe you haven't, but you should see strong similarities between EF's DbContext and the UnitOfWork I just created. They are essentially the same thing. They represent a single transaction to the database, and offer access to collections of all available entity types:
EF's DbContext satisfies the definition of what a unit of work is:
So why do we do this? Well, simply put, because developers always try to abstract dependencies. We don't want the business layer to directly depend on EF. This is the exact same reason why you've been creating repositories in the first place: so that your business logic doesn't directly use EF.
But what's the point of it all? Why do we use EF, then anti-patterned repositories, and then an anti-anti-patterned unit of work to make it all workable? This costs so much effort. We have to manually write search filters instead of being able to innately rely on EF's ability to parse (pretty much) any lambda we throw at it. Why are we going through all this effort instead of just using EF the way it's already intended to work out of the box?
And I have to admit that I've had this question for a long time but I find little support for my opinion. If you allow me to soapbox for a moment; my opinion on the matter is that this is why EF is called Entity Framework and not Entity Library.
The difference between frameworks and libraries is often semantic and up for debate, but I think an agreeable line can be drawn as explained here:
This description of a framework fits EF to a tee. It pretty much does the whole DB interaction for us, but it requires us to extend DbContext with the entities (and model configuration) that we expect EF to use.
We abstract dependencies (libraries) because we can, and because the benefit of doing so (swappability) far outweighs the drawback (effort required to implement the abstraction). But frameworks, the skeleton of a system, are not easily replaced because they cannot be easily abstracted. The effort is much greater than the likelihood of needing to replace the dependency, and thus it's no longer worth the effort to do so.
I think that in order to cut out a lot of boilerplate code, it would be beneficial to consider EF as a framework that we build the application around and cannot easily move away from (the way we can with a library). This means that we can do away with the repositories and the unit of work altogether, as their only purpose is to give access to the features EF already has; and instead use EF directly and accept that its usage is an architectural choice that we do not implement with the intention of easily moving away from it.
This means we could cut out the repositories and unit of work, and instead have our business logic deal with the context directly. Notice how the business logic code hardly changes:
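As a minimal sketch of what that could look like: GeoContext here is a hypothetical in-memory stand-in shaped like a DbContext subclass (with EF, the lists would be DbSet<T> properties and SaveChanges() would come from DbContext itself), and ImportService is a made-up name for the business logic. It assumes C# 9 or later.

```csharp
using System.Collections.Generic;

public record Country(string Name);
public record Province(string Name);
public record City(string Name);

// Hypothetical stand-in for the application's EF context, so the sketch runs without a database.
public class GeoContext
{
    private int _alreadySaved;

    public List<Country> Countries { get; } = new();
    public List<Province> Provinces { get; } = new();
    public List<City> Cities { get; } = new();

    // Returns how many entities were added since the previous call.
    public int SaveChanges()
    {
        int total = Countries.Count + Provinces.Count + Cities.Count;
        int newlySaved = total - _alreadySaved;
        _alreadySaved = total;
        return newlySaved;
    }
}

// Business logic talks to the context directly: no repositories, no separate unit of work.
public class ImportService
{
    public int Import(GeoContext context)
    {
        context.Countries.Add(new Country("Belgium"));
        context.Provinces.Add(new Province("Antwerp"));
        context.Cities.Add(new City("Mechelen"));

        // The context plays the role of the unit of work: one call saves all three.
        return context.SaveChanges();
    }
}
```

The body of Import() is the same shape as it would be with a unit of work in between: add to three collections, then save once. Only the object it talks to has changed.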
In short: use EF directly, and stop trying to abstract it behind a self-developed wall of repositories (and possibly a unit of work).
The answer is sort of a recapitulation of my experience with EF over the last 6 to 7 years. Basic repositories by themselves introduce more problems than they solve. There are advanced solutions that solve the problems introduced by basic repositories; but you do eventually reach a point where you start wondering if it's not better to simply choose to not use repositories so you don't have to spend the effort to get them to play nicely with EF.
Can they be made to play nicely with EF? Sure thing. Is it worth the effort to create all that abstraction? That very much depends on the likelihood of you moving away from EF (or using a datastore that's incompatible with EF).