Entity Framework – How to Divide Models in Large Systems

enterprise-architecture entity-framework orm

I'm working with a SQL Server database that has 1000+ tables, a few hundred views, and several thousand stored procedures. We are looking to start using Entity Framework for our newer projects, and we are working on our strategy for doing so. The thing I'm hung up on is how best to split the tables into different models (EDMX files, or DbContexts if we go code-first). I can think of a few strategies right off the bat:

  • Split by schema
    We have our tables split across probably a dozen schemas. We could do one model per schema. This isn't perfect, though, because dbo still ends up being very large, with 500+ tables/views. Another problem is that certain units of work will end up having to do transactions that span multiple models, which adds complexity, although I assume EF makes this fairly straightforward (see the sketch after this list).
  • Split by intent
    Instead of worrying about schemas, split the models by intent. So we'll have different models for each application, project, module, or screen, depending on how granular we want to get. The problem I see with this is that certain tables inevitably have to be used in every case, such as User or AuditHistory. Do we add those to every model (which I think violates DRY), or do they live in a separate model that is used by every project? (The sketch below maps a shared table into two intent-scoped models.)
  • Don't split at all – one giant model
    This is obviously simple from a development perspective, but from my research and my intuition it seems like it could perform terribly at design time, at compile time, and possibly at run time.
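
To make these options concrete, here's roughly what the "split by intent" case could look like with code-first DbContexts. All of the type names below are hypothetical; the point is the shared User table mapped into two intent-scoped contexts, plus a cross-model unit of work wrapped in a single TransactionScope (which, with two separate connections open, may escalate to a distributed transaction on SQL Server):

    // Hypothetical sketch of "split by intent" with code-first DbContexts.
    // All class/table names are made up for illustration.
    using System.Data.Entity;       // EntityFramework (DbContext API)
    using System.Transactions;

    public class User
    {
        public int UserId { get; set; }
        public string Name { get; set; }
    }

    public class Order
    {
        public int OrderId { get; set; }
        public int UserId { get; set; }
        public decimal Total { get; set; }
    }

    public class AuditEntry
    {
        public int AuditEntryId { get; set; }
        public int UserId { get; set; }
        public string Action { get; set; }
    }

    // "Billing" intent: only the handful of tables this module touches.
    public class BillingContext : DbContext
    {
        public DbSet<User> Users { get; set; }      // shared table, mapped here...
        public DbSet<Order> Orders { get; set; }
    }

    // "Auditing" intent: a separate, equally small surface.
    public class AuditContext : DbContext
    {
        public DbSet<User> Users { get; set; }      // ...and mapped again here
        public DbSet<AuditEntry> AuditEntries { get; set; }
    }

    public static class CrossModelUnitOfWork
    {
        // A unit of work spanning two models: one TransactionScope makes both
        // SaveChanges calls commit or roll back together. With two separate
        // connections this can escalate to a distributed transaction (MSDTC).
        public static void PlaceOrderWithAudit(int userId, decimal total)
        {
            using (var scope = new TransactionScope())
            using (var billing = new BillingContext())
            using (var audit = new AuditContext())
            {
                billing.Orders.Add(new Order { UserId = userId, Total = total });
                billing.SaveChanges();

                audit.AuditEntries.Add(new AuditEntry { UserId = userId, Action = "PlaceOrder" });
                audit.SaveChanges();

                scope.Complete();   // commit both
            }
        }
    }

Mapping User into both contexts technically repeats a few lines, but each mapping is a thin projection of the same table rather than duplicated logic, which is part of why I'm unsure whether it really violates DRY or just looks like it does.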

What is the best practice for using EF against such a large database? Specifically, what strategies do people use to design models against this volume of DB objects? Are there options I'm not thinking of that work better than those above?

Also, is this a problem in other ORMs such as NHibernate? If so, have they come up with any better solutions than EF?

Best Answer

Personally, I've tried making one huge model for all my entities on a fairly complex but small project (~300 tables). We had an extremely normalized database (fifth normal form, and I say that loosely) with many many-to-many relationships and extreme referential integrity enforcement.

We also used a "single instance per request" strategy, which I'm not convinced helped either.
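
In case that term is unfamiliar: in a classic ASP.NET app, "single instance per request" typically looks something like the hypothetical helper below. The context name is made up, and this is only one common shape of the pattern, not how we actually wired it:

    // Hypothetical "one context per request" helper for classic ASP.NET.
    // MyBigContext is a stand-in name for the single huge model's context.
    using System.Data.Entity;
    using System.Web;

    public class MyBigContext : DbContext { }

    public static class PerRequestContext
    {
        private const string Key = "__ef_context";

        // Lazily create one context per HTTP request in HttpContext.Items.
        public static MyBigContext Current
        {
            get
            {
                var items = HttpContext.Current.Items;
                if (items[Key] == null)
                    items[Key] = new MyBigContext();
                return (MyBigContext)items[Key];
            }
        }

        // Wire this up to Application_EndRequest in Global.asax.
        public static void DisposeCurrent()
        {
            (HttpContext.Current.Items[Key] as MyBigContext)?.Dispose();
        }
    }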

When doing simple, reasonably flat, explicitly defined listings, lookups, and saves, performance was generally acceptable. But when we started digging into deep relationships, performance took drastic dips. Compared to a stored proc in those cases, there was no comparison (of course). I'm sure we could have tweaked the code base here and there to improve performance; however, in this case we just needed a performance boost without analysis due to time constraints, so we fell back to stored procs (still mapped through EF, because EF provided strongly typed results). We only needed that fallback in a few areas. When we had to traverse all over the database to create a collection (using .Include() unsparingly), performance degraded noticeably, but maybe we were asking too much.
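
To show what "still mapped it through EF" can mean: in the .edmx world that's a function import, and in the DbContext API the equivalent fallback is Database.SqlQuery<T>, which runs a stored proc and materializes strongly typed (untracked) results. The proc name and result shape below are invented, and this reuses the hypothetical MyBigContext from the sketch above:

    // Hypothetical stored-proc fallback via the DbContext API (EF 4.1+).
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;

    // Plain class whose properties match the proc's result columns by name.
    public class OrderSummary
    {
        public int OrderId { get; set; }
        public string CustomerName { get; set; }
        public decimal Total { get; set; }
    }

    public class ReportQueries
    {
        // SqlQuery<T> maps each row onto T; the results are strongly typed
        // but not change-tracked, which suits read-heavy fallback paths.
        public static List<OrderSummary> GetOrderSummaries(int customerId)
        {
            using (var context = new MyBigContext())
            {
                return context.Database
                    .SqlQuery<OrderSummary>(
                        "EXEC dbo.GetOrderSummaries @CustomerId",
                        new SqlParameter("@CustomerId", customerId))
                    .ToList();
            }
        }
    }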

So based on my experience, I would recommend creating a separate .edmx per intent. Only generate what you'll be using based on the scope of that need. You may have some smaller-scoped .edmx files for focused tasks, and then some large ones where you need to traverse complex relationships to build objects. I'm not sure exactly where the sweet spot is, but I'm sure there is one.

Honestly though, aside from a few pitfalls we kind of saw coming (complex traversing), the huge .edmx worked fine from a "working" perspective. But you'll have to watch out for the "fixup" magic that the context does behind the scenes if you don't explicitly disable it. You'll also have to keep the .edmx in sync when changes are made to the database; it was sometimes easier to wipe the entire design surface and re-create the entities, which took about three minutes, so it wasn't a big deal.
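
If you're on the DbContext API, the knobs that control that behavior look something like this (a sketch, assuming EF 4.1's Configuration flags and the User class sketched earlier, not a claim about the right defaults for your system):

    // Sketch of the DbContext (EF 4.1+) switches that tame fixup/tracking.
    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class TunedContext : DbContext
    {
        public DbSet<User> Users { get; set; }

        public TunedContext()
        {
            // Don't run DetectChanges (and the relationship fixup it triggers)
            // automatically on every Add/Attach/Find/SaveChanges; with this off,
            // call ChangeTracker.DetectChanges() yourself before saving if
            // you've modified already-tracked entities.
            Configuration.AutoDetectChangesEnabled = false;

            // No dynamic proxies: disables lazy loading and proxy-based fixup.
            Configuration.ProxyCreationEnabled = false;
            Configuration.LazyLoadingEnabled = false;
        }

        // Read-only queries can skip the change tracker (and fixup) entirely:
        public List<string> UserNames()
        {
            return Users.AsNoTracking().Select(u => u.Name).ToList();
        }
    }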

This was all with Entity Framework 4.1. I'd be really interested in hearing about your final choice and experience as well.

And regarding your question on NHibernate: that's a can-of-worms question in my opinion; you'll hear barking from both sides of the fence. I hear a lot of people bashing EF for the sake of bashing, without working through the challenges or understanding the nuances unique to EF itself. And although I've never used NHibernate in production, generally speaking, if you have to manually and explicitly create things like mappings, you'll get more fine-grained control; however, if you can drag and drop, generate, and start CRUD'ing and querying using LINQ, I couldn't care less about granularity.

I hope this helps.