Entity Framework – Storing Temporary Data Outside DbContext

c# · data-structures · design-patterns · entity-framework · .net

I have a performance issue with a few methods in my Service Layer. The methods causing me trouble contain some logic, but it's the database connection that takes over 95% of the execution time. The underlying problem is unfortunately hardware — the machine is just slow. But let me explain the software problem first:

The scenario goes like this (simplified):
A company sells cars; there is a Car entity and a PurchaseOrder entity. In the application, there is a Car view with all of its PurchaseOrders, and a function to determine the PurchaseOrdersStatus. The function wouldn't be slow on its own, but it must query the "slow" database a few times. The queries are very simple, like:

"Check if FooId exists in bar table".

If the car has only a few purchase orders, the query is very fast. But when the car has hundreds of orders, it gets very slow, because the query must be performed many, many times. I came up with this solution:

  1. Query all FooIds from the bar table
  2. Store this list of ids in memory
  3. In those hundreds of calls, check whether the id exists in memory, which is much faster.
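The steps above can be sketched roughly like this — a minimal illustration assuming a hypothetical `MyDbContext` with a `Bars` DbSet and a `FooId` column (all names are placeholders, not the actual model):

```csharp
using System.Collections.Generic;
using System.Linq;

public class PurchaseOrderStatusService
{
    private readonly MyDbContext _db;   // assumed EF context
    private HashSet<int> _knownFooIds;  // in-memory lookup set

    public PurchaseOrderStatusService(MyDbContext db)
    {
        _db = db;
    }

    // Steps 1 + 2: one round trip that loads every FooId into memory.
    public void Initialize()
    {
        _knownFooIds = new HashSet<int>(_db.Bars.Select(b => b.FooId));
    }

    // Step 3: hundreds of calls now hit the HashSet, not the database.
    public bool FooExists(int fooId)
    {
        return _knownFooIds.Contains(fooId);
    }
}
```

A `HashSet<int>` gives O(1) membership checks, so the per-order cost drops from one round trip to a hash lookup.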

My question is: is there a pattern for "pre-querying" data from the database and putting it into memory?

In my data persistence classes, like CarData, I literally have an "Initialization" method and member lists of ids. But I'm not sure this is the right way to do it.

EDIT:

I'm doing one big query against the fast database; it's really fast even for a lot of records. To get "the status" I need to do a few more queries against tables that are totally unconnected to each other, some of which are in the very slow db (by the way, the db architecture is really terrible, but it's a legacy db, so all I can do is complain). It's possible to do it as one cross-database query built of subqueries, like ... WHERE carId IN (SELECT carId FROM ...), but it's really slow.

EDIT2: My proposed solution

In my project I have a few FooData classes, some of which have a corresponding FooCache class. The cache looks similar to the Data class: it has a DataContext, collections (cached lists of ids or whatever I need), and one Initialize method. I inject FooCache into FooData via IoC (so it's there every time FooData is created), BUT I don't initialize it. Only in methods where I need caching do I call FooCache.Initialize(); at the beginning. Before going to the database, FooData checks whether FooCache is initialized; if it isn't, it goes to the db.
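The opt-in cache described above might look roughly like this — a sketch, with all entity and member names assumed for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public class FooCache
{
    private readonly MyDbContext _db;  // assumed EF context
    private HashSet<int> _ids;

    public FooCache(MyDbContext db) { _db = db; }

    public bool IsInitialized => _ids != null;

    // Called explicitly only by methods that want caching.
    public void Initialize()
    {
        _ids = new HashSet<int>(_db.Bars.Select(b => b.FooId));
    }

    public bool Contains(int id) => _ids.Contains(id);
}

public class FooData
{
    private readonly MyDbContext _db;
    private readonly FooCache _cache; // injected via IoC, not yet initialized

    public FooData(MyDbContext db, FooCache cache)
    {
        _db = db;
        _cache = cache;
    }

    public bool FooExists(int id)
    {
        // Use the cache when it was initialized; otherwise fall back to the db.
        return _cache.IsInitialized
            ? _cache.Contains(id)
            : _db.Bars.Any(b => b.FooId == id);
    }
}
```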

Best Answer

is there a pattern for "pre-querying" data from the database and putting it into memory?

Yes, it's commonly called "caching" and it's used for this purpose.

when the car has hundreds of orders ... this query must be performed many many times.

You are running into possibly the most common performance problem encountered by programmers using an Object-Relational Mapper like Entity Framework, called SELECT N+1. In short: this is when the number of queries your application executes grows linearly relative to the number of records in your database (or some subset of it).
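The SELECT N+1 shape looks something like this (model and property names are assumptions for illustration): one query for the parent records, then one extra query per record inside the loop.

```csharp
using System.Linq;

// 1 query: fetch the car's purchase orders.
var orders = db.PurchaseOrders.Where(o => o.CarId == carId).ToList();

foreach (var order in orders)
{
    // Lazy-loaded navigation property: each access here is another
    // round trip, so N orders cost N additional queries (hence "N+1").
    var latest = order.StatusHistory
        .OrderByDescending(h => h.Date)
        .FirstOrDefault();
}
```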

The best way to avoid SELECT N+1 problems is to produce better SQL, either by directly writing a view or stored procedure in the database, or by changing how you're using the ORM so it generates better SQL.

In Entity Framework, navigation properties make it difficult to fully appreciate when the code is hitting the database. My personal preference is to never use navigation properties in Entity Framework. They introduce loads of issues and this is one of them. I want my ORM to be a way to think in SQL while writing in C#, but navigation properties seem specifically designed to hide away all that "scary" SQL and keep you thinking in C#, which makes it very hard to contain problems like SELECT N+1. Instead of using navigation properties, compose your IQueryable objects as needed in LINQ.
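For example, the per-order existence check from the question can be folded into the main query by composing IQueryables, so EF emits a single SQL statement (entity names are assumptions):

```csharp
using System.Linq;

// One composed query: the group join becomes an EXISTS/OUTER APPLY in SQL,
// not a per-order round trip.
var statuses =
    from order in db.PurchaseOrders
    where order.CarId == carId
    join bar in db.Bars on order.FooId equals bar.FooId into matches
    select new
    {
        order.Id,
        HasFoo = matches.Any()  // translated into the same SQL statement
    };

var results = statuses.ToList(); // exactly one database hit
```

Because nothing executes until `ToList()`, you can keep composing filters and projections onto the query and still pay for only one round trip.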

If you are committed to navigation properties, then your next best bet is to use Include to eagerly load related entities into properties you know you're about to traverse, as explained on MSDN.
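A minimal eager-loading sketch, assuming an EF6 model where `Car` has a `PurchaseOrders` navigation property:

```csharp
using System.Data.Entity; // EF6; for EF Core use Microsoft.EntityFrameworkCore
using System.Linq;

// Include() pulls the related rows in the same query as the Car,
// instead of triggering one lazy load per order later.
var car = db.Cars
    .Include(c => c.PurchaseOrders)
    .Single(c => c.Id == carId);

foreach (var order in car.PurchaseOrders)
{
    // No further queries happen here; the collection is already loaded.
}
```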

What you are proposing is in-memory caching, which may or may not be a great solution depending on your constraints. The main issues to consider with caching are:

  1. Keeping the data fresh. Make sure your cache is expiring more frequently than your data is changing, or you can end up using outdated data. If you're caching in preparation for one data processing cycle and then throwing the cache away, this is no problem. If your data is unlikely to change during application runtime (e.g. unit conversion rules), you can get away with a long expiry. If this is primary data that your users are continually editing, this is probably not the right solution.

  2. Not querying more than you need. It's generally very hard to restrict the query that populates the cache to only the data you need, without directly tying it into the primary query (and then why not solve the problem with LINQ?), especially if you're trying to re-use the cache. If the amount of possible data you'll ever get at once is reasonably small compared to your memory, that is no problem, but make sure you aren't just guessing about that.
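One way to address point 1 (freshness) is to give the cached id set an absolute expiry, for instance via `System.Runtime.Caching.MemoryCache`. The key and the five-minute window below are assumptions to tune against how often the data actually changes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

public static class FooIdCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static HashSet<int> GetFooIds(MyDbContext db)
    {
        var ids = Cache.Get("fooIds") as HashSet<int>;
        if (ids == null)
        {
            ids = new HashSet<int>(db.Bars.Select(b => b.FooId));
            // Re-query after five minutes so stale ids age out.
            Cache.Set("fooIds", ids, DateTimeOffset.Now.AddMinutes(5));
        }
        return ids;
    }
}
```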

As long as you've considered the above points, using a cache is certainly better than tolerating SELECT N+1s. As common as they are, I've never seen a situation where a SELECT N+1 couldn't be refactored into a constant number of queries, independent of the number of records in the database.
