Sql – Need advice about writing the own eager-load algorithm

netsql

I'm maintaining an in-house ORM written in C#, and it currently does not have any eager-loading mechanism. To improve performance, we decided that it would require eager loading so we need to write our own code to support that.
(My colleagues and I do not have any experience with any ORM tools, and furthermore, due to some legacy reasons, we are not allowed to use popular tools like LinqtoSQL, Entity Framework or Nhibernate.)

My question is, which is the accepted best practice to generate eager loading SQL statements? I have thought about it and come up with two ways –

Assuming a classic example of 4 tables –
A CustomerCategory has many Customer
A Customer has many Order
An Order has many OrderDetail

and assuming that I want to eager-load data from all 4 tables, and my condition is – where Order.OrderDate between '2008-05-05' and '2008-12-31'

Method 1 – I generate ONE sql to fetch the data from all 4 tables, all using inner joins so that I get one row for each unique combination of the primary keys of each table. I will apply my Where condition to this sql.

Method 2 – I generate an SQL to get only the order data first, and apply my Where condition to this sql, since the Order.OrderDate comes from the Order table.
Then, based on my results from this query, I will know all the Order ID values I need, so I will use these to retrieve the order detail data. I will also know all the unique Customer ID values I need, so I will also use these to retrieve data from the customer table, and finally I will do the same for the CustomerCategory. This method would require 4 SQL statements in all.

I can see that the first method is more efficient, but one of my colleagues pointed out that the 2nd method, although it uses 4 SQL statements, is easier to write and maintain, which I agree.

Any thoughts on this would be greatly appreciated.
Thank you!

Best Answer

First off, your domain model is massively wrong. I personally cannot justify a collection of Customer objects in a CustomerCategory because it just does not make sense from performance standpoint: most of the time you need a single customer (plus its group), whereas a group full of customers will be required once in a blue moon but it will be there all the time, causing all kinds of problems. The same applies to Customer having many Orders.

Now, to your question. It's generally considered that number of roundtrips to the database should be minimized, even at the cost of retrieving more data than necessary. That said, joining two big tables (long & wide) to select data from two associated tables simultaneously might be a performance killer, so beware.

I recommend you looking at how it's done in NHibernate. It allows you to specify fetching strategy (join, select) for each and every association, be it a one-to-one association, or one-to-many.

If you're using Microsoft SQL Server 2005 or later, you can use MARS to stuff several selects into one batch and then hydrate a whole graph of objects issuing only one SQL command.