Sql – Need advice about writing our own eager-load algorithm

I'm maintaining an in-house ORM written in C# that currently has no eager-loading mechanism. To improve performance, we have decided to add eager loading, so we need to write our own code to support it.
(My colleagues and I have no experience with any ORM tools, and for legacy reasons we are not allowed to use popular tools such as LINQ to SQL, Entity Framework or NHibernate.)

My question is: what is the accepted best practice for generating eager-loading SQL statements? I have thought about it and come up with two approaches.

Assuming a classic example of 4 tables –
A CustomerCategory has many Customer
A Customer has many Order
An Order has many OrderDetail

and assuming that I want to eager-load data from all 4 tables, and my condition is – where Order.OrderDate between '2008-05-05' and '2008-12-31'

Method 1 – I generate ONE sql to fetch the data from all 4 tables, all using inner joins so that I get one row for each unique combination of the primary keys of each table. I will apply my Where condition to this sql.
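As a rough illustration of Method 1, here is a minimal sketch in Python with the standard-library sqlite3 module (chosen only so the example is runnable; the ORM in question is C#). The table name Orders (used to avoid the reserved word ORDER), the column names and the sample rows are all assumptions made for this sketch. It runs one joined query and then rebuilds the object graph by de-duplicating the repeated parent columns.

```python
import sqlite3

# Hypothetical schema and data; none of these names come from the original ORM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerCategory (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, CategoryId INTEGER, Name TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, OrderDate TEXT);
    CREATE TABLE OrderDetail (Id INTEGER PRIMARY KEY, OrderId INTEGER, Product TEXT);

    INSERT INTO CustomerCategory VALUES (1, 'Retail');
    INSERT INTO Customer VALUES (10, 1, 'Alice');
    INSERT INTO Orders VALUES (100, 10, '2008-06-01'), (101, 10, '2009-01-15');
    INSERT INTO OrderDetail VALUES
        (1000, 100, 'Widget'), (1001, 100, 'Gadget'), (1002, 101, 'Gizmo');
""")

# One statement: every row carries the columns of all four tables.
rows = conn.execute("""
    SELECT cc.Id, cc.Name, c.Id, c.Name, o.Id, o.OrderDate, d.Id, d.Product
    FROM CustomerCategory cc
    INNER JOIN Customer c ON c.CategoryId = cc.Id
    INNER JOIN Orders o ON o.CustomerId = c.Id
    INNER JOIN OrderDetail d ON d.OrderId = o.Id
    WHERE o.OrderDate BETWEEN ? AND ?
    ORDER BY cc.Id, c.Id, o.Id, d.Id
""", ("2008-05-05", "2008-12-31")).fetchall()

# Hydrate: parents repeat once per detail row, so key them by primary key.
categories = {}
for cat_id, cat_name, cust_id, cust_name, order_id, order_date, detail_id, product in rows:
    category = categories.setdefault(cat_id, {"name": cat_name, "customers": {}})
    customer = category["customers"].setdefault(cust_id, {"name": cust_name, "orders": {}})
    order = customer["orders"].setdefault(order_id, {"date": order_date, "details": {}})
    order["details"][detail_id] = product
```

Two things are worth noticing in this sketch: the category, customer and order columns are transmitted once per OrderDetail row, and the inner joins silently drop any order that has no details.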

Method 2 – I generate an SQL to get only the order data first, and apply my Where condition to this sql, since the Order.OrderDate comes from the Order table.
Then, based on my results from this query, I will know all the Order ID values I need, so I will use these to retrieve the order detail data. I will also know all the unique Customer ID values I need, so I will also use these to retrieve data from the customer table, and finally I will do the same for the CustomerCategory. This method would require 4 SQL statements in all.
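Method 2 can be sketched in the same way. This is a minimal example in Python with the standard-library sqlite3 module (chosen only so it is runnable; the ORM in question is C#). The table name Orders, the column names, the sample rows and the helper fetch_in are hypothetical and exist only for this illustration. The filtered Orders query runs first, and its keys drive three follow-up queries with IN lists.

```python
import sqlite3

# Hypothetical schema and data; none of these names come from the original ORM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerCategory (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, CategoryId INTEGER, Name TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER, OrderDate TEXT);
    CREATE TABLE OrderDetail (Id INTEGER PRIMARY KEY, OrderId INTEGER, Product TEXT);

    INSERT INTO CustomerCategory VALUES (1, 'Retail'), (2, 'Wholesale');
    INSERT INTO Customer VALUES (10, 1, 'Alice'), (11, 2, 'Bob');
    INSERT INTO Orders VALUES (100, 10, '2008-06-01'), (101, 11, '2009-01-15');
    INSERT INTO OrderDetail VALUES
        (1000, 100, 'Widget'), (1001, 100, 'Gadget'), (1002, 101, 'Gizmo');
""")


def fetch_in(conn, sql_template, ids):
    """Run a query whose {placeholders} slot is filled with one '?' per distinct id."""
    ids = sorted(set(ids))
    if not ids:
        return []  # an empty IN list would match nothing, so skip the round trip
    placeholders = ",".join("?" * len(ids))
    return conn.execute(sql_template.format(placeholders=placeholders), ids).fetchall()


# 1. The only query that carries the user's WHERE condition.
orders = conn.execute(
    "SELECT Id, CustomerId, OrderDate FROM Orders "
    "WHERE OrderDate BETWEEN ? AND ? ORDER BY Id",
    ("2008-05-05", "2008-12-31"),
).fetchall()

# 2-4. Follow-up queries keyed on the ids collected from the previous results.
details = fetch_in(
    conn,
    "SELECT Id, OrderId, Product FROM OrderDetail "
    "WHERE OrderId IN ({placeholders}) ORDER BY Id",
    [order[0] for order in orders],
)
customers = fetch_in(
    conn,
    "SELECT Id, CategoryId, Name FROM Customer WHERE Id IN ({placeholders}) ORDER BY Id",
    [order[1] for order in orders],
)
categories = fetch_in(
    conn,
    "SELECT Id, Name FROM CustomerCategory WHERE Id IN ({placeholders}) ORDER BY Id",
    [customer[1] for customer in customers],
)
```

Each table's rows come back exactly once, which keeps hydration simple. The cost is four round trips, and because databases cap the number of bound parameters per statement, a long ID list may have to be split into chunks.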

I can see that the first method is more efficient, but one of my colleagues pointed out that the second method, although it uses 4 SQL statements, is easier to write and maintain, and I agree.

Any thoughts on this would be greatly appreciated.
Thank you!

Best Answer

First off, your domain model is massively wrong. I personally cannot justify a collection of Customer objects in a CustomerCategory because it just does not make sense from a performance standpoint: most of the time you need a single customer (plus its category), whereas a category full of customers is required once in a blue moon, yet the collection will be there all the time, causing all kinds of problems. The same applies to a Customer having many Orders.

Now, to your question. It's generally considered that the number of round trips to the database should be minimized, even at the cost of retrieving more data than necessary. That said, joining two big tables (long and wide) to select data from two associated tables simultaneously might be a performance killer, so beware.

I recommend looking at how it's done in NHibernate. It allows you to specify a fetching strategy (join, select) for each and every association, be it one-to-one or one-to-many.

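If you're using Microsoft SQL Server, you can stuff several SELECT statements into one batch and read the result sets one after another, hydrating a whole graph of objects while issuing only one SQL command. Note that MARS (Multiple Active Result Sets, available from SQL Server 2005) is not required for this; it addresses a different need, namely keeping several result sets open on one connection at the same time.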

Related Solutions

Sql – How to (or can I) SELECT DISTINCT on multiple columns

SELECT DISTINCT a,b,c FROM t

is roughly equivalent to:

SELECT a,b,c FROM t GROUP BY a,b,c

It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
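To see the equivalence, and what GROUP BY adds, here is a small sketch in Python with the standard-library sqlite3 module. The table t and its three rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO t VALUES (1, 2, 3), (1, 2, 3), (1, 2, 4);
""")

# Both forms collapse the duplicated (1, 2, 3) row.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c"
).fetchall()
grouped_rows = conn.execute(
    "SELECT a, b, c FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()

# GROUP BY can also aggregate per combination, which DISTINCT cannot.
counted_rows = conn.execute(
    "SELECT a, b, c, COUNT(*) FROM t GROUP BY a, b, c ORDER BY a, b, c"
).fetchall()
```

Here distinct_rows and grouped_rows hold the same two rows, while counted_rows additionally reports how often each combination occurs.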

For your query, I'd do it like this:

UPDATE sales
SET status = 'ACTIVE'
WHERE id IN
(
    SELECT S.id
    FROM sales S
    INNER JOIN
    (
        SELECT saleprice, saledate
        FROM sales
        GROUP BY saleprice, saledate
        HAVING COUNT(*) = 1
    ) T
    ON S.saleprice = T.saleprice AND S.saledate = T.saledate
)
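A quick way to check what this statement does is to run it against a throwaway table. The sketch below uses Python's standard-library sqlite3 module; the column types, the 'PENDING' starting status and the three sample rows are assumptions made for the example, and other database engines may treat a subquery on the table being updated differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, saleprice REAL, saledate TEXT, status TEXT);
    INSERT INTO sales VALUES
        (1, 10.0, '2020-01-01', 'PENDING'),
        (2, 10.0, '2020-01-01', 'PENDING'),
        (3, 20.0, '2020-01-02', 'PENDING');
""")

# Only rows whose (saleprice, saledate) combination occurs exactly once are updated.
conn.execute("""
    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (
        SELECT S.id
        FROM sales S
        INNER JOIN (
            SELECT saleprice, saledate
            FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(*) = 1
        ) T ON S.saleprice = T.saleprice AND S.saledate = T.saledate
    )
""")

statuses = conn.execute("SELECT id, status FROM sales ORDER BY id").fetchall()
```

Rows 1 and 2 share a price and date, so they keep their status; row 3 is the only one marked ACTIVE.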

SQLite – UPSERT not INSERT or REPLACE

Assuming three columns in the table: ID, NAME, ROLE

BAD: This will insert or replace all columns with new values for ID=1:

INSERT OR REPLACE INTO Employee (id, name, role) 
  VALUES (1, 'John Foo', 'CEO');

BAD: This will insert or replace 2 of the columns... the NAME column will be set to NULL or the default value:

INSERT OR REPLACE INTO Employee (id, role) 
  VALUES (1, 'code monkey');

GOOD: Use SQLite's ON CONFLICT clause (UPSERT). UPSERT syntax was added to SQLite in version 3.24.0.

UPSERT is a special syntax addition to INSERT that causes the INSERT to behave as an UPDATE or a no-op if the INSERT would violate a uniqueness constraint. UPSERT is not standard SQL. UPSERT in SQLite follows the syntax established by PostgreSQL.
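For the Employee example, an UPSERT that changes only ROLE could look like the sketch below, run here through Python's standard-library sqlite3 module. It assumes id is the table's primary key and that the linked SQLite library is 3.24.0 or newer; the second call with id 2 is an extra row added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: id is the primary key, so a duplicate id triggers the conflict clause.
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
conn.execute("INSERT INTO Employee (id, name, role) VALUES (1, 'John Foo', 'CEO')")

# excluded.role refers to the value the failed INSERT tried to write.
upsert = """
    INSERT INTO Employee (id, role) VALUES (?, ?)
    ON CONFLICT(id) DO UPDATE SET role = excluded.role
"""
conn.execute(upsert, (1, "code monkey"))   # id 1 exists: only role is updated
conn.execute(upsert, (2, "Benchwarmer"))   # id 2 is new: a plain insert, name stays NULL

rows = conn.execute("SELECT id, name, role FROM Employee ORDER BY id").fetchall()
```

Unlike INSERT OR REPLACE, the existing row is updated in place, so columns that are not listed in DO UPDATE SET (here NAME) keep their current values.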

GOOD but tedious: This will update 2 of the columns. When ID=1 exists, the NAME will be unaffected. When ID=1 does not exist, the name will be the default (NULL).

INSERT OR REPLACE INTO Employee (id, role, name) 
  VALUES (  1, 
            'code monkey',
            (SELECT name FROM Employee WHERE id = 1)
          );

This will update 2 of the columns. When ID=1 exists, the ROLE will be unaffected. When ID=1 does not exist, the role will be set to 'Benchwarmer' instead of the default value.

INSERT OR REPLACE INTO Employee (id, name, role) 
  VALUES (  1, 
            'Susan Bar',
            COALESCE((SELECT role FROM Employee WHERE id = 1), 'Benchwarmer')
          );
