.net – Why is LINQ JOIN so much faster than linking with WHERE

join;linqlinq-to-datasetnetperformance

I've recently upgraded to VS 2010 and am playing around with LINQ to Dataset. I have a strong typed dataset for Authorization that is in HttpCache of an ASP.NET WebApplication.

So i wanted to know what actually is the fastest way to check if a user is authorized to do something. Here is my datamodel and some other informations if somebody is interested.

I have checked 3 ways:

  1. direct database
  2. LINQ query with Where conditions as "Join" – Syntax
  3. LINQ query with Join – Syntax

These are the results with 1000 calls on each function:

1.Iteration:

  1. 4,2841519 sec.
  2. 115,7796925 sec.
  3. 2,024749 sec.

2.Iteration:

  1. 3,1954857 sec.
  2. 84,97047 sec.
  3. 1,5783397 sec.

3.Iteration:

  1. 2,7922143 sec.
  2. 97,8713267 sec.
  3. 1,8432163 sec.

Average:

  1. Database: 3,4239506333 sec.
  2. Where: 99,5404964 sec.
  3. Join: 1,815435 sec.

Why is the Join-version so much faster than the where-syntax which makes it useless although as a LINQ newbie it seems to be the most legible. Or have i missed something in my queries?

Here are the LINQ queries, i skip the database:

Where:

Public Function hasAccessDS_Where(ByVal accessRule As String) As Boolean
    Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
                roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
                role In Authorization.dsAuth.aspnet_Roles, _
                userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                Where accRule.idAccessRule = roleAccRule.fiAccessRule _
                And roleAccRule.fiRole = role.RoleId _
                And userRole.RoleId = role.RoleId _
                And userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
                Select accRule.idAccessRule
    Return query.Any
End Function

Join:

Public Function hasAccessDS_Join(ByVal accessRule As String) As Boolean
    Dim userID As Guid = DirectCast(Membership.GetUser.ProviderUserKey, Guid)
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
                Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
                On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
                Join role In Authorization.dsAuth.aspnet_Roles _
                On role.RoleId Equals roleAccRule.fiRole _
                Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                On userRole.RoleId Equals role.RoleId _
                Where userRole.UserId = userID And accRule.RuleName.Contains(accessRule)
                Select accRule.idAccessRule
    Return query.Any
End Function

Thank you in advance.


Edit: after some improvements on both queries to get more meaningful perfomance-values, the advantage of the JOIN is even many times greater than before:

Join:

Public Overloads Shared Function hasAccessDS_Join(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule _
                   Join roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule _
                   On accRule.idAccessRule Equals roleAccRule.fiAccessRule _
                   Join role In Authorization.dsAuth.aspnet_Roles _
                   On role.RoleId Equals roleAccRule.fiRole _
                   Join userRole In Authorization.dsAuth.aspnet_UsersInRoles _
                   On userRole.RoleId Equals role.RoleId _
                   Where accRule.idAccessRule = idAccessRule And userRole.UserId = userID
             Select role.RoleId
    Return query.Any
End Function

Where:

Public Overloads Shared Function hasAccessDS_Where(ByVal userID As Guid, ByVal idAccessRule As Int32) As Boolean
    Dim query = From accRule In Authorization.dsAuth.aspnet_AccessRule, _
           roleAccRule In Authorization.dsAuth.aspnet_RoleAccessRule, _
           role In Authorization.dsAuth.aspnet_Roles, _
           userRole In Authorization.dsAuth.aspnet_UsersInRoles _
           Where accRule.idAccessRule = roleAccRule.fiAccessRule _
           And roleAccRule.fiRole = role.RoleId _
           And userRole.RoleId = role.RoleId _
           And accRule.idAccessRule = idAccessRule And userRole.UserId = userID
           Select role.RoleId
    Return query.Any
End Function

Result for 1000 calls (on a faster computer)

  1. Join | 2. Where

1.Iteration:

  1. 0,0713669 sec.
  2. 12,7395299 sec.

2.Iteration:

  1. 0,0492458 sec.
  2. 12,3885925 sec.

3.Iteration:

  1. 0,0501982 sec.
  2. 13,3474216 sec.

Average:

  1. Join: 0,0569367 sec.
  2. Where: 12,8251813 sec.

Join is 225 times faster

Conclusion: avoid WHERE to specify relations and use JOIN whenever possible(definitely in LINQ to DataSet and Linq-To-Objects in general).

Best Answer

  1. Your first approach (SQL query in the DB) is quite efficient because the DB knows how to perform a join. But it doesn't really make sense to compare it with the other approaches, since they work directly in memory (Linq to DataSet)

  2. The query with multiple tables and a Where condition actually performs a cartesian product of all the tables, then filters the rows that satisfy the condition. This means the Where condition is evaluated for each combination of rows (n1 * n2 * n3 * n4)

  3. The Join operator takes the rows from the first tables, then takes only the rows with a matching key from the second table, then only the rows with a matching key from the third table, and so on. This is much more efficient, because it doesn't need to perform as many operations