C# LINQ – Should Complex Queries Be Written?

clinq

I often find complex Linq queries that span over 10-15 lines in our code base.

I find such queries hard to understand and impossible to debug.

I've tried to split the query to smaller parts, but it's hard to do since usually inner parts depend on the outer scope variables. Think of nested loops in imperative code.

Is this an indication of my inproficiency with Linq? Are complex Linq queries justified?

Would imperative solution be better in such cases?

Below is a sample query…

 var xItems = (
            from responseData in filteredItems
            let parentLevelItems = responseData.Items.Where(adi => ids.Contains(adi.Id))
            select new
            {
                secondLevelData = 
                (from c in secondLevelLabels
                    join secondLevel in responseData.children on c
                    equals string.IsNullOrEmpty(secondLevel.Label) ? 
                                           Constants.BlankData : secondLevel.Label into coun
                    from cn in coun.DefaultIfEmpty()
                    select new
                    {
                        secondLevelLabel = c,
                        parentLevelItems = parentLevelItems,
                        dataItems = cn == null ? response : 
                                                 cn.Items.Where(adi => ids.Contains(adi.Id))
                    })
            });

Best Answer

Let's rewrite that without any LINQ or LINQ extensions:

var xItems = new List<object>();
foreach(var responseData in filteredItems)
{
    var parentLevelItems = new List<Item>();
    foreach(var adi in responseData.Items)
    {
        if(ids.Contains(adi.Id))
        {
            parentLevelItems.Add(adi);
        }
    }

    var coun = new Dictionary<Label, List<Data>>();
    foreach(var c in secondLevelLabels)
    {
        foreach(var secondLevel in responseData.Children)
        {

            if(c == string.IsNullOrEmpty(secondLevel.Label) ? 
                                           Constants.BlankData : secondLevel.Label)
            {
                if(!coun.ContainsKey(c))
                {
                   coun[c] = new List<Data>();
                }
                count[c].Add(secondLevel);
            }
        }
        var secondLevelData = new List<object>();
        foreach(var cn in coun.DefaultIfEmpty())
        {
            secondLevelData.Add(new
            {
                secondLevelLabel = c,
                parentLevelItems = parentLevelItems,
                dataItems = cn == null ? response : 
                                                 cn.Values.Where(adi => ids.Contains(adi.Id))
            });
        }
    }

    xItems.Add(new 
    {
        secondLevelData = secondLevelData;
    });
}

I took some liberties with making up type names, translating the group join, and I did in fact keep some simple LINQ extensions (Contains and DefaultIfEmpty) to avoid going too mad. Nonetheless, looking at the above, I think it's relatively clear that your root problem isn't the choice of LINQ over loops.


Actually, the right way to deal with this code is probably with the usual refactorings: rename and extract method.

Names range from terrible (cn,coun) to the pretty-bad (xItems, secondLevelItems). It's hard to say for sure without knowing the context of this code, but you should be able to find more expressive names for these. This is a lot of complex logic, it seems like what it applies to should be more specific than just an "item"!

As for extracting methods, you say:

I've tried to split the query to smaller parts, but it's hard to do since usually inner parts depend on the outer scope variables.

Well, this is where parameters come in handy! If loops are still more natural to you, you could always try translating from LINQ to loops, extracting out your methods, then translating the individual bits back to LINQ again.


More generally, you don't get any kind of automatic bonus points for using LINQ. There are two reasons it might be preferable:

  • It's more declarative than loop-writing, making your code more expressive by letting you write what you're trying to achieve rather than how you're doing it.
  • It may be easier to keep lazy evaluation.

The latter is very situation-specific, so let's focus on the former. That's really just a particular type of readability benefit. Usually it's enough, but I've definitely seen cases (often involving the Aggregate extension method) where the LINQ version is just plain harder to decipher than the loop version. In that case, go for the loops! It'd be crazy to pick the less readable option in the name of readability.

If you're worried that lack of experience with LINQ means your judgement may not be good on readability, there's really not too much you can do about that other than seek out the judgement of others and gain more experience. But if, as in your example, complexity is about the sheer length rather than too stuff much being packed into too few statements, there's a good chance that the choice of LINQ isn't ultimately what's causing the problem.

Related Topic