C# LINQ – Should Complex Queries Be Written?

clinq

I often find complex Linq queries that span over 10-15 lines in our code base.

I find such queries hard to understand and impossible to debug.

I've tried to split the query to smaller parts, but it's hard to do since usually inner parts depend on the outer scope variables. Think of nested loops in imperative code.

Is this an indication of my inproficiency with Linq? Are complex Linq queries justified?

Would imperative solution be better in such cases?

Below is a sample query…

 var xItems = (
            from responseData in filteredItems
            let parentLevelItems = responseData.Items.Where(adi => ids.Contains(adi.Id))
            select new
            {
                secondLevelData = 
                (from c in secondLevelLabels
                    join secondLevel in responseData.children on c
                    equals string.IsNullOrEmpty(secondLevel.Label) ? 
                                           Constants.BlankData : secondLevel.Label into coun
                    from cn in coun.DefaultIfEmpty()
                    select new
                    {
                        secondLevelLabel = c,
                        parentLevelItems = parentLevelItems,
                        dataItems = cn == null ? response : 
                                                 cn.Items.Where(adi => ids.Contains(adi.Id))
                    })
            });

Best Answer

Let's rewrite that without any LINQ or LINQ extensions:

var xItems = new List<object>();
foreach(var responseData in filteredItems)
{
    var parentLevelItems = new List<Item>();
    foreach(var adi in responseData.Items)
    {
        if(ids.Contains(adi.Id))
        {
            parentLevelItems.Add(adi);
        }
    }

    var coun = new Dictionary<Label, List<Data>>();
    foreach(var c in secondLevelLabels)
    {
        foreach(var secondLevel in responseData.Children)
        {

            if(c == string.IsNullOrEmpty(secondLevel.Label) ? 
                                           Constants.BlankData : secondLevel.Label)
            {
                if(!coun.ContainsKey(c))
                {
                   coun[c] = new List<Data>();
                }
                count[c].Add(secondLevel);
            }
        }
        var secondLevelData = new List<object>();
        foreach(var cn in coun.DefaultIfEmpty())
        {
            secondLevelData.Add(new
            {
                secondLevelLabel = c,
                parentLevelItems = parentLevelItems,
                dataItems = cn == null ? response : 
                                                 cn.Values.Where(adi => ids.Contains(adi.Id))
            });
        }
    }

    xItems.Add(new 
    {
        secondLevelData = secondLevelData;
    });
}

I took some liberties with making up type names, translating the group join, and I did in fact keep some simple LINQ extensions (Contains and DefaultIfEmpty) to avoid going too mad. Nonetheless, looking at the above, I think it's relatively clear that your root problem isn't the choice of LINQ over loops.

Actually, the right way to deal with this code is probably with the usual refactorings: rename and extract method.

Names range from terrible (cn,coun) to the pretty-bad (xItems, secondLevelItems). It's hard to say for sure without knowing the context of this code, but you should be able to find more expressive names for these. This is a lot of complex logic, it seems like what it applies to should be more specific than just an "item"!

As for extracting methods, you say:

I've tried to split the query to smaller parts, but it's hard to do since usually inner parts depend on the outer scope variables.

Well, this is where parameters come in handy! If loops are still more natural to you, you could always try translating from LINQ to loops, extracting out your methods, then translating the individual bits back to LINQ again.

More generally, you don't get any kind of automatic bonus points for using LINQ. There are two reasons it might be preferable:

It's more declarative than loop-writing, making your code more expressive by letting you write what you're trying to achieve rather than how you're doing it.
It may be easier to keep lazy evaluation.

The latter is very situation-specific, so let's focus on the former. That's really just a particular type of readability benefit. Usually it's enough, but I've definitely seen cases (often involving the Aggregate extension method) where the LINQ version is just plain harder to decipher than the loop version. In that case, go for the loops! It'd be crazy to pick the less readable option in the name of readability.

If you're worried that lack of experience with LINQ means your judgement may not be good on readability, there's really not too much you can do about that other than seek out the judgement of others and gain more experience. But if, as in your example, complexity is about the sheer length rather than too stuff much being packed into too few statements, there's a good chance that the choice of LINQ isn't ultimately what's causing the problem.

Related Solutions

C# – the point of properties

Why is it "methods against properties"? A method does something. A property is, well, a member of an object. They're totally different things, although two kinds of methods commonly written - getters and setters - correspond to properties. Since comparing properties with methods in general is not meaningful, I'll assume you meant to talk about getters/setters.

Getter and setter method pairs are a code smell.

Not necessarily, I'd argue, but either way this isn't a matter of getters/setters vs. properties but a matter of whether either should be used. Also note that you can e.g. leave out the setX part just like you can make read-only properties.

What is important to a consumer of an object is what it does, not how it does it. Its behavior is what it does; its state is how it does it. If you find yourself caring about an object's state (except for persistence, though this also breaks OO), you're simply not doing OOP and losing out on its advantages.

A highly questionable attitude. If the GUI wants to output data held by a DataStuffManagerImpl, it needs to get that number from it somehow (and no, cramming half of the application into the widget class is not an option).

As the property gets more and more complex, the performance indication of using a property seems less and less truthful.

[Methods have] No performance guarantees. The API will remain truthful even if the method gets more complex. The consumer will need to profile their code if they are running in to performance issues, and not rely on word-of-API.

In almost all cases, all the validation logic etc. is still effectively O(1), or otherwise negible in cost. If not, perhaps you've gone too far and it's time for a change. And a getX() method is usually treated as cheap as well!

Sure, typing myObject.Length is easier than typing myObject.Length(), but couldn't that be fixed with a little syntactic sugar?

Properties are that syntactic sugar.

Less for the consumer to think about. Does this property have a setter? A method sure doesn't.

"Is there a setter for this getter?" Same difference.

The programmer of the API will think more deeply about methods with return values, and avoid modifying the object's state in such methods, if possible. Separation of commands from queries should be enforced wherever possible.

I'm not sure what you're trying to tell us here. For properties, there's an even stronger unwritten law not to cause side effects.

JavaScript Modularity – Server-Based MVC and Business Reality

TerryR my friend, you and I should have a drink. We have some similar problems.

1. Project Structure: I agree with Eduardo that the folder structure in an MVC app leaves something to be desired. You have your standard Controllers, Models, and Views folders. But then the Views folder gets broken down into a different folder for each Controller, plus a Shared folder. And each Views/ControllerName or Views/Shared can be broken down into EditorTemplates and DisplayTemplates. But it lets you decide how to organize your Models folder (you can do with or without subfolders & additional namespace declarations).

God forbid you are using Areas, which duplicate the Controllers, Models, and Views folder structure for each area.

/Areas
    /Area1Name
        /Controllers
            FirstController.cs
            SecondController.cs
            ThirdController.cs
        /Models
            (can organize all in here or in separate folders / namespaces)
        /Views
            /First
                /DisplayTemplates
                    WidgetAbc.cshtml <-- to be used by views in Views/First
                /EditorTemplates
                    WidgetAbc.cshtml <-- to be used by views in Views/First
                PartialViewAbc.cshtml <-- to be used by FirstController
            /Second
                PartialViewDef.cshtml <-- to be used by SecondController
            /Third
                PartialViewMno.cshtml <-- to be used by ThirdController
            /Shared
                /DisplayTemplates
                    WidgetXyz.cshtml <-- to be used by any view in Area1
                /EditorTemplates
                    WidgetXyz.cshtml <-- to be used by any view in Area1
                PartialViewXyz.cshtml <-- to be used anywhere in Area1
            _ViewStart.cshtml <-- area needs its own _ViewStart.cshtml
            Web.config <-- put custom HTML Helper namespaces in here
        Area1NameRegistration.cs <-- define routes for area1 here
    /Area2Name
        /Controllers
        /Models
        /Views
        Area2NameRegistration.cs <-- define routes for area2 here

/Controllers
    AccountController.cs
    HomeController.cs
/Models
/Views
    /Account
        /DisplayTemplates
            WidgetGhi.cshtml <-- to be used views in Views/Account
        /EditorTemplates
            WidgetGhi.cshtml <-- to be used views in Views/Account
        PartialViewGhi.cshtml <-- to be used by AccountController
    /Home
        (same pattern as Account, views & templates are controller-specific)
    /Shared
        /DisplayTemplates 
            EmailAddress.cshtml <-- to be used by any view in any area
            Time.cshtml <-- to be used by any view in any area
            Url.cshtml <-- to be used by any view in any area
        /EditorTemplates
            EmailAddress.cshtml <-- to be used by any view in any area
            Time.cshtml <-- to be used by any view in any area
            Url.cshtml <-- to be used by any view in any area
        _Layout.cshtml <-- master layout page with sections
        Error.cshtml <-- custom page to show if unhandled exception occurs
    _ViewStart.cshtml <-- won't be used automatically in an area
    Web.config <-- put custom HTML Helper namespaces in here

This means if you are working with something like a WidgetController, you have to look in other folders to find the related WidgetViewModels, WidgetViews, WidgetEditorTemplates, WidgetDisplayTemplates, etc. As cumbersome as this may be, I stick to it and don't deviate from these MVC conventions. As far as putting a model, controller, and view in the same folder but with different namespaces, I avoid this because I use ReSharper. It will squiggly underline a namespace that doesn't match the folder where the class is located. I know I could turn this R# feature off, but it helps in other parts of the project.

For non-class files, MVC gives you Content and Scripts out of the box. We try to keep all of our static / non-compiled files in these places, again, to follow convention. Any time we incorporate a js library that uses themes (images and or css), the theme files all go somewhere under /content. For script, we just put all of them directly into /scripts. Originally this was to get JS intellisense from VS, but now that we get JS intellisense from R# regardless of placement in /scripts, I suppose we could deviate from that and divide scripts by folder to better organize. Are you using ReSharper? It is pure gold IMO.

Another little piece of gold that helps a lot with refactoring is T4MVC. Using this, we don't need to type in string paths for area names, controller names, action names, even files in content & scripts. T4MVC strongly types all of the magic strings for you. Here's a small sample of how your project structure doesn't matter as much if you're using T4MVC:

// no more magic strings in route definitions
context.MapRoutes(null,
    new[] { string.Empty, "features", "features/{version}" },
    new
    {
        area = MVC.PreviewArea.Name,
        controller = MVC.PreviewArea.Features.Name,
        action = MVC.PreviewArea.Features.ActionNames.ForPreview,
        version = "december-2011-preview-1",
    },
    new { httpMethod = new HttpMethodConstraint("GET") }
);

@* T4MVC renders .min.js script versions when project is targeted for release *@
<link href="@Url.Content(Links.content.Site_css)?r=201112B" rel="stylesheet" />
<script src="@Url.Content(Links.scripts.jquery_1_7_1_js)" type="text/javascript">
</script>

@* render a route URL as if you were calling an action method directly *@
<a href="@Url.Action(MVC.MyAreaName.MyControllerName.MyActionName
    (Model.SomeId))">@Html.DisplayFor(m => m.SomeText)</a>

// call action redirects as if you were executing an action method
return RedirectToAction(MVC.Area.MyController.DoSomething(obj1.Prop, null));

2. Data access: I have no experience with PetaPoco, but I'm sure it's worth checking out. For your complex reports, have you considered SQL Server Reporting services? Or, are you running on a different db? Sorry I'm not clear on what exactly you are asking for. We use EF + LINQ, but we also put certain knowledge about how to generate reports in domain classes. Thus, we have controller call domain service call repository, instead of having controller call repository directly. For ad-hoc reports we use SQL Reporting Services, which again isn't perfect, but our users like to be able to bring data into Excel easily, and SSRS makes that easy on us.

3. Client-Side Code Organization and UI Rendering: This is where I think I may be able to offer some help. Take a page from the book of MVC unobtrusive validation and unobtrusive AJAX. Consider this:

<img id="loading_spinner" src="/path/to/img" style="display:none;" />
<h2 id="loading_results" style="display:none;">
    Please wait, this may take a while...
</h2>
<div id="results">
</div>
<input id="doSomethingDangerous" class="u-std-ajax" 
    type="button" value="I'm feeling lucky" 
    data-myapp-confirm="Are you sure you want to do this?"
    data-myapp-show="loading_spinner,loading_results" 
    data-myapp-href="blah/DoDangerousThing" />

Ignore the ajax success function for now (more on this later). You can get away with a single script for some of your actions:

$('.u-std-ajax').click(function () {
    // maybe confirm something first
    var clicked = this;
    var confirmMessage = $(clicked).data('myapp-confirm');
    if (confirmMessage && !confirm(confirmMessage )) { return; } 

    // show a spinner?  something global would be preferred so 
    // I dont have to repeat this on every page 
    // maybe the page should notify the user of what's going on 
    // in addition to the dialog?
    var show = $(clicked).data('myapp-show');
    if (show) {
        var i, showIds = show.split(',');
        for (i = 0; i < showIds.length; i++) {
            $('#' + showIds[i]).show();
        }
    }

    var url = $(clicked).data('myapp-href');
    if (url) {
        $.ajax({
            url: url,
            complete: function () {                     
                // Need to hide the spinner, again would prefer to 
                // have this done elsewhere
                if (show) {
                    for (i = 0; i < showIds.length; i++) {
                        $('#' + showIds[i]).hide();
                    }
                }
            }
        });
    }
});

The above code will take care of the confirmation, showing the spinner, showing the wait message, and hiding the spinner / wait message after the ajax call is complete. You configure the behaviors using data-* attributes, like the unobtrusive libraries.

General Questions

- Client MVC vs. server MVC? I didn't try to librarify the actions you took in the success function because it looks like your controller is returning JSON. If your controllers are returning JSON, you might want to look at KnockoutJS. Knockout JS version 2.0 was released today. It can plug right into your JSON, so that an observable click can automatically bind data to your javascript templates. On the other hand if you don't mind having your ajax action methods return HTML instead of JSON, they can return the already-constructed UL with its LI children, and you can append that to an element by using data-myapp-response="results". Your success function would then just look like this:

success: function(html) {
    var responseId = $(clicked).data('myapp-response');
    if (responseId) {
        $('#' + responseId).empty().html(html);
    }
}

To sum up my best answer for this, if you must return JSON from your action methods, you are skipping the server-side View, so this really isn't server MVC -- it's just MC. If you return PartialViewResult with html to ajax calls, this is server MVC. So if your app must return JSON data for ajax calls, use client MVVM like KnockoutJS.

Either way, I don't like the JS you posted because it mixes your layout (html tags) with behavior (asynchronous data load). Choosing either server MVC with partial html views or client MVVM with pure JSON viewmodel data will solve this problem for you, but manually constructing DOM/HTML in javascript violates separation of concerns.

- Javascript file creation Apparently minification features are coming in .NET 4.5. If you go the unobtrusive route, there shouldn't be anything stopping you from loading all of your JS in 1 script file. I would be careful about creating different JS files for each entity type, you will end up with JS file explosion. Remember, once your script file is loaded, the browser should cache it for future requests.

- Complex queries I don't consider having feature like pagination, sorting, etc, as being complex. My preference is to handle this with URL's and server-side logic, to make the db queries as limited as needed. However we are deployed to Azure, so query optimization is important to us. For example: /widgets/show-{pageSize}-per-page/page-{pageNumber}/sort-by-{sortColumn}-{sortDirection}/{keyword}. EF and LINQ to Entities can handle pagination and sorting with methods like .Take(), .Skip(), .OrderBy(), and .OrderByDescending(), so you get what you need during the db trip. I haven't found the need for a clientlib yet, so I honestly don't know much about them. Look to other answers for more advice on that.

- Project silk Never heard of this one, will have to check it out. I am a big fan of Steve Sanderson, his books, his BeginCollectionItem HtmlHelper, and his blog. That said, I don't have any experience with KnockoutJS in production. I have checked out its tutorials, but I try not to commit to something until it's at least version 2.0. Like I mentioned, KnockoutJS 2.0 was just released.

- N-tier If by tier you mean different physical machine, then no, I don't think anything goes out any windows. Generally 3-tier means you have 3 machines. So you might have a fat client as your presentation tier, that runs on a user's machine. The fat client might access a service tier, that runs on an application server and returns XML or whatever to the fat client. And the service tier might get its data from a SQL server on a 3rd machine.

MVC is one layer, on 1 tier. Your controllers, models, and views are all part of your Presentation Layer, which is 1 tier in the physical architecture. MVC implements the Model-View-Controller pattern, which is where you might be seeing additional layers. However, try not to think of these 3 aspects as tiers or layers. Try to think of all 3 of them as Presentation Layer Concerns.

Update after pres/bus/data comment

Okay, so you are using tier and layer interchangably. I usually use the term "layer" for logical / project / assembly divisions, and tier for physical network separation. Sorry for the confusion.

You will find quite a few people in the MVC camp who say you should not use the "Models" in MVC for your entity data model, nor should you use your Controllers for business logic. Ideally your models should be view-specific ViewModels. Using something like Automapper, you take your entities from your domain model and DTO them into ViewModels, sculpted specifically for use by the view.

Any business rules should also be part of your domain, and you can implement them using domain services / factory pattern / whatever is appropriate in your domain layer, not in the MVC presentation layer. Controllers should be dumb, though not quite as dumb as models, and should give responsibility to the domain for anything that requires business knowledge. Controllers manage the flow of HTTP requests and responses, but anything with real business value should be above the controller's pay grade.

So, you can still have a layered architecture, with MVC as the presentation layer. It is a client of your application layer, service layer, or domain layer, depending on how you architect it. But ultimately your entity model should be part of the domain, not models in MVC.

Best Answer

Related Solutions

C# – the point of properties

JavaScript Modularity – Server-Based MVC and Business Reality

Related Topic