LINQ Efficiency – Is It More Efficient Than It Appears?


If I write something like this:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

Is this the same as:

var results1 = new List<Thing>();
foreach(var t in mythings)
    if(t.IsSomeValue)
        results1.Add(t);

var results2 = new List<Thing>();
foreach(var t in results1)
    if(t.IsSomeOtherValue)
        results2.Add(t);

Or is there some magic under the covers that works more like this:

var results = new List<Thing>();
foreach(var t in mythings)
    if(t.IsSomeValue && t.IsSomeOtherValue)
        results.Add(t);

Or is it something completely different altogether?

Best Answer

LINQ queries are lazy. That means the code:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

does very little. The original enumerable (mythings) is only enumerated when the resulting enumerable (things) is consumed, e.g. by a foreach loop, .ToList(), or .ToArray().
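The sketch below counts how often the predicates run before and after the query is consumed. It assumes a simple `Thing` record as a stand-in for the question's type. The record, the sample data, and the `LazinessDemo` class are hypothetical and exist only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Thing type.
public record Thing(bool IsSomeValue, bool IsSomeOtherValue);

public static class LazinessDemo
{
    // Returns the predicate-call count before and after consuming the query,
    // plus the number of matching elements.
    public static (int Before, int After, int Matches) CountPredicateCalls()
    {
        var mythings = new List<Thing>
        {
            new Thing(true, true),
            new Thing(false, true),
            new Thing(true, false),
        };

        int calls = 0;
        var things = mythings
            .Where(x => { calls++; return x.IsSomeValue; })
            .Where(y => { calls++; return y.IsSomeOtherValue; });

        int before = calls;             // still 0: building the query runs nothing
        var results = things.ToList();  // the source is enumerated here
        return (before, calls, results.Count);
    }
}
```

Before `ToList()` no predicate has run. Afterwards there have been five calls: three for the first filter and two for the elements it let through. Exactly one element matches.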

If you call things.ToList(), it is roughly equivalent to your last snippet (the single loop with the combined condition), with perhaps some (usually insignificant) overhead from the enumerators and delegate calls.

Likewise, if you use a foreach loop:

foreach (var t in things)
    DoSomething(t);

It is similar in performance to:

foreach (var t in mythings)
    if (t.IsSomeValue && t.IsSomeOtherValue)
        DoSomething(t);
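You can check that two chained Where calls behave like the single combined loop, and not like the two-pass version, by logging when each predicate runs. The following is a minimal sketch. `PipelineDemo`, `TraceOrder`, and the two sample predicates are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo
{
    // Records the order in which the two chained Where predicates
    // (and the consuming foreach) see each element.
    public static List<string> TraceOrder(IEnumerable<int> source)
    {
        var log = new List<string>();
        var query = source
            .Where(n => { log.Add($"first({n})"); return n % 2 == 0; })
            .Where(n => { log.Add($"second({n})"); return n > 2; });

        foreach (var n in query)
            log.Add($"yield({n})");

        return log;
    }
}
```

For the input 1, 2, 3, 4 the log reads first(1), first(2), second(2), first(3), first(4), second(4), yield(4). Each element passes through the second filter (if the first accepts it) before the next element is read, so no intermediate list is built. As an implementation detail, current .NET versions of System.Linq go further and fuse consecutive Where calls into one iterator with a combined predicate.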

Some of the performance advantages of this lazy approach (as opposed to calculating all the results up front and storing them in a list) are that it uses very little memory, since only one element is held at a time, and that there is no significant up-front cost.

If the enumerable is only partially enumerated, this is especially important. Consider this code:

things.First();

The way LINQ is implemented, mythings will only be enumerated up to the first element that matches your where conditions. If that element is early on in the list, this can be a huge performance boost (e.g. O(1) instead of O(n)).
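The short-circuiting can be measured by wrapping the source in an iterator that counts how many elements it hands out. This is a minimal sketch. `ShortCircuitDemo`, `FirstMatch`, and the sample predicates are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Counts how many source elements are pulled before First() returns.
    public static (int Result, int ElementsRead) FirstMatch(int count)
    {
        int read = 0;

        // Hypothetical source: an iterator that counts every element it yields.
        IEnumerable<int> Source()
        {
            for (int i = 0; i < count; i++)
            {
                read++;
                yield return i;
            }
        }

        int first = Source()
            .Where(n => n % 3 == 0)
            .Where(n => n > 5)
            .First();   // stops pulling as soon as one element matches

        return (first, read);
    }
}
```

With a source of one million elements, `FirstMatch(1_000_000)` returns 6 after reading only seven elements (0 through 6). Note that `First()` throws if nothing matches. `FirstOrDefault()` is the non-throwing alternative.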

Related Solutions

LINQ Performance – Does LINQ Require More Processing Cycles and Memory?

I'd say the main weakness of this answer is less its use of LINQ and more the specific operators chosen. GroupBy projects each element to a key and a value, both of which go into a lookup. In other words, every word will add something to the lookup.

The naive implementation .GroupBy(e => e) will store a copy of every word in the source material, making the final lookup nearly as large as the original source material. Even if we project out the value with .GroupBy(e => e, e => (object)null), we're still creating a large lookup of small values, with one element per word occurrence.

What we would want is an operator that preserves only the needed information, which is one copy of each word and the count of that word so far. For that, we can use Aggregate:

var wordCounts = words.Aggregate(new Dictionary<string, int>(), (counts, word) =>
{
    // TryGetValue leaves currentCount at 0 when the word hasn't been seen yet
    counts.TryGetValue(word, out int currentCount);
    counts[word] = currentCount + 1;
    return counts;
});
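The following is a self-contained version of the Aggregate approach, next to the GroupBy version for comparison. It assumes a naive whitespace split (no punctuation or case handling). The `WordCounter` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Keeps one entry per distinct word plus its running count.
    public static Dictionary<string, int> CountWithAggregate(IEnumerable<string> words) =>
        words.Aggregate(new Dictionary<string, int>(), (counts, word) =>
        {
            counts.TryGetValue(word, out int currentCount); // 0 when the word is new
            counts[word] = currentCount + 1;
            return counts;
        });

    // GroupBy version for comparison: each group holds every occurrence before counting.
    public static Dictionary<string, int> CountWithGroupBy(IEnumerable<string> words) =>
        words.GroupBy(w => w).ToDictionary(g => g.Key, g => g.Count());

    // Naive whitespace split; punctuation and casing are not handled.
    public static string[] Split(string text) =>
        text.Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
}
```

Both methods produce the same counts. The difference is that the Aggregate version never holds more than one dictionary entry per distinct word.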

From here, there are a couple of ways we could attempt to make this faster:

Instead of creating many strings while splitting, we could pass around structs that reference the original string and the segment containing the word, and only copy the segment out when it turns out to be a new key.
Use Parallel LINQ (PLINQ) to aggregate across several cores, then combine the results. This is trivial compared to the legwork required to do it by hand.
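The Parallel LINQ suggestion can be sketched with the `ParallelEnumerable.Aggregate` overload that takes a seed factory, an update function, a combine function, and a result selector. This minimal sketch assumes the words have already been split out. `ParallelWordCounter` is a hypothetical name. Whether it beats the sequential version depends on input size and core count, so it is worth measuring.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParallelWordCounter
{
    // Each partition counts into its own dictionary (no locking needed),
    // then the partial dictionaries are merged.
    public static Dictionary<string, int> Count(IEnumerable<string> words) =>
        words.AsParallel().Aggregate(
            // seedFactory: invoked per partition, so each gets a fresh dictionary
            () => new Dictionary<string, int>(),
            // updateAccumulatorFunc: same per-word logic as the sequential version
            (counts, word) =>
            {
                counts.TryGetValue(word, out int current);
                counts[word] = current + 1;
                return counts;
            },
            // combineAccumulatorsFunc: fold one partition's counts into another's
            (left, right) =>
            {
                foreach (var pair in right)
                {
                    left.TryGetValue(pair.Key, out int current);
                    left[pair.Key] = current + pair.Value;
                }
                return left;
            },
            // resultSelector: the merged dictionary is already the result
            counts => counts);
}
```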

C# – Best way to remove list items from an existing record

Just add some kind of state property to your records.

Each record can potentially be in one of at least 3 states:

Added
Deleted
Changed

When your server (or database layer, or whatever else) receives the raw data table from the UI, it just iterates over the collection and, based on the state of each record, executes the appropriate operation.
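A minimal sketch of that dispatch follows. The `Item` record, the `RecordSync.Apply` method, and the insert/delete/update callbacks are hypothetical placeholders for your own entity and persistence layer. The extra `Unchanged` state is an assumption for rows the UI did not touch.

```csharp
using System;
using System.Collections.Generic;

// The states described above; Unchanged is an added assumption for untouched rows.
public enum RecordState { Unchanged, Added, Deleted, Changed }

// Hypothetical record shape; replace with your own entity.
public record Item(int Id, string Name, RecordState State);

public static class RecordSync
{
    // Walks the incoming rows once and dispatches on each row's state.
    // The callbacks stand in for whatever persistence layer you use.
    public static void Apply(
        IEnumerable<Item> rows,
        Action<Item> insert,
        Action<Item> delete,
        Action<Item> update)
    {
        foreach (var row in rows)
        {
            switch (row.State)
            {
                case RecordState.Added: insert(row); break;
                case RecordState.Deleted: delete(row); break;
                case RecordState.Changed: update(row); break;
                case RecordState.Unchanged: break; // nothing to do
            }
        }
    }
}
```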

Hope this helps.
