C# – for vs. foreach vs. LINQ

When I write code in Visual Studio, ReSharper (God bless it!) often suggests that I change my old-school for loop into the more compact foreach form.

And often, when I accept this change, ReSharper goes a step further and suggests that I change it again, into a shiny LINQ form.

So, I wonder: are there any real advantages to these improvements? In pretty simple code, I cannot see any speed boost (obviously), but I can see the code becoming less and less readable… So I wonder: is it worth it?

Best Answer

`for` vs. `foreach`

It is a common misconception that these two constructs are very similar and interchangeable, like this:

foreach (var c in collection)
{
    DoSomething(c);
}

and:

for (var i = 0; i < collection.Count; i++)
{
    DoSomething(collection[i]);
}

The fact that both keywords start with the same three letters doesn't mean that they are semantically similar. This confusion is extremely error-prone, especially for beginners. Iterating through a collection and doing something with its elements is done with foreach; for doesn't have to and shouldn't be used for this purpose, unless you really know what you're doing.

Let's see what goes wrong, using an example. At the end, you'll find the full code of a demo application used to gather the results.

In the example, we load some data from the database, more precisely the cities from Adventure Works, ordered by name, and stop as soon as we reach "Boston". The following SQL query is used:

select distinct [City] from [Person].[Address] order by [City]

The data is loaded by the ListCities() method, which returns an IEnumerable<string>. Here is what foreach looks like:

foreach (var city in Program.ListCities())
{
    Console.Write(city + " ");

    if (city == "Boston")
    {
        break;
    }
}

Let's rewrite it with a for, assuming that both are interchangeable:

var cities = Program.ListCities();
for (var i = 0; i < cities.Count(); i++)
{
    var city = cities.ElementAt(i);

    Console.Write(city + " ");

    if (city == "Boston")
    {
        break;
    }
}

Both return the same cities, but there is a huge difference.

When using foreach, ListCities() is called one time and yields 47 items.
When using for, ListCities() is called 94 times and yields 28153 items overall.

What happened?

IEnumerable is lazy. It means that it will do the work only at the moment when the result is needed. Lazy evaluation is a very useful concept, but has some caveats, including the fact that it's easy to miss the moment(s) where the result will be needed, especially in the cases where the result is used multiple times.
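To see the laziness in isolation, here is a minimal sketch with no database involved. ListNames and the Starts counter are hypothetical stand-ins for ListCities() and countCalls, introduced only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Counts how many times the iterator body has started running.
    public static int Starts;

    // Hypothetical stand-in for ListCities(): three fixed names, no database.
    public static IEnumerable<string> ListNames()
    {
        Starts++;
        yield return "Abingdon";
        yield return "Albany";
        yield return "Boston";
    }

    public static void Main()
    {
        var names = ListNames();          // calling the method runs none of its body
        Console.WriteLine(Starts);        // 0

        foreach (var name in names) { }   // the first enumeration runs the body
        Console.WriteLine(Starts);        // 1

        foreach (var name in names) { }   // enumerating again runs it a second time
        Console.WriteLine(Starts);        // 2
    }
}
```

The variable names holds a recipe for producing the data rather than the data itself, so every new enumeration starts the work from scratch.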

In the case of foreach, the result is requested only once. In the case of for, as implemented in the incorrectly written code above, the result is requested 94 times, i.e. 47 × 2:

Every time cities.Count() is called (47 times),
Every time cities.ElementAt(i) is called (47 times).
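The item count follows the same logic, assuming Count() walks the whole sequence and ElementAt(i) stops after i + 1 items: 47 × 575 + (1 + 2 + … + 47) = 27025 + 1128 = 28153. The sketch below reproduces the pattern on a small in-memory sequence; ListNames, Calls, Yields and the five names are hypothetical stand-ins, not the demo application's data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    public static int Calls;
    public static int Yields;

    // Hypothetical stand-in for ListCities(): five fixed names instead of a query.
    public static IEnumerable<string> ListNames()
    {
        Calls++;
        foreach (var name in new[] { "Abingdon", "Albany", "Boston", "Cambridge", "Denver" })
        {
            Yields++;
            yield return name;
        }
    }

    public static void UseFor()
    {
        Calls = 0;
        Yields = 0;
        var names = ListNames();
        for (var i = 0; i < names.Count(); i++)   // walks all five names on every check
        {
            var name = names.ElementAt(i);        // walks the first i + 1 names
            if (name == "Boston")
            {
                break;
            }
        }
    }

    public static void Main()
    {
        UseFor();
        Console.WriteLine(Calls);  // 6  = 3 Count() calls + 3 ElementAt() calls
        Console.WriteLine(Yields); // 21 = 3 * 5 + (1 + 2 + 3)
    }
}
```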

Querying a database 94 times instead of once is terrible, but it is not the worst thing that can happen. Imagine, for example, what would happen if the select query were preceded by a query that also inserts a row into the table. Right, we would have a for loop that calls the database 2,147,483,647 times, unless it hopefully crashes before that.

Of course, my code is biased. I deliberately used the laziness of IEnumerable and wrote it in a way to repeatedly call ListCities(). One can note that a beginner will never do that, because:

IEnumerable<T> doesn't have a Count property, only the Count() method. Calling a method is scary, and one can expect its result not to be cached, which makes it unsuitable for the condition of a for (; ...; ) block.
Indexing is unavailable for IEnumerable<T>, and the ElementAt LINQ extension method is not obvious to find.

Probably most beginners would just convert the result of ListCities() to something they are familiar with, like a List<T>.

var cities = Program.ListCities();
var flushedCities = cities.ToList();
for (var i = 0; i < flushedCities.Count; i++)
{
    var city = flushedCities[i];

    Console.Write(city + " ");

    if (city == "Boston")
    {
        break;
    }
}

Still, this code is very different from the foreach alternative. Again, it gives the same results, and this time the ListCities() method is called only once, but yields 575 items, while with foreach, it yielded only 47 items.

The difference comes from the fact that ToList() causes all data to be loaded from the database. While foreach requested only the cities up to "Boston", the new for requires all cities to be retrieved and stored in memory. With 575 short strings, it probably doesn't make much difference, but what if we were retrieving only a few rows from a table containing billions of records?
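The gap grows with the size of the source. As a rough sketch, assuming a hypothetical Numbers() iterator that stands in for a large table, the two approaches pull very different amounts of data to reach the same item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    public static int Yields;

    // Hypothetical large source: one million integers, counted as they are produced.
    public static IEnumerable<int> Numbers()
    {
        for (var n = 1; n <= 1000000; n++)
        {
            Yields++;
            yield return n;
        }
    }

    // foreach pulls items one at a time and stops pulling at the break.
    public static int YieldsWithForeach(int target)
    {
        Yields = 0;
        foreach (var n in Numbers())
        {
            if (n == target)
            {
                break;
            }
        }
        return Yields;
    }

    // ToList() pulls every item before the for loop even starts.
    public static int YieldsWithToList(int target)
    {
        Yields = 0;
        var list = Numbers().ToList();
        for (var i = 0; i < list.Count; i++)
        {
            if (list[i] == target)
            {
                break;
            }
        }
        return Yields;
    }

    public static void Main()
    {
        Console.WriteLine(YieldsWithForeach(47)); // 47
        Console.WriteLine(YieldsWithToList(47));  // 1000000
    }
}
```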

So what is `foreach`, really?

foreach is closer to a while loop. The code I previously used:

foreach (var city in Program.ListCities())
{
    Console.Write(city + " ");

    if (city == "Boston")
    {
        break;
    }
}

can be simply replaced by:

using (var enumerator = Program.ListCities().GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        var city = enumerator.Current;
        Console.Write(city + " ");

        if (city == "Boston")
        {
            break;
        }
    }
}

Both produce the same IL. Both have the same result. Both have the same side effects. Of course, this while can be rewritten as a similar infinite for, but that would be even longer and more error-prone. You're free to choose the one you find more readable.

Want to test it yourself? Here's the full code:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;

public class Program
{
    private static int countCalls;

    private static int countYieldReturns;

    public static void Main()
    {
        Program.DisplayStatistics("for", Program.UseFor);
        Program.DisplayStatistics("for with list", Program.UseForWithList);
        Program.DisplayStatistics("while", Program.UseWhile);
        Program.DisplayStatistics("foreach", Program.UseForEach);

        Console.WriteLine("Press any key to continue...");
        Console.ReadKey(true);
    }

    private static void DisplayStatistics(string name, Action action)
    {
        Console.WriteLine("--- " + name + " ---");

        Program.countCalls = 0;
        Program.countYieldReturns = 0;

        var measureTime = Stopwatch.StartNew();
        action();
        measureTime.Stop();

        Console.WriteLine();
        Console.WriteLine();
        Console.WriteLine("The data was called {0} time(s) and yielded {1} item(s) in {2} ms.", Program.countCalls, Program.countYieldReturns, measureTime.ElapsedMilliseconds);
        Console.WriteLine();
    }

    private static void UseFor()
    {
        var cities = Program.ListCities();
        for (var i = 0; i < cities.Count(); i++)
        {
            var city = cities.ElementAt(i);

            Console.Write(city + " ");

            if (city == "Boston")
            {
                break;
            }
        }
    }

    private static void UseForWithList()
    {
        var cities = Program.ListCities();
        var flushedCities = cities.ToList();
        for (var i = 0; i < flushedCities.Count; i++)
        {
            var city = flushedCities[i];

            Console.Write(city + " ");

            if (city == "Boston")
            {
                break;
            }
        }
    }

    private static void UseForEach()
    {
        foreach (var city in Program.ListCities())
        {
            Console.Write(city + " ");

            if (city == "Boston")
            {
                break;
            }
        }
    }

    private static void UseWhile()
    {
        using (var enumerator = Program.ListCities().GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                var city = enumerator.Current;
                Console.Write(city + " ");

                if (city == "Boston")
                {
                    break;
                }
            }
        }
    }

    private static IEnumerable<string> ListCities()
    {
        Program.countCalls++;
        using (var connection = new SqlConnection("Data Source=mframe;Initial Catalog=AdventureWorks;Integrated Security=True"))
        {
            connection.Open();

            using (var command = new SqlCommand("select distinct [City] from [Person].[Address] order by [City]", connection))
            {
                using (var reader = command.ExecuteReader(CommandBehavior.SingleResult))
                {
                    while (reader.Read())
                    {
                        Program.countYieldReturns++;
                        yield return reader["City"].ToString();
                    }
                }
            }
        }
    }
}

And the results:

--- for ---
Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

The data was called 94 time(s) and yielded 28153 item(s).

--- for with list ---
Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

The data was called 1 time(s) and yielded 575 item(s).

--- while ---
Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

The data was called 1 time(s) and yielded 47 item(s).

--- foreach ---
Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

The data was called 1 time(s) and yielded 47 item(s).

LINQ vs. traditional way

As for LINQ, you may want to learn functional programming (FP): not the C# FP stuff, but a real FP language like Haskell. Functional languages have a specific way to express and present code. In some situations, it is superior to non-functional paradigms.

FP is known to be much superior when it comes to manipulating lists (list as a generic term, unrelated to List<T>). Given this fact, the ability to express C# code in a more functional way when it comes to lists is rather a good thing.
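As a small illustration of the two styles (a sketch with made-up helper names, ShortNamesLoop and ShortNamesLinq, not code from the answer), here is the same list manipulation written imperatively and with LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StyleDemo
{
    // Imperative version: filtering, transforming and accumulating are interleaved.
    public static List<string> ShortNamesLoop(IEnumerable<string> cities)
    {
        var result = new List<string>();
        foreach (var city in cities)
        {
            if (city.Length <= 6)
            {
                result.Add(city.ToUpperInvariant());
            }
        }
        return result;
    }

    // Functional version: each step is a separate, named operation on the sequence.
    public static List<string> ShortNamesLinq(IEnumerable<string> cities)
    {
        return cities
            .Where(city => city.Length <= 6)
            .Select(city => city.ToUpperInvariant())
            .ToList();
    }

    public static void Main()
    {
        var cities = new[] { "Abingdon", "Albany", "Bonn", "Bordeaux", "Boston" };
        Console.WriteLine(string.Join(" ", ShortNamesLoop(cities))); // ALBANY BONN BOSTON
        Console.WriteLine(string.Join(" ", ShortNamesLinq(cities))); // ALBANY BONN BOSTON
    }
}
```

Which one reads better is partly a matter of habit; the LINQ form tends to pay off as the number of steps grows.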

If you're not convinced, compare the readability of code written in both functional and non-functional ways in my previous answer on the subject.

Related Solutions

C# – What problem domain is LINQ made for

LINQ is primarily designed to allow pure functional queries and transformations on sequences of data (you will notice that all the LINQ extensions take Func delegates but not Action delegates). Consequently the most common case of a loop that does not fit with LINQ very well is one that is all about non-pure functional side effects, e.g.

foreach(var x in list) Console.WriteLine(x);

To get better at using LINQ, just practice using it.

Every time you are about to write a for or foreach loop to do something with a collection, stop, consider if it's a good fit for LINQ (i.e. it's not just performing an action/side effect on the elements), and if so force yourself to write it using LINQ.

You could also write the foreach version first, then rewrite it as a LINQ version.
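One way to apply this, sketched here with a hypothetical SquaresOfEvens helper, is to keep the pure part of the work in a LINQ query and leave the side effect in a plain foreach:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryThenAct
{
    // Pure part: describes which values we want, with no side effects.
    public static IEnumerable<int> SquaresOfEvens(IEnumerable<int> numbers)
    {
        return numbers
            .Where(n => n % 2 == 0)
            .Select(n => n * n);
    }

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };

        // Side-effect part: printing stays in an ordinary loop.
        foreach (var square in SquaresOfEvens(numbers))
        {
            Console.WriteLine(square); // prints 4, then 16, then 36
        }
    }
}
```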

As svick points out, LINQ should be about making your program more readable. It is usually good at this as it tends to emphasize the intent of the code rather than the mechanism; however if you find you cannot make your queries more readable than a simple loop, feel free to stick with the loop.

If you need exercises to practice, most functional programming exercises will map nicely to LINQ, e.g. 99 problems (especially the first 20 or so) or Project Euler.

LINQ Efficiency – Is It More Efficient Than It Appears?

LINQ queries are lazy. That means the code:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

does very little. The original enumerable (mythings) is only enumerated when the resulting enumerable (things) is consumed, e.g. by a foreach loop, .ToList(), or .ToArray().

If you call things.ToList(), it is roughly equivalent to your latter code, with perhaps some (usually insignificant) overhead from the enumerators.

Likewise, if you use a foreach loop:

foreach (var t in things)
    DoSomething(t);

It is similar in performance to:

foreach (var t in mythings)
    if (t.IsSomeValue && t.IsSomeOtherValue)
        DoSomething(t);

Some of the performance advantages of the lazy approach for enumerables (as opposed to calculating all the results and storing them in a list) are that it uses very little memory (since only one result is stored at a time) and that there's no significant up-front cost.

If the enumerable is only partially enumerated, this is especially important. Consider this code:

things.First();

The way LINQ is implemented, mythings will only be enumerated up to the first element that matches your Where conditions. If that element is early on in the list, this can be a huge performance boost (e.g. O(1) instead of O(n)).
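A quick way to check this is to count how often the predicate runs. The sketch below is only an illustration; Evens and the Checks counter are hypothetical names, not part of LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstDemo
{
    // Counts how many elements the predicate has been asked about.
    public static int Checks;

    public static IEnumerable<int> Evens(IEnumerable<int> numbers)
    {
        return numbers.Where(n => { Checks++; return n % 2 == 0; });
    }

    public static void Main()
    {
        var numbers = new[] { 1, 3, 4, 5, 6, 7, 8 };

        var evens = Evens(numbers);
        Console.WriteLine(Checks);        // 0: building the query checks nothing

        Console.WriteLine(evens.First()); // 4
        Console.WriteLine(Checks);        // 3: only 1, 3 and 4 were examined
    }
}
```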