C# IEnumerable, List, and Arrays – Understanding Unexpected Modifications

c

I have started learning c# and am confused by the following feature.

The following code uses a function Algs.Combinations(n,m) to produce an IEnumerable which contains the combinations of n objects taken from m. Its implementation is exactly as can be found on the Rosetta code page for combinations. For example, applying .toList() to Algs.Combinations(2,3) would produce the list [[0,1],[0,2],[1,2]].

Now in the foreach loop I add each int[] to a new list. I also print out the values of each array(ie the combination) after turning it to a list and then using the built in ForEach. Everything works as expected.

In the next block I use the built in ForEach on my new list and it seems every int[] has become [2,2]. What is going on here? Presumably in my foreach I am not actually adding the integer array but some kind of pointer?

using System;
using System.Collections.Generic;
using System.Linq;


namespace Examples
{
    class main
    {
        public static IEnumerable<int[]> Combinations(int m, int n)
        {
            int[] result = new int[m];
            Stack<int> stack = new Stack<int>();
            stack.Push(0);

            while (stack.Count > 0)
            {
                int index = stack.Count - 1;
                int value = stack.Pop();

                while (value < n)
                {
                    result[index++] = value++;
                    stack.Push(value);
                    if (index == m)
                    {
                        yield return result;
                        break;
                    }
                }
            }
        }

        static void Main(String[] args)
        {
            var l = Combinations(2, 3);
            var newL = new List<int[]>();
            foreach (int[] combs in l)
            {
                newL.Add(combs);
                combs.ToList().ForEach(p => Console.Write(p + " ")); // prints 0 1, 0 2, 1 2 
                Console.WriteLine();
            }
            newL.ForEach(p =>
            {
                p.ToList().ForEach(q => Console.Write(q + " ")); // prints 2 2, 2 2, 2 2
                Console.WriteLine();
            });
         } 

    }
 }

Googling hasn't helped so far. Any links to literature I should look at will be appreciated.

Best Answer

Your problem is that you are only returning a single array from your Combinations function, but returning it multiple times with the values changed inside the array between each iteration.

Methods which use yield return are coroutines - in each foreach iteration they execute up to the next yield statement they find and then they exit, picking up where they left off in the next iteration. In your first foreach loop you are taking the enumerated int[] value and putting it into your newL variable - but it's always the same int[] value because Combinations only ever creates one array. The print out in the first foreach works fine because the coroutine is paused waiting for the next iteration to start.

You have a couple of options. The best option is to make your Combinations have a sane implementation, in that it returns a new object for each yield return statement. That might look something like this:

    public static IEnumerable<int[]> Combinations(int m, int n)
    {
        while (weStillHaveMoreResultsToBuild)
        {
            int[] result = new int[m];

            while (value < n)
            {
                // fill result and break out of loop when done
            }

            yield return result;
        }
    }

Another option would be to not use yield return at all, and instead build a list inside Combinations and return that. This approach could take a lot more memory if your n and m values are large.

A third option is to make a copy of the objects returned by Combinations and store those instead:

newL.Add(combs.ToArray()); // Calling ToArray() will make a copy

This is my least favorite option, because it requires any caller of Combinations to know that the same object will be returned multiple times but changed in each iteration.