C# – Detecting IEnumerable State Machines

I just read an interesting article called "Getting too cute with c# yield return".

It made me wonder what the best way is to detect whether an IEnumerable is an actual enumerable collection, or if it's a state machine generated with the yield keyword.

For example, you could modify DoubleXValue (from the article) to something like:

private void DoubleXValue(IEnumerable<Point> points)
{
    // Only accept a concrete list; reject any other kind of sequence.
    if (points is List<Point>)
    {
        foreach (var point in points)
            point.X *= 2;
    }
    else
    {
        throw new YouCantDoThatException();
    }
}

Question 1) Is there a better way to do this?

Question 2) Is this something I should worry about when creating an API?

Best Answer

Your question, as I understand it, seems to be based on an incorrect premise. Let me see if I can reconstruct the reasoning:

  • The linked-to article describes how automatically-generated sequences exhibit a "lazy" behaviour, and shows how this can lead to a counter-intuitive result.
  • Therefore I can detect whether a given instance of IEnumerable is going to exhibit this lazy behaviour by checking to see if it is automatically generated.
  • How do I do that?

The problem is that the second premise is false. Even if you could detect whether a given IEnumerable was the result of an iterator block transformation (and yes, there are ways to do that), it wouldn't help, because the assumption is wrong. Let's illustrate why.

class M { public int P { get; set; } }
class C
{
  public static IEnumerable<M> S1()
  {
    for (int i = 0; i < 3; ++i) 
      yield return new M { P = i };
  }

  private static M[] ems = new M[] 
  { new M { P = 0 }, new M { P = 1 }, new M { P = 2 } };
  public static IEnumerable<M> S2()
  {
    for (int i = 0; i < 3; ++i)
      yield return ems[i];
  }

  public static IEnumerable<M> S3()
  {
    return new M[] 
    { new M { P = 0 }, new M { P = 1 }, new M { P = 2 } };
  }

  private class X : IEnumerable<M>
  {
    public IEnumerator<M> GetEnumerator()
    {
      return new XEnum();
    }
    // Omitted: non generic version
    private class XEnum : IEnumerator<M>
    {
      int i = 0;
      M current;
      public bool MoveNext()
      {
        // Stop after three items so that enumeration terminates.
        if (i >= 3) return false;
        current = new M { P = i };
        i += 1;
        return true;
      }
      public M Current { get { return current; } }
      // Omitted: other stuff.
    }
  }

  public static IEnumerable<M> S4()
  {
    return new X();
  }

  public static void Add100(IEnumerable<M> items)
  {
    foreach(M item in items) item.P += 100;
  }
}

All right, we have four methods. S1 and S2 are automatically generated sequences; S3 and S4 are manually generated sequences. Now suppose we have:

var items = C.Sn(); // where Sn is one of S1, S2, S3, S4
C.Add100(items);
Console.WriteLine(items.First().P);

The result for S1 and S4 will be 0: every time you enumerate the sequence, you get a reference to a freshly created M, so the objects that Add100 mutated are not the objects that First() returns. The result for S2 and S3 will be 100: every time you enumerate the sequence, you get the same reference to M that you got the last time. Whether the sequence code is automatically generated is orthogonal to whether the enumerated objects have referential identity. Those two properties, automatic generation and referential identity, have nothing to do with each other. The article you linked to conflates them somewhat.
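The orthogonality claim can be checked with a small self-contained sketch. It reuses S1, S2, S3 and Add100 from the listing above; the IdentityDemo class and the helpers LooksCompilerGenerated and MutationIsVisible are hypothetical names introduced only for this illustration. The detection helper assumes that the compiler marks the class it generates for an iterator block with CompilerGeneratedAttribute, which is an implementation detail of current compilers rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

class M { public int P { get; set; } }

static class IdentityDemo
{
    private static readonly M[] ems =
        { new M { P = 0 }, new M { P = 1 }, new M { P = 2 } };

    // Iterator block that creates a new M on every enumeration.
    public static IEnumerable<M> S1()
    {
        for (int i = 0; i < 3; ++i) yield return new M { P = i };
    }

    // Iterator block that hands out the same stored instances every time.
    public static IEnumerable<M> S2()
    {
        for (int i = 0; i < 3; ++i) yield return ems[i];
    }

    // Hand-written sequence: a plain array.
    public static IEnumerable<M> S3()
    {
        return new[] { new M { P = 0 }, new M { P = 1 }, new M { P = 2 } };
    }

    public static void Add100(IEnumerable<M> items)
    {
        foreach (M item in items) item.P += 100;
    }

    // Hypothetical helper: guesses whether the sequence object comes from an
    // iterator block by looking for [CompilerGenerated] on its runtime type.
    // Assumption: relies on a compiler implementation detail.
    public static bool LooksCompilerGenerated(object sequence)
    {
        return sequence.GetType().IsDefined(typeof(CompilerGeneratedAttribute), false);
    }

    // Hypothetical helper: true when a mutation made during one enumeration
    // is still observable in the next enumeration of the same sequence.
    public static bool MutationIsVisible(IEnumerable<M> items)
    {
        int before = items.First().P;
        Add100(items);
        return items.First().P != before;
    }

    static void Main()
    {
        // Evaluate each call separately so that one sequence object is used
        // for one check only.
        Console.WriteLine("S1: generated={0}, mutation visible={1}",
            LooksCompilerGenerated(S1()), MutationIsVisible(S1())); // visible: False
        Console.WriteLine("S2: generated={0}, mutation visible={1}",
            LooksCompilerGenerated(S2()), MutationIsVisible(S2())); // visible: True
        Console.WriteLine("S3: generated={0}, mutation visible={1}",
            LooksCompilerGenerated(S3()), MutationIsVisible(S3())); // visible: True
    }
}
```

S1 and S2 are both iterator blocks, so a check of this kind should classify them the same way, yet the mutation is lost for S1 and kept for S2. Knowing how the sequence was produced therefore says nothing about whether mutating its elements will stick.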

Unless a sequence provider is documented as always proffering up objects that have referential identity, it is unwise to assume that it does so.
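For the API-design half of the question, one possible way to avoid relying on referential identity is to enumerate the sequence exactly once into a list, mutate the list's elements, and give that list back to the caller. This is only a sketch under that assumption, not something the answer above prescribes; the SnapshotDemo class, the Fresh method and the changed return type of Add100 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class M { public int P { get; set; } }

static class SnapshotDemo
{
    // A lazy sequence that creates new objects on every enumeration.
    public static IEnumerable<M> Fresh()
    {
        for (int i = 0; i < 3; ++i) yield return new M { P = i };
    }

    // Hypothetical API shape: enumerate the input once, mutate the captured
    // objects, and return them, so the caller never has to re-enumerate the
    // original sequence to see the changes.
    public static List<M> Add100(IEnumerable<M> items)
    {
        List<M> snapshot = items.ToList();
        foreach (M item in snapshot) item.P += 100;
        return snapshot;
    }

    static void Main()
    {
        List<M> result = Add100(Fresh());
        Console.WriteLine(result[0].P); // prints 100
    }
}
```

The caller works with the returned list instead of the original sequence, so the result is the same whether or not the provider hands out the same objects on each enumeration.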