Iterators – Advantages of Next-Iterator Over This-Iterator

I don't work with Java/C# iterators directly very often, but when I do I always wonder why iterators were designed in the "next" fashion.

To start, you have to move the iterator; to check whether there is any data, you have to check whether there is a next element.

A more appealing concept to me is the this-iterator: it "starts" at the beginning, and you can check its state with, let's say, isValid. A loop over the entire collection would then look like this:

while (iter.isValid())
{
  println(iter.current());
  iter.next();
}

As you can see here, to check whether there is a next element you move there and then check whether the state is valid.

YMMV, but anyway: is there any advantage of the next-iterator (as in Java/C#) over the this-iterator?

Note: this is a conceptual question about the design of the language.

Best Answer

I think an advantage of C#/.NET's MoveNext() model over Java's hasNext() model is that the former implies that some work may be done. The hasNext() method implies a simple state check. But what if you are iterating a lazy stream, and you have to go out to a database to determine whether there are any results? In that case, the first call to hasNext() may be a long blocking operation. The naming of the hasNext() method can be very misleading as to what's actually going on.

For a real world example, this design issue bit me when implementing Microsoft's Reactive Extensions (Rx) APIs in Java. Specifically, for the iterator created by IObservable.asIterable() to function properly, the hasNext() method would have to block and wait for the next item/error/completion notification to arrive. That's hardly intuitive: the name implies a simple state check. Contrast that to the C# design, where MoveNext() would have to block, which is not an altogether unexpected result when you are dealing with a lazily evaluated stream.

My point is this: the "next-iterator" model is preferable to the "this-iterator" model because the "this-iterator" model is often a lie: you may be required to pre-fetch the next element in order to check the state of the iterator. A design which communicates that possibility clearly is preferable, in my opinion. Yes, Java's naming conventions do follow the "next" model, but the behavioral implications are similar to those of the "this-iterator" model.
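
To make the prefetching concrete, here is a minimal Java sketch of a "this-iterator" layered over a `java.util.Iterator`. The `ThisIterator` class and its method names are hypothetical, chosen to match the question's pseudocode. The thing to notice is that the constructor already has to touch the source so that `isValid()` has an answer to give.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical "this-iterator" adapter; the name and methods are illustrative only.
final class ThisIterator<T> {
    private final Iterator<T> source;
    private T current;
    private boolean valid;

    ThisIterator(Iterator<T> source) {
        this.source = source;
        advance(); // must already consult the source so isValid() can answer
    }

    boolean isValid() {
        return valid; // cheap, but only because advance() did the work earlier
    }

    T current() {
        if (!valid) throw new NoSuchElementException();
        return current;
    }

    void next() {
        advance();
    }

    private void advance() {
        valid = source.hasNext();               // may block on a lazy source
        current = valid ? source.next() : null; // prefetch the element
    }
}

final class ThisIteratorDemo {
    static String join(Iterator<String> source) {
        StringBuilder sb = new StringBuilder();
        for (ThisIterator<String> it = new ThisIterator<>(source); it.isValid(); it.next()) {
            sb.append(it.current());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("a", "b", "c").iterator())); // prints abc
    }
}
```

With an in-memory list the prefetch is free. With a lazy source, the cost moves into the constructor and `next()`, neither of which is where a caller reading `isValid()` would look for it.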

Clarification: Perhaps contrasting Java and C# was a poor choice, because Java's model is deceptive. I consider the most important distinction in the models presented by the OP to be that the "this-iterator" model decouples the "is valid" logic from the "retrieve current/next element" logic, while an ideal "next-iterator" implementation combines these operations. I feel it is appropriate to combine them because, in some cases, determining whether the iterator's state is valid requires prefetching the next element, so that possibility should be made as explicit as possible.

I do not see a significant design difference between:

while (i.isValid()) { // do we have an element?  (implied as fast, non-blocking)
    doSomething(i.current()); // retrieve the element (may be slow!)
    i.next();
}
// ...and:
while (i.hasNext()) { // do we have an element?  (implied as fast, non-blocking)
    doSomething(i.next()); // retrieve the element (may be slow!)
}

But I do see a meaningful difference here:

while (i.hasNext()) { // do we have an element?  (implied as fast, non-blocking)
    doSomething(i.next()); // retrieve the element (may be slow!)
}
// ...and:
while (i.moveNext()) { // fetch the next element if it exists (may be slow!)
    doSomething(i.current()); // get the element  (implied as fast, non-blocking)
}

In both the isValid() and hasNext() examples, the operation that is implied to be a fast and non-blocking state check may in fact be a slow, blocking operation. In the moveNext() example, most of the work is being done in the method you'd expect, regardless of whether you are dealing with an eagerly or lazily evaluated stream.
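
For comparison, the moveNext() shape can be sketched in Java as well. Java has no standard interface like this, so `Cursor` and `IteratorCursor` below are hypothetical names; the sketch only assumes an underlying `java.util.Iterator`. All potentially slow work sits in `moveNext()`, and `current()` just returns a stored field.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical Java rendering of the C#-style protocol; not a standard Java interface.
interface Cursor<T> {
    boolean moveNext(); // fetches the next element if there is one; may be slow
    T current();        // returns the element fetched by the last successful moveNext()
}

final class IteratorCursor<T> implements Cursor<T> {
    private final Iterator<T> source;
    private T current;
    private boolean hasCurrent;

    IteratorCursor(Iterator<T> source) {
        this.source = source; // no work yet: the cursor starts before the first element
    }

    @Override
    public boolean moveNext() {
        hasCurrent = source.hasNext();
        current = hasCurrent ? source.next() : null;
        return hasCurrent;
    }

    @Override
    public T current() {
        if (!hasCurrent) throw new IllegalStateException("no current element");
        return current;
    }
}

final class CursorDemo {
    static <T> List<T> drain(Cursor<T> cursor) {
        List<T> out = new ArrayList<>();
        while (cursor.moveNext()) {
            out.add(cursor.current());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(new IteratorCursor<>(List.of(1, 2, 3).iterator()))); // prints [1, 2, 3]
    }
}
```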

Related Solutions

Java ArrayList – Maximum Index Value Explained

ArrayList in Java has a get(int index) method. int is a signed 32-bit value with a maximum of 2,147,483,647, so no index larger than that can ever be passed to an ArrayList. Period. The actual maximum size of an array or ArrayList depends on the JVM implementation and may be lower than Integer.MAX_VALUE. You can't make an ArrayList (or, for that matter, an int[] array) that uses a long for its index.

If you were to instantiate an array list of this magnitude, you would have a structure of at least 8 gigabytes. That accounts only for the Integer.MAX_VALUE references (at 4 bytes each), not for the data each one points to.
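
The arithmetic behind that figure can be sketched as follows. The reference widths are assumptions: 4 bytes corresponds to a JVM using compressed references, 8 bytes to uncompressed 64-bit references, and the real footprint depends on the JVM and its settings. `ReferenceArrayEstimate` is a hypothetical helper name.

```java
final class ReferenceArrayEstimate {
    // Bytes needed for n references of the given width, ignoring array and object headers.
    static long bytesFor(long n, int bytesPerReference) {
        return n * bytesPerReference;
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE; // 2,147,483,647
        System.out.println(bytesFor(n, 4)); // prints 8589934588, just under 8 GiB
        System.out.println(bytesFor(n, 8)); // prints 17179869176, just under 16 GiB
    }
}
```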

Attempting to access elements beyond the maximum allowed index through an iterator associated with the list would likely result in an OutOfMemoryError, an IndexOutOfBoundsException, or a NoSuchElementException, depending on the implementation.
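
Two of those failure modes can be observed on a tiny list without needing gigabytes of heap. This is a small illustration using `java.util.ArrayList`; the class and method names are hypothetical, and it does not attempt to reproduce the out-of-memory case.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class OutOfRangeDemo {
    // Indexing past the end of the list.
    static String indexFailure() {
        List<String> list = new ArrayList<>(List.of("only"));
        try {
            list.get(1);
            return "no exception";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        }
    }

    // Calling next() on an exhausted iterator.
    static String iteratorFailure() {
        Iterator<String> it = new ArrayList<>(List.of("only")).iterator();
        it.next(); // consumes the single element
        try {
            it.next();
            return "no exception";
        } catch (NoSuchElementException e) {
            return "NoSuchElementException";
        }
    }

    public static void main(String[] args) {
        System.out.println(indexFailure());    // prints IndexOutOfBoundsException
        System.out.println(iteratorFailure()); // prints NoSuchElementException
    }
}
```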

This is a very impractical use of memory. If you want such a data structure, you should investigate less RAM-intensive approaches such as databases, sparse arrays, and the like.

C++ Design – Iterator Awareness of Its Own End

Been there, got burned. Creating things that look like iterators but have different or extra requirements will lead to a mess. Basically, many ranges are not copyable, or at least not cheaply so, but cheap copying is what one normally expects of an iterator (it is a requirement of the iterator concept).

You should not have iterators that know of their own end. But chaining works with ranges. There are two ways to define ranges:

1. As forward-iterable "containers", which you can make from a simple pair of iterators. This is the C++ way (and Boost.Range¹ has some useful utilities for these), but sometimes it is quite a bit of extra work to make the various objects that provide sequences fit the interface.
2. Define your own interface for "generators". It will probably be similar to the Python one, but since exceptions are less convenient in C++ than in Python, it will probably have a different method of detecting the end. I settled on the following interface for my own needs²:
```
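#include <concepts>

// Spelled as a C++20 concept: G is the generator type, T its element type.
template <typename G, typename T>
concept Generator = requires(G g) {
    { g.valid() } -> std::convertible_to<bool>;
    { g.next() } -> std::same_as<void>;
    { g.get() } -> std::convertible_to<T>;
};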
```
where the iteration looks like:
```
while(g.valid()) {
    auto item = g.get();
    do_anything_with(item);
    g.next();
}
```
The generator conceptually starts on the first item in the sequence, but may only be accessed after valid() is called. I found this allows distributing the hard work between the constructor, valid() and next() as fits each case, and it can easily be wrapped in an iterator, similarly to how istream_iterator is done. Other variations of the interface are possible, including following the istream one (which has the disadvantage that it returns a default element when the iteration fails).
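
The same valid/next/get protocol, and the point that it wraps easily in a conventional iterator, can be sketched in Java. This is a transliteration for illustration, not the C++ code this answer describes: `Generator`, `RangeGenerator` and `GeneratorIteratorAdapter` are hypothetical names, and the target is `java.util.Iterator` rather than a C++ iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Java transliteration of the valid/next/get protocol; all names here are illustrative.
interface Generator<T> {
    boolean valid();
    void next();
    T get();
}

// Hypothetical generator yielding start, start + 1, ..., end - 1.
final class RangeGenerator implements Generator<Integer> {
    private int value;
    private final int end;

    RangeGenerator(int start, int end) {
        this.value = start;
        this.end = end;
    }

    @Override public boolean valid() { return value < end; }
    @Override public void next() { value++; }
    @Override public Integer get() { return value; }
}

// Wraps a generator in the standard java.util.Iterator protocol.
final class GeneratorIteratorAdapter<T> implements Iterator<T> {
    private final Generator<T> g;

    GeneratorIteratorAdapter(Generator<T> g) {
        this.g = g;
    }

    @Override
    public boolean hasNext() {
        return g.valid();
    }

    @Override
    public T next() {
        if (!g.valid()) throw new NoSuchElementException();
        T item = g.get();
        g.next(); // step past the element being returned
        return item;
    }
}

final class GeneratorDemo {
    static List<Integer> collect(int start, int end) {
        List<Integer> out = new ArrayList<>();
        Iterator<Integer> it = new GeneratorIteratorAdapter<>(new RangeGenerator(start, end));
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(0, 3)); // prints [0, 1, 2]
    }
}
```

Because the generator already sits on its first element, the adapter needs no "before the first element" state: `hasNext()` maps directly onto `valid()`.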

Basically, you should probably combine the approaches. If you use the latter concept, you can adapt any such implementation to fit the (quite complex) Range concept from Boost.Range, e.g. using a "mixin" and the Curiously Recurring Template Pattern. Something like:

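#include <boost/iterator/iterator_facade.hpp>
#include <stdexcept>

template <typename GeneratorT, typename ValueT>
class GeneratorIterator :
    public boost::iterator_facade<GeneratorIterator<GeneratorT, ValueT>, ValueT,
                                  boost::single_pass_traversal_tag, ValueT> {
  public:
    GeneratorIterator() : g(nullptr) {}  // end iterator
    explicit GeneratorIterator(GeneratorT *g) : g(g) {}
  private:
    friend class boost::iterator_core_access;  // lets the facade call the members below
    GeneratorT *g;
    // get() returns by value, so the facade's Reference argument is ValueT.
    ValueT dereference() const {
        if(!g || !g->valid())
            throw std::runtime_error("...");
        return g->get();
    }
    // Equal if both point at the same generator or both are exhausted.
    bool equal(GeneratorIterator const &l) const {
        return g == l.g || ((!g || !g->valid()) && (!l.g || !l.g->valid()));
    }
    void increment() {
        if(g)
            g->next();
    }
};

// CRTP mixin: GeneratorT derives from GeneratorFacade<GeneratorT, ValueT>.
template <typename GeneratorT, typename ValueT>
class GeneratorFacade {
  public:
    typedef GeneratorIterator<GeneratorT, ValueT> iterator;
    typedef GeneratorIterator<GeneratorT, ValueT> const_iterator;
    // Iterating advances the generator, so begin() and end() are non-const.
    iterator begin() {
        return iterator(static_cast<GeneratorT *>(this));
    }
    iterator end() {
        return iterator();
    }
};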

The advantage of the indirection is that the ranges no longer have to be cheaply copyable, or copyable at all, while the iterator is just a pointer and is therefore cheaply copyable, as required. Defining generators is simple and easy to understand, yet they still end up conforming to the hairy standard C++ interface.

(Disclaimer: I wrote this off the top of my head; it is not tested.)

¹ Boost.Range includes concatenating ranges. Don't reinvent the wheel: reuse it, or at least take inspiration from it.

² The Iterators Must Go talk linked in Ylisar's answer comes up with the same interface, just with different names. Note that many languages combine next/popFront and valid/empty into a single next that returns a boolean, but that approach is much more difficult to wrap in iterators and is conceptually somewhat more complex, because the iterators then start out in a special "uninitialized" state.
