C++ Design – Iterator Awareness of Its Own End


For some background on why I am asking this question, here is an example. In Python, the function itertools.chain chains an arbitrary number of iterables together into one without making copies. I decided to implement chain in C++ using variadic templates. As far as I can tell, the only way to make an iterator for chain that will successfully move on to the next container is for each iterator to know about the end of its container. (I also thought of a sort of hack where, when != is called against the end, the iterator knows to move on to the next container, but the first way seemed easier, safer and more versatile.)

My question is whether there is anything inherently wrong with an iterator knowing about its own end. My code is in C++, but the question is language-agnostic, since many languages have iterators.

#ifndef CHAIN_HPP
#define CHAIN_HPP

#include <utility>  // std::declval

#include "iterator_range.hpp"

namespace iter {
   template <typename ... Containers>
       struct chain_iter;
   template <typename Container>
       struct chain_iter<Container> {

        private:
           using Iterator = decltype(std::declval<Container&>().begin());
           Iterator begin;
           Iterator end;  // never really used, kept for consistency; not const, so the iterator stays copy-assignable

        public:
           chain_iter(Container & container, bool is_end=false) :
               begin(container.begin()),end(container.end()) {
                   if(is_end) begin = container.end();
           }
           chain_iter & operator++()
           {
               ++begin;
               return *this;
           }
           auto operator*()->decltype(*begin)
           {
               return *begin;
           }
           bool operator!=(const chain_iter & rhs) const{
               return this->begin != rhs.begin;
           }
       };
   template <typename Container, typename ... Containers>
       struct chain_iter<Container,Containers...>
       {

        private:
           using Iterator = decltype(std::declval<Container&>().begin());
           Iterator begin;
           Iterator end;  // not const, so chain_iter stays copy-assignable as iterators must be
           bool end_reached = false;
           chain_iter<Containers...> next_iter;

        public:
           chain_iter(Container & container, Containers& ... rest, bool is_end=false) :
               begin(container.begin()),
               end(container.end()),
               next_iter(rest...,is_end) {
                   if(is_end)
                       begin = container.end();
               }
           chain_iter & operator++()
           {
               if (begin == end) {
                   ++next_iter;
               }
               else {
                   ++begin;
               }
               return *this;               
           }
           auto operator*()->decltype(*begin)
           {
               if (begin == end) {
                   return *next_iter;
               }
               else {
                   return *begin;
               }
           }   
           bool operator !=(const chain_iter & rhs) const {
               if (begin == end) {
                   return this->next_iter != rhs.next_iter;
               }
               else
                   return this->begin != rhs.begin;
           }
        };
   template <typename ... Containers>
       iterator_range<chain_iter<Containers...>> chain(Containers& ... containers)
       {
           auto begin = 
               chain_iter<Containers...>(containers...);
           auto end =
               chain_iter<Containers...>(containers...,true);
           return 
               iterator_range<chain_iter<Containers...>>(begin,end);
       }
}

#endif //CHAIN_HPP

Best Answer

Been there, got burned. Creating things that look like iterators but have different or extra requirements will lead to a mess. Many ranges are not copyable, or at least not cheaply so, but copyability is what one normally expects of an iterator (it is a requirement of the iterator concept).

You should not have iterators that know about their own end. Chaining works with ranges instead, and there are two ways to define ranges:

  • As forward-iterable "containers", which you can build from a simple pair of iterators. This is the C++ way (and Boost.Range[1] has some useful utilities for it), but it is sometimes quite a bit of extra work to make the various objects that provide sequences fit this interface.

  • Define your own interface for "generators". It will probably be similar to the Python one, but since exceptions are less convenient in C++ than in Python (where the end of iteration is signalled by raising StopIteration), it will probably use a different method of detecting the end. I settled on the following interface for my own needs[2]:

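    #include <concepts>

    // The generator interface, written as a C++20 concept:
    // G yields items convertible to T.
    template <typename G, typename T>
    concept Generator = requires(G g) {
        { g.valid() } -> std::convertible_to<bool>;
        g.next();
        { g.get() } -> std::convertible_to<T>;
    };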
    

    where the iteration looks like:

    while(g.valid()) {
        auto item = g.get();
        do_anything_with(item);
        g.next();
    }
    

    The generator conceptually starts on the first item in the sequence, but that item may only be accessed after valid() has been called. I found this lets you distribute the hard work between the constructor, valid() and next() as fits each case, and the generator can easily be wrapped in an iterator, similarly to how std::istream_iterator is done. Other variations of the interface are possible, including following the istream one (but that has the disadvantage of returning a default element when the iteration fails).

Basically, you should probably combine the approaches. If you use the latter concept, you can adapt any such implementation to fit the (quite complex) Range concept from Boost.Range, e.g. using a "mixin" and the Curiously Recurring Template Pattern. Something like:

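#include <stdexcept>
#include <boost/iterator/iterator_facade.hpp>

// CRTP: the iterator passes itself as the first argument to iterator_facade.
// get() returns by value, so the Reference parameter is ValueT, not ValueT&.
template <typename GeneratorT, typename ValueT>
class GeneratorIterator :
    public boost::iterator_facade<GeneratorIterator<GeneratorT, ValueT>, ValueT,
                                  boost::single_pass_traversal_tag, ValueT> {
  public:
    GeneratorIterator() : g() {}
    explicit GeneratorIterator(GeneratorT *g) : g(g) {}
  private:
    friend class boost::iterator_core_access;
    GeneratorT *g;  // the only state, so copying the iterator copies a pointer
    ValueT dereference() const {
        if(!g || !g->valid())
            throw std::runtime_error("...");
        return g->get();
    }
    // Equal if both share a generator, or both are "at end".
    bool equal(GeneratorIterator const &l) const {
        return g == l.g || ((!g || !g->valid()) && (!l.g || !l.g->valid()));
    }
    void increment() {
        if(g)
            g->next();
    }
};

// Mixin: derive your generator from GeneratorFacade<YourGenerator, ValueT>.
template <typename GeneratorT, typename ValueT>
class GeneratorFacade {
  public:
    typedef GeneratorIterator<GeneratorT, ValueT> iterator;
    typedef GeneratorIterator<GeneratorT, ValueT> const_iterator;
    // Iterating consumes the generator, so begin() and end() are non-const.
    iterator begin() {
        return iterator(static_cast<GeneratorT *>(this));
    }
    iterator end() {
        return iterator();
    }
};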

The advantage of the indirection is that the ranges now don't have to be copyable at all (or cheaply copyable), while the iterator is just a pointer and is therefore cheaply copyable, as required. And defining generators is simple and easy to understand, while they still end up conforming to the hairy standard C++ interface.

(Disclaimer: I wrote this off the top of my head; it is not tested.)


[1] Boost.Range already includes concatenation of ranges (boost::join, in boost/range/join.hpp). Don't reinvent the wheel: reuse it, or at least draw inspiration from it.

[2] The "Iterators Must Go" talk linked in Ylisar's answer arrives at the same interface, just with different names. Note that many languages combine next/popFront and valid/empty into a single next that returns a boolean, but that approach is much more difficult to wrap in iterators and conceptually somewhat more complex, because the iterators then start out in a special "uninitialized" state.
