C++ Algorithms – Why Do All Functions Take Only Ranges, Not Containers?

c++ language-design standard-library

There are many useful functions in <algorithm>, but all of them operate on "sequences" – pairs of iterators. E.g., if I have a container and would like to run std::accumulate on it, I need to write:

std::vector<int> myContainer = ...;
int sum = std::accumulate(myContainer.begin(), myContainer.end(), 0);

When all I intend to do is:

int sum = std::accumulate(myContainer, 0);

Which is a bit more readable and clearer, in my eyes.

Now I can see that there might be cases where you'd only want to operate on parts of a container, so it's definitely useful to have the option of passing ranges. But at least in my experience, that's a rare special case. I'll usually want to operate on whole containers.

It's easy to write a wrapper function which takes a container and calls begin() and end() on it, but such convenience functions are not included in the standard library.
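For reference, such a wrapper could be sketched roughly as follows. The name accumulate_all and the helper sum_example are made up for illustration and are not part of the standard library; the sketch assumes C++11 or later for std::begin and std::end.

```cpp
#include <iterator>  // std::begin, std::end
#include <numeric>   // std::accumulate
#include <vector>

// Hypothetical convenience wrapper (not a standard function):
// forwards the whole container to the iterator-pair algorithm.
template <typename Container, typename T>
T accumulate_all(const Container& c, T init) {
    using std::begin;
    using std::end;
    return std::accumulate(begin(c), end(c), init);
}

// Usage: sums a whole vector without spelling out the iterator pair.
inline int sum_example() {
    const std::vector<int> myContainer{1, 2, 3, 4};
    return accumulate_all(myContainer, 0);
}
```

Because it goes through the free begin and end functions, the same wrapper also accepts built-in arrays.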

I'd like to know the reasoning behind this STL design choice.

Best Answer

"... it's definitely useful to have the option of passing ranges. But at least in my experience, that's a rare special case. I'll usually want to operate on whole containers"

It may be a rare special case in your experience, but in reality the whole container is the special case, and the arbitrary range is the general case.

You've already noticed that you can implement the whole container case using the current interface, but you can't do the converse.

So, the library-writer had a choice between implementing two interfaces up front, or only implementing one which still covers all cases.

"It's easy to write a wrapper function which takes a container and calls begin() and end() on it, but such convenience functions are not included in the standard library"

True, especially since the free functions std::begin and std::end are now included.

So, let's say the library provides the convenience overload:

template <typename Container>
void sort(Container &c) {
  sort(begin(c), end(c));
}

Now it also needs to provide the equivalent overload taking a comparison functor, and the same pair of overloads has to be provided for every other algorithm.
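To make the doubling concrete, here is a rough sketch of what the pair of overloads might look like for sort alone. The namespace sketch is hypothetical and only there to keep these wrappers apart from the real std::sort.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

namespace sketch {  // hypothetical namespace, not a real library

// Whole-container overload using the default ordering (operator<).
template <typename Container>
void sort(Container& c) {
    std::sort(std::begin(c), std::end(c));
}

// Second overload, needed only to pass a comparison functor through.
template <typename Container, typename Compare>
void sort(Container& c, Compare comp) {
    std::sort(std::begin(c), std::end(c), comp);
}

}  // namespace sketch
```

Every algorithm that accepts an optional predicate or comparator would need the same treatment.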

But we at least covered every case where we want to operate on a full container, right? Well, not quite. Consider

std::for_each(c.rbegin(), c.rend(), foo);

If we want to handle operating backwards on containers, we need another function (or pair of functions) per existing algorithm.

So, the range-based approach is more general in the simple sense that:

it can do everything the whole-container version can
the whole-container approach doubles or triples the number of overloads required, while still being less powerful
the range-based algorithms are also composable (you can stack or chain iterator adaptors, although this is more commonly done in functional languages and Python)
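As a small illustration of the points above, the same iterator-pair algorithms handle a reversed traversal, a sub-range, and an output adaptor without any extra overloads. The function names reversed_squares and sum_middle are invented for this sketch.

```cpp
#include <algorithm>  // std::transform
#include <iterator>   // std::back_inserter
#include <numeric>    // std::accumulate
#include <vector>

// Walks v backwards via reverse iterators and appends each squared
// element to a new vector through the back_inserter adaptor.
inline std::vector<int> reversed_squares(const std::vector<int>& v) {
    std::vector<int> out;
    std::transform(v.rbegin(), v.rend(), std::back_inserter(out),
                   [](int x) { return x * x; });
    return out;
}

// Sums everything except the first and last element.
// Assumes v holds at least two elements.
inline int sum_middle(const std::vector<int>& v) {
    return std::accumulate(v.begin() + 1, v.end() - 1, 0);
}
```

Neither function could be written against a whole-container-only interface without first copying the data into a new container.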

There's another valid reason, of course, which is that it was already a lot of work to get the STL standardized, and inflating it with convenience wrappers before it had been widely used wouldn't have been a great use of limited committee time. If you're interested, Stepanov & Lee's technical report, "The Standard Template Library", is available online.

As mentioned in comments, Boost.Range provides a newer approach without requiring changes to the standard.

Related Solutions

C++ Interface Design – What a Double Container Should Offer

Don't expose your guts; guide visitors:

#include <iostream>
#include <vector>

class A { 
    public:

        // we assume you want read-only versions, if not you can add non-const versions
        template< class Func >
        void for_each_primary( Func f ) const { for_each_value( f, m_primary ); }

        template< class Func >
        void for_each_secondary( Func f ) const { for_each_value( f, m_secondary ); }

    private:
        std::vector<int>  m_primary;
        std::vector<char> m_secondary;

        // static: it does not touch *this, so the const members above can call it
        template< class Func, class Container >
        static void for_each_value( Func f, const Container& c )
        {
             for( const auto& i : c )
                 f( i );
        }
};



int main()
{
    A a;
    a.for_each_primary( [&]( int value ) 
         { std::cout << "Primary Value : " << value ; } );
    a.for_each_secondary( [&]( int value ) 
         { std::cout << "Secondary Value : " << value ; } );
}

Note that you could use std::function instead of a template parameter if you want to put the implementation in a cpp file, which makes implementation changes cheaper in compilation time on big projects.

Also, I didn't try to compile it just now, but I have used this pattern a lot in my open-source projects.
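The std::function variant mentioned above might look roughly like this. It is only a sketch: the callback signature void(int) is an assumption, add_primary is a hypothetical helper so the example has data to visit, and only the primary container is shown.

```cpp
#include <functional>
#include <vector>

// --- would live in the header ---
class A {
public:
    // Non-template variant: the callback type is fixed, so the body
    // no longer has to be visible to every caller.
    void for_each_primary(const std::function<void(int)>& f) const;

    // Hypothetical helper, only here to fill the container in a demo.
    void add_primary(int v) { m_primary.push_back(v); }

private:
    std::vector<int> m_primary;
};

// --- would live in the .cpp file ---
void A::for_each_primary(const std::function<void(int)>& f) const {
    for (int value : m_primary)
        f(value);
}
```

The trade-off is that std::function adds type erasure and an indirect call per element, whereas the template version can usually be inlined.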

This solution is a C++11 enhancement of your option B, I guess.

HOWEVER

This solution has several issues:

It requires C++11 to be effective, because it's only practical for users of your class if they can use lambdas.
It relies on the class implementer knowing precisely which algorithms need to be available to users. If the user needs to do complex manipulations on the numbers, jumping from index to index in an unpredictable way for example, then exposing iterators, a copy of the values, or the values themselves would be better.

In fact, this kind of choice totally depends on what you intend the user to do with this class.

By default I prefer the solution I gave you because it's the most "isolated" one, making sure the class knows how its values can be manipulated by external code. It's a bit like "extension points". If it's a map, providing a find function on your class is easy. So I think that's the sanest way to expose data, and lambdas make it practical.

As said, if you need to make sure the user can manipulate the data as they wish, then providing a copy of the container is the next most "isolated" option (maybe with a way to reset the container from the copy afterwards). If a copy would be expensive, then iterators would be better. If that's not enough, then returning a reference works, but it's certainly a bad idea.
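The copy-and-reset option could be sketched like this. The member names primary_copy and reset_primary and the helper sort_primary_via_copy are invented for illustration, and only the primary container is shown.

```cpp
#include <algorithm>  // std::sort
#include <utility>    // std::move
#include <vector>

class A {
public:
    // Hands out a copy, so callers can never touch the internal state.
    std::vector<int> primary_copy() const { return m_primary; }

    // Replaces the internal container with the caller's modified copy.
    void reset_primary(std::vector<int> values) { m_primary = std::move(values); }

private:
    std::vector<int> m_primary;
};

// Usage: arbitrary manipulation happens on the copy, then is written back.
inline std::vector<int> sort_primary_via_copy(A& a) {
    std::vector<int> values = a.primary_copy();
    std::sort(values.begin(), values.end());
    a.reset_primary(values);
    return a.primary_copy();
}
```

The class keeps full control over when its state changes, at the cost of copying the container in both directions.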

Now, assuming you're using C++11 and don't want to provide algorithms, the most idiomatic way is to expose iterators like this (only the user code changes):

#include <algorithm>
#include <vector>

class B { 
    private:
        std::vector<int>  m_primary;
        std::vector<char> m_secondary;
    public:

        // your code is read-write enabled... make sure it's not const_iterator you want
        // also I'm using decltype to allow changing container type without having to manually change functions signatures
        // (these members are non-const: a const member could only hand out const_iterators)
        decltype(m_primary)::iterator primary_begin() { return m_primary.begin(); }
        decltype(m_primary)::iterator primary_end()   { return m_primary.end(); }

        decltype(m_secondary)::iterator secondary_begin() { return m_secondary.begin(); }
        decltype(m_secondary)::iterator secondary_end()   { return m_secondary.end(); }
};

int main()
{
    B b;

    std::for_each( b.primary_begin(), b.primary_end(), []( int& value ) {
        // ...
    });   
    // the element type of m_secondary is char, so the lambda takes char&
    std::for_each( b.secondary_begin(), b.secondary_end(), []( char& value ) {
        // ...
    });   
}

Haskell – Why Isn’t There a Typeclass for Functions?

Well, I don't know of any baked-in abstractions that market themselves as representing "function-y" things. But there are several that come close:

Arrows

If your functions have a notion of products and can inject arbitrary functions, then arrows are for you:

 class Arrow a where
   arr :: (b -> c) -> a b c
   first :: a b c -> a (b, d) (c, d)
   second :: a b c -> a (d, b) (d, c)

ArrowApply has a notion of application which looks important for what you want.

Applicatives

Applicatives have your notion of application; I've used them in an AST to represent function application.

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

There are many other ideas. But a common theme is to build up some data structure representing your function, and then pass it to an interpretation function.

This is also how many free monads work. I'd suggest poking at these if you're feeling brave; they're a powerful tool for what you're suggesting, and essentially let you build up a data structure using do notation and then collapse it into a side-effecting computation with different functions. The beauty is that these functions just operate on the data structure and aren't aware of how you built it. This is what I'd suggest for your example of an interpreter.