Who is to blame for this range based for over a reference to temporary

c++11

The following code looks rather harmless on first sight. A user uses the function bar() to interact with some library functionality. (This may have even worked for a long time since bar() returned a reference to a non-temporary value or similar.) Now however it is simply returning a new instance of B. B again has a function a() that returns a reference to an object of the iterateable type A. The user wants to query this object which leads to a segfault since the temporary B object returned by bar() is destroyed before the iteration begins.

I am indecisive who (library or user) is to blame for this.
All library provided classes look clean to me and certainly aren't doing anything different (returning references to members, returning stack instances, …) than so much other code out there does.
The user doesn't seem to do anything wrong as well, he's just iterating over some object without doing anything concerning that objects lifetime.

(A related question might be: Should one establish the general rule that code shouldn't "range-based-for-iterate" over something that is retrieved by more than one chained calls in the loop header since any of these calls might return a rvalue?)

#include <algorithm>
#include <iostream>

// "Library code"
struct A
{
    A():
        v{0,1,2}
    {
        std::cout << "A()" << std::endl;
    }

    ~A()
    {
        std::cout << "~A()" << std::endl;
    }

    int * begin()
    {
        return &v[0];
    }

    int * end()
    {
        return &v[3];
    }

    int v[3];
};

struct B
{
    A m_a;

    A & a()
    {
        return m_a;
    }
};

B bar()
{
    return B();
}

// User code
int main()
{
    for( auto i : bar().a() )
    {
        std::cout << i << std::endl;
    }
}

Best Answer

I think the fundamental problem is a combination of language features (or lack thereof) of C++. Both the library code and the client code are reasonable (as evidenced by the fact that the problem is far from obvious). If the lifetime of the temporary B was suitable extended (to the end of the loop) there would be no problem.

Making temporaries life just long enough, and no longer, is extremely hard. Not even a rather ad-hoc "all temporaries involved in the creation of the range for a range-based for live until the end of the loop" would be without side effects. Consider the case of B::a() returning a range that's independent of the B object by value. Then the temporary B can be discarded immediately. Even if one could precisely identify the cases where a lifetime extension is necessary, as these cases are not obvious to programmers, the effect (destructors called much later) would be surprising and perhaps an equally subtle source of bugs.

It would be more desirable to just detect and forbid such nonsense, forcing the programmer to explicitly elevate bar() to a local variable. This is not possible in C++11, and probably never will be possible because it requires annotations. Rust does this, where the signature of .a() would be:

fn a<'x>(bar: &'x B) -> &'x A { bar.a }
// If we make it as explicit as possible, or
fn a(&self) -> &A { self.a }
// if we make it a method and rely on lifetime elision.

Here 'x is a lifetime variable or region, which is a symbolic name for the period of time a resource is available. Frankly, lifetimes are hard to explain -- or we haven't yet figured out the best explanation -- so I will restrict myself to the minimum necessary for this example and refer the inclined reader to the official documentation.

The borrow checker would notice that the result of bar().a() needs to live as long as the loop runs. Phrased as a constraint on the lifetime 'x, we write: 'loop <= 'x. It would also notice that the receiver of the method call, bar(), is a temporary. The two pointers are associated with the same lifetime, hence 'x <= 'temp is another constraint.

These two constraints are contradictory! We need 'loop <= 'x <= 'temp but 'temp <= 'loop, which captures the problem pretty precisely. Because of the conflicting requirements, the buggy code is rejected. Note that this is a compile-time check and Rust code usually results to the same machine code as equivalent C++ code, so you need not pay a run-time cost for it.

Nevertheless this is a big feature to add to a language, and only works if all code uses it. the design of APIs is also affected (some designs that would be too dangerous in C++ become practical, others can't be made to play nice with lifetimes). Alas, that means it's not practical to add to C++ (or any language really) retroactively. In summary, the fault is on the inertia successful languages have and the fact that Bjarne in 1983 didn't have the crystal ball and foresight to incorporate the lessons of the last 30 years of research and C++ experience ;-)

Of course, that's not at all helpful in avoiding the problem in the future (unless you switch to Rust and never use C++ again). One could avoid longer expressions with multiple chained method calls (which is pretty limiting, and doesn't even remotely fix all lifetime troubles). Or one could try adopting a more disciplined ownership policy without compiler assistance: Document clearly that bar returns by value and that the result of B::a() must not outlive the B on which a() is invoked. When changing a function to return by value instead of a longer-lived reference, be conscious that this is a change of contract. Still error prone, but may speed up the process of identifying the cause when it does happen.

Related Topic