Yes, Virginia, there is a Santa Claus.
The notion of using programs to modify programs has been around for a long time. The original idea came from John von Neumann in the form of stored-program computers. But machine code modifying machine code in arbitrary ways is pretty inconvenient.
People generally want to modify source code. This desire is mostly realized in the form of program transformation systems (PTS).
PTS generally offer, for at least one programming language, the ability to parse to ASTs, manipulate that AST, and regenerate valid source text. If you dig around, for most mainstream languages somebody has built such a tool with a procedural API that is actually pretty useful (Clang is an example for C++, the Java compiler offers this capability as an API, Microsoft offers Roslyn, Eclipse has its JDT, ...). For the broader community, almost every language-specific community can point to something like this, implemented with varying levels of maturity (usually modest; many are "just parsers producing ASTs"). Happy metaprogramming.
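To make the parse/manipulate/regenerate cycle concrete, here is a deliberately tiny C++ sketch -- not the API of Clang, Roslyn, or any tool named above; every type in it is invented for illustration. It has a two-node expression AST, a constant-folding rewrite (the "manipulate" step), and an unparser that regenerates source text:

#include <iostream>
#include <memory>
#include <string>

// A miniature expression AST: integer literals and additions.
struct Expr { virtual ~Expr() = default; };
struct Lit : Expr {
    int value;
    explicit Lit(int v) : value(v) {}
};
struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
};

// The "manipulate" step: fold Add(Lit, Lit) into a single Lit, bottom-up.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    if (auto* add = dynamic_cast<Add*>(e.get())) {
        add->lhs = fold(std::move(add->lhs));
        add->rhs = fold(std::move(add->rhs));
        auto* l = dynamic_cast<Lit*>(add->lhs.get());
        auto* r = dynamic_cast<Lit*>(add->rhs.get());
        if (l && r) return std::make_unique<Lit>(l->value + r->value);
    }
    return e;
}

// The "regenerate source" step: print the tree back as text.
std::string unparse(const Expr& e) {
    if (auto* lit = dynamic_cast<const Lit*>(&e)) return std::to_string(lit->value);
    const auto& add = dynamic_cast<const Add&>(e);
    return "(" + unparse(*add.lhs) + " + " + unparse(*add.rhs) + ")";
}

int main() {
    // Stand-in for the "parse" step: build (1 + 2) + 3 by hand.
    std::unique_ptr<Expr> tree = std::make_unique<Add>(
        std::make_unique<Add>(std::make_unique<Lit>(1), std::make_unique<Lit>(2)),
        std::make_unique<Lit>(3));
    tree = fold(std::move(tree));
    std::cout << unparse(*tree) << "\n";  // prints: 6
}

A real tool does exactly this, except the AST has hundreds of node types, the parser is generated from a grammar, and the unparser preserves comments and layout.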
[There's a reflection-oriented community that tries to do metaprogramming from inside the programming language, but it only achieves "runtime" behaviour modification, and only to the extent that the language's compiler made some information available by reflection. With the exception of LISP, there are always details about the program that are not available by reflection ("Luke, you need the source"), and these limit what reflection can do.]
The more interesting PTS do this for arbitrary languages: you give the tool a language description as a configuration parameter, including at a minimum the BNF. Such PTS also allow you to do "source to source" transformation, i.e., specify patterns directly using the surface syntax of the targeted language; using such patterns, you can match code fragments of interest, and/or find and replace code fragments. This is far more convenient than the procedural API, because you don't have to know every microscopic detail of the AST to do most of your work. Think of this as meta-metaprogramming :-}
A downside: unless the PTS offers various kinds of useful static analyses (symbol tables, control- and data-flow analysis), it is hard to write really interesting transformations this way, because most practical tasks require you to check types and verify information flows. Unfortunately, this capability is rare among general PTS. (And it is always unavailable with the ever-proposed "If I just had a parser..."; see my bio for a longer discussion of "Life After Parsing".)
There's a theorem that says if you can do string rewriting [and thus tree rewriting], you can do arbitrary transformation; a number of PTS lean on this to claim you can metaprogram anything with just the tree rewrites they offer. While the theorem is satisfying, in the sense that you are now sure you can do anything, it is unsatisfying in the same way that a Turing machine's ability to compute anything doesn't make programming a Turing machine the method of choice. (The same holds true for systems with just procedural APIs, provided they let you make arbitrary changes to the AST [and in fact I think this is not true of Clang].)
What you want is the best of both worlds: a system that offers the generality of the language-parameterized kind of PTS (even handling multiple languages), plus the additional static analyses, plus the ability to mix source-to-source transformations with procedural APIs. I only know of two that do this:
- the Rascal MetaProgramming Language (MPL)
- our DMS Software Reengineering Toolkit
Unless you want to write the language descriptions and static analyzers yourself (for C++ this is a tremendous amount of work, which is why Clang was constructed both as a compiler and as a general procedural metaprogramming foundation), you will want a PTS with mature language descriptions already available. Otherwise you will spend all your time configuring the PTS, and none doing the work you actually wanted to do. [If you pick a random, non-mainstream language, this step is very hard to avoid.]
Rascal tries to do this by co-opting "OPP" (Other People's Parsers), but that doesn't help with the static-analysis part. I think they have Java pretty well in hand, but I'm very sure they don't do C or C++. Then again, it's an academic research tool; hard to blame them.
I emphasize, our [commercial] DMS tool does have full Java, C, and C++ front ends available. For C++, it covers almost everything in C++14, for both GCC and even Microsoft's variations (and we are polishing it now), including macro expansion and conditional management, and method-level control- and data-flow analysis. And yes, you can specify grammar changes in a practical way; we built a custom VectorC++ system for a client that radically extended C++ to use what amount to F90/APL data-parallel array operations. DMS has been used to carry out other massive metaprogramming tasks on large C++ systems (e.g., application architectural reshaping). (I am the architect behind DMS.)
Happy meta-metaprogramming.
I think the fundamental problem is a combination of language features (or the lack thereof) in C++. Both the library code and the client code are reasonable (as evidenced by the fact that the problem is far from obvious). If the lifetime of the temporary B was suitably extended (to the end of the loop), there would be no problem.
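For concreteness, here is a minimal sketch of the trap; the member types are invented, but bar and B::a mirror the shapes discussed in the question. (C++23's P2718 finally extends all temporaries in the range-initializer to the end of the loop, but C++11/14/17/20 do not.)

#include <vector>

struct A { std::vector<int> v; };

struct B {
    A a_;
    const A& a() const { return a_; }  // returns a reference into *this
};

B bar() { return B{}; }  // B returned by value: a temporary at the call site

int main() {
    // Pre-C++23, the temporary returned by bar() is destroyed at the end
    // of the full-expression initializing the hidden range variable, so
    // the loop iterates over a dangling reference: undefined behaviour.
    for (int x : bar().a().v) {
        (void)x;
    }
}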
Making temporaries live just long enough, and no longer, is extremely hard. Not even a rather ad-hoc rule like "all temporaries involved in the creation of the range for a range-based for live until the end of the loop" would be without side effects. Consider the case of B::a() returning, by value, a range that's independent of the B object. Then the temporary B can be discarded immediately. Even if one could precisely identify the cases where a lifetime extension is necessary, as these cases are not obvious to programmers, the effect (destructors called much later) would be surprising and perhaps an equally subtle source of bugs.
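A sketch of that by-value case (again with invented types): it is already safe today, because the returned range itself binds to the hidden range variable and is lifetime-extended, while the B-like temporary is rightly discarded at once. A blanket extension rule would keep the whole temporary alive until the end of the loop for no benefit.

#include <vector>

struct B2 {
    std::vector<int> data;
    std::vector<int> a() const { return data; }  // range returned by value
};

B2 bar2() { return B2{{1, 2, 3}}; }  // hypothetical factory, analogous to bar()

int main() {
    // Fine since C++11: the vector returned by a() binds to the hidden
    // __range reference and lives for the whole loop; the B2 temporary
    // is destroyed before the loop body runs, as it should be.
    for (int x : bar2().a()) {
        (void)x;
    }
}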
It would be more desirable to just detect and forbid such nonsense, forcing the programmer to explicitly elevate bar() to a local variable. This is not possible in C++11, and probably never will be possible, because it requires annotations. Rust does this; there, the signature of .a() would be:
fn a<'x>(bar: &'x B) -> &'x A { &bar.a }
// If we make it as explicit as possible, or
fn a(&self) -> &A { &self.a }
// if we make it a method and rely on lifetime elision.
Here 'x is a lifetime variable, or region: a symbolic name for the period of time a resource is available. Frankly, lifetimes are hard to explain -- or we haven't yet figured out the best explanation -- so I will restrict myself to the minimum necessary for this example and refer the inclined reader to the official documentation.
The borrow checker would notice that the result of bar().a() needs to live as long as the loop runs. Phrased as a constraint on the lifetime 'x, we write: 'loop <= 'x. It would also notice that the receiver of the method call, bar(), is a temporary. The two pointers are associated with the same lifetime, hence 'x <= 'temp is another constraint.
These two constraints are contradictory! We need 'loop <= 'x <= 'temp but 'temp <= 'loop, which captures the problem pretty precisely. Because of the conflicting requirements, the buggy code is rejected. Note that this is a compile-time check, and Rust code usually compiles to the same machine code as equivalent C++ code, so you need not pay a run-time cost for it.
Nevertheless, this is a big feature to add to a language, and it only works if all code uses it. The design of APIs is also affected (some designs that would be too dangerous in C++ become practical; others can't be made to play nicely with lifetimes). Alas, that means it's not practical to add to C++ (or any language, really) retroactively. In summary, the fault lies with the inertia successful languages have, and with the fact that Bjarne in 1983 didn't have a crystal ball with which to foresee and incorporate the lessons of the following 30 years of research and C++ experience ;-)
Of course, that's not at all helpful for avoiding the problem in the future (unless you switch to Rust and never use C++ again). One could avoid longer expressions with multiple chained method calls (which is pretty limiting, and doesn't even remotely fix all lifetime troubles). Or one could try adopting a more disciplined ownership policy without compiler assistance: document clearly that bar() returns by value and that the result of B::a() must not outlive the B on which a() is invoked. When changing a function to return by value instead of a longer-lived reference, be conscious that this is a change of contract. Still error-prone, but it may speed up identifying the cause when the bug does happen. One such discipline is sketched below.
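The simplest mechanical rule, shown here with the invented types from the first sketch: never call a reference-returning accessor on a temporary in a range-for header; name the temporary first.

// Fix: materialize the temporary as a named local so that its lifetime
// covers the entire loop.
B b = bar();
for (int x : b.a().v) {
    (void)x;  // safe: b outlives the loop
}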
Best Answer
No, it is not bad form to break out of a loop early, unless some invariant guarantees the break will always fire, so that the "loop" never actually loops. I have seen loops that process only the first element:
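Something along these lines (process and items are placeholders, not the original code); the unconditional break means the body runs exactly once:

for (auto& item : items) {
    process(item);
    break;  // always taken: only the first element is ever processed
}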
This code should never be used; unfortunately, I have seen it in production code several times in the past. Of course it prompted a code review.
Anyway, this does not appear to be your case. I would still reconsider the design of your algorithm, though. Why break out of the loop just because one element somewhere in the container fails a check? Why not skip that element and keep going, as sketched below? Maybe looking at the big picture will reveal a better way of doing whatever you are trying to do. It is difficult for me to give more guidance without more of the code and the overall design.
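A minimal sketch of the skip-instead-of-abort shape (passesCheck, process, and items are again placeholders):

for (auto& item : items) {
    if (!passesCheck(item)) {
        continue;  // skip just this element; keep processing the rest
    }
    process(item);
}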