C++ Metaprogramming – Using a Compiler API Instead of C++ Features

c, c++11, clang, meta-programming

This started out as a SO question but I realized that it is quite unconventional and based on the actual description on the websites, it might be better suited to programmers.se since the question has a lot of conceptual weight.

I have been learning clang LibTooling and it is a very powerful tool capable of exposing the entire "nitty gritty" of the code in a friendly way, that is, in a semantic way, and not by guessing either. If clang can compile your code, then clang is certain about the semantics of every single character inside that code.

Now allow me to step back for a moment.

There are many practical problems that arise when one engages in C++ template metaprogramming (and especially when venturing beyond templates into the territory of clever albeit terrifying macros). To be honest, to many programmers, myself included, many of the ordinary uses of templates are also somewhat terrifying.

I guess a good example would be compile-time strings. This is a question that is over a year old now, but it is clear that C++ as of right now does not make this easy for mere mortals. While looking at these options isn't quite enough to induce nausea for me, it nevertheless leaves me unconfident about being able to produce magical, maximally efficient machine code to suit whatever fancy application I have for my software.

I mean, let's face it, folks, strings are pretty simple and basic. Some of us just want a convenient way to emit machine code that has certain strings "baked in" far more thoroughly than we get when writing them the straightforward way in our C++ code.

Enter clang and LibTooling, which exposes the abstract syntax tree (AST) of the source code and allows a simple custom C++ application to correctly and reliably manipulate raw source code (using Rewriter) alongside a rich semantic object-oriented model of everything in the AST. It handles a lot of things. It knows about the macro expansions, and lets you follow those chains. Yes, I am talking about source-to-source code transformation or translation.

My fundamental thesis here is that clang now enables us to create executables which themselves can function as the ideal custom preprocessor stages to our C++ software, and we can implement these metaprogramming stages with C++. We are simply constrained by the fact that this stage must take input which is valid C++ code and produce as output more valid C++ code. Plus whatever other constraints your build system applies.

The input has to be at least very close to valid C++ code because, after all, clang is the compiler front-end and we are just poking around and being creative with its API. I do not know if there is any provision for defining new syntax, but presumably we would have to write the parsing support ourselves and add it to the clang project in order to do this. Expecting clang to provide more than that would be asking for something outside the project's scope.

Not a problem. I would imagine that some no-op macro functions can handle this task.
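A minimal sketch of what such marker macros might look like. The names CT_CONCAT, CT_UPPER and ct_upper_fallback are hypothetical, and giving the macros a runtime std::string fallback (instead of making them expand to nothing) is my own assumption, chosen so that the file stays valid C++ whether or not the tool has run:

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime fallback for CT_UPPER (ASCII only).
inline std::string ct_upper_fallback(std::string s) {
    for (char &c : s) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
    return s;
}

// Hypothetical marker macros; the names are assumptions, not part of clang.
// Without the tool they simply perform the operation at runtime.
#define CT_CONCAT(a, b) (std::string(a) + std::string(b))
#define CT_UPPER(s) ct_upper_fallback(s)

// What the programmer writes:
inline std::string greeting() {
    return CT_CONCAT("hello, ", "world");
}
// What a rewriting tool could emit in its place:
//     return "hello, world";

// The fallback path gives the same result the tool would bake in.
inline void check_greeting() {
    assert(greeting() == "hello, world");
}
```

The tool only has to find calls to these macros; everything else in the file is left alone.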

Another way to look at what I'm describing is to implement metaprogramming constructs using runtime C++ by manipulating the AST of our source code (thanks to clang and its API) instead of implementing them using the more limited tools available in the language itself. This has clear compilation performance benefits as well: template-heavy headers slow compilation in proportion to how often you use them, and lots of compiled stuff then gets carefully matched up and thrown away by the linker.

This does, however, come at the cost of introducing an additional step or two in the build process, and of having to write some (admittedly) more verbose software as part of our tool (but at least it is straightforward runtime C++).

That isn't the whole picture. I am pretty certain that there is a much larger space of functionality to be had from generating code that is extremely difficult or impossible to produce with core language features. In C++ you can write a template or a macro or a crazy combination of both, but in a clang tool you can modify classes and functions in ANY way that you can achieve with C++, at runtime, while having full access to the semantic content, in addition to templates and macros and everything else.

So, I'm wondering about why everybody isn't already doing this. Is it that this functionality from clang is so new and nobody is familiar with the huge class hierarchy of clang's AST? That can't be it.

Perhaps I am just underestimating the difficulty of this a little bit, but doing "compile-time string manipulation" with a clang tool is nearly criminally simple. It's verbose, but it's insanely straightforward. All that's needed is a bunch of no-op macro functions that map to actual real std::string operations. The clang tool fetches all the relevant no-op macro calls and performs the corresponding operations on the strings. This tool is then inserted as a part of the build process. During the build, these no-op macro function calls are automatically evaluated, and their results are inserted back as plain old compile-time strings in the program. The program can then be compiled as usual. In fact, the resulting program is also much more portable, since it does not require a fancy new compiler supporting C++11.
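To make the build stage concrete, here is a toy stand-in for it. This sketch does not use clang at all: it folds calls to a hypothetical CT_CONCAT marker macro with std::regex, under the assumption that the arguments are plain string literals with no escapes. A real tool would take the call sites and literal values from the AST and write the result back with Rewriter instead of pattern-matching text.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Toy rewriting stage: replaces CT_CONCAT("a", "b") with the literal "ab".
// Assumptions: CT_CONCAT is a hypothetical marker macro, and both arguments
// are simple string literals (no escaped quotes or backslashes).
std::string fold_ct_concat(const std::string &source) {
    static const std::regex call(
        R"rx(CT_CONCAT\(\s*"([^"\\]*)"\s*,\s*"([^"\\]*)"\s*\))rx");
    std::string previous;
    std::string current = source;
    // Repeat until nothing changes, so nested calls are folded inside-out.
    do {
        previous = current;
        // $1 and $2 are the bodies of the two literals.
        current = std::regex_replace(previous, call, "\"$1$2\"");
    } while (current != previous);
    return current;
}

// Small self-check of the simplest case.
inline void check_fold() {
    assert(fold_ct_concat("CT_CONCAT(\"foo\", \"bar\")") == "\"foobar\"");
}
```

Text matching like this is exactly the "guessing" that the AST-based approach avoids; the sketch only shows where the stage sits: source text in, source text with folded literals out.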

Best Answer

Yes, Virginia, there is a Santa Claus.

The notion of using programs to modify programs has been around a long time. The original idea came from John von Neumann in the form of stored-program computers. But machine code modifying machine code in arbitrary ways is pretty inconvenient.

People generally want to modify source code. This is mostly realized in the form of program transformation systems (PTS).

PTS generally offer, for at least one programming language, the ability to parse to ASTs, manipulate that AST, and regenerate valid source text. If in fact you dig around, for most mainstream languages, somebody has built such a tool (Clang is an example for C++, the Java compiler offers this capability as an API, Microsoft offers Roslyn, Eclipse has its JDT, ...) with a procedural API that is actually pretty useful. For the broader community, almost every language-specific community can point to something like this, implemented with various levels of maturity (usually modest, many "just parsers producing ASTs"). Happy metaprogramming.
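As a rough illustration of the "manipulate the AST, then regenerate source text" part of that loop, here is a miniature sketch. The Expr type and all helper names are hypothetical, the parsing step is skipped (the tree is built by hand), and the only transformation is removing additive and multiplicative identities:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature AST: numbers, variables, and binary '+' or '*'.
struct Expr {
    enum Kind { Num, Var, Bin };
    Kind kind = Num;
    int value = 0;                   // used when kind == Num
    std::string name;                // used when kind == Var
    char op = '+';                   // used when kind == Bin
    std::unique_ptr<Expr> lhs, rhs;  // used when kind == Bin
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr num(int v) { ExprPtr e(new Expr); e->kind = Expr::Num; e->value = v; return e; }
ExprPtr var(std::string n) { ExprPtr e(new Expr); e->kind = Expr::Var; e->name = std::move(n); return e; }
ExprPtr bin(char op, ExprPtr l, ExprPtr r) {
    ExprPtr e(new Expr);
    e->kind = Expr::Bin; e->op = op;
    e->lhs = std::move(l); e->rhs = std::move(r);
    return e;
}

bool is_num(const Expr &e, int v) { return e.kind == Expr::Num && e.value == v; }

// Bottom-up rewrite: x + 0 -> x, 0 + x -> x, x * 1 -> x, 1 * x -> x.
ExprPtr simplify(ExprPtr e) {
    if (e->kind != Expr::Bin) return e;
    e->lhs = simplify(std::move(e->lhs));
    e->rhs = simplify(std::move(e->rhs));
    const int identity = (e->op == '+') ? 0 : 1;
    if (is_num(*e->rhs, identity)) return std::move(e->lhs);
    if (is_num(*e->lhs, identity)) return std::move(e->rhs);
    return e;
}

// Regenerate source text from the (possibly rewritten) tree.
std::string unparse(const Expr &e) {
    switch (e.kind) {
    case Expr::Num: return std::to_string(e.value);
    case Expr::Var: return e.name;
    case Expr::Bin: return "(" + unparse(*e.lhs) + " " + e.op + " " + unparse(*e.rhs) + ")";
    }
    return "";
}
```

For example, simplifying the tree for `((x * 1) + 0)` and unparsing it gives back `x`. A real tool differs mainly in scale: the tree has hundreds of node kinds, and the regenerated text must preserve comments, layout and macros.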

[There's a reflection-oriented community that tries to do metaprogramming from inside the programming language, but it only achieves "runtime" behaviour modification, and only to the extent that the language compilers made some information available by reflection. With the exception of LISP, there are always details about the program that are not available by reflection ("Luke, you need the source"), and these always limit what reflection can do.]

The more interesting PTS do this for arbitrary languages (you give the tool a language description as a configuration parameter, including at a minimum the BNF). Such PTS also allow you to do "source to source" transformation, i.e., specify patterns directly using the surface syntax of the targeted language; using such patterns, you can describe code fragments of interest, and/or find and replace code fragments. This is far more convenient than the programming API, because you don't have to know every microscopic detail about the ASTs to do most of your work. Think of this as meta-metaprogramming :-}

A downside: unless the PTS offers various kinds of useful static analyses (symbol tables, control and data flow analyses), it is hard to write really interesting transformations this way, because you need to check types and verify information flows for most practical tasks. Unfortunately, this capability is in fact rare in the general PTS. (It is always unavailable with the ever-proposed "If I just had a parser..." approach. See my bio for a longer discussion of "Life After Parsing".)

There's a theorem that says if you can do string rewriting [thus tree rewriting] you can do arbitrary transformation; and thus a number of PTS lean on this to claim you can metaprogram anything with just the tree rewrites they offer. While the theorem is satisfying in the sense you are now sure you can do anything, it is unsatisfying in the same way that a Turing Machine's ability to do anything doesn't make programming a Turing Machine the method of choice. (The same holds true for systems with just procedural APIs, if they will let you make arbitrary changes to the AST [and in fact I think this is not true of Clang]).

What you want is the best of both worlds: a system that offers you the generality of the language-parameterized type of PTS (even handling multiple languages), with the additional static analyses and the ability to mix source-to-source transformations with procedural APIs. I only know of two that do this:

  • Rascal (MPL) MetaProgramming Language
  • our DMS Software Reengineering Toolkit

Unless you want to write the language descriptions and static analyzers yourself (for C++ this is a tremendous amount of work, which is why Clang was constructed both as a compiler and as a general procedural metaprogramming foundation), you will want a PTS with mature language descriptions already available. Otherwise you will spend all your time configuring the PTS, and none doing the work you actually wanted to do. [If you pick a random, non-mainstream language, this step is very hard to avoid].

Rascal tries to do this by co-opting "OPP" (Other People's Parsers), but that doesn't help with the static analysis part. I think they have Java pretty well in hand, but I'm very sure they don't do C or C++. But it's an academic research tool; hard to blame them.

I emphasize, our [commercial] DMS tool does have Java, C, C++ full front ends available. For C++, it covers almost everything in C++14 for GCC and even Microsoft's variations (and we are polishing now), macro expansion and conditional management, and method-level control and data flow analysis. And yes, you can specify grammar changes in a practical way; we built a custom VectorC++ system for a client that radically extended C++ to use what amount to F90/APL data-parallel array operations. DMS has been used to carry out other massive metaprogramming tasks on large C++ systems (e.g., application architectural reshaping). (I am the architect behind DMS).

Happy meta-metaprogramming.
