C++ – Why is ‘pure polymorphism’ preferable over using RTTI

cpolymorphismrtti

Almost every C++ resource I've seen that discusses this kind of thing tells me that I should prefer polymorphic approaches to using RTTI (run-time type identification). In general, I take this kind of advice seriously, and will try and understand the rationale — after all, C++ is a mighty beast and hard to understand in its full depth. However, for this particular question, I'm drawing a blank and would like to see what kind of advice the internet can offer. First, let me summarize what I've learned so far, by listing the common reasons that are quoted why RTTI is "considered harmful":

Some compilers don't use it / RTTI is not always enabled

I really don't buy this argument. It's like saying I shouldn't use C++14 features, because there are compilers out there that don't support it. And yet, no one would discourage me from using C++14 features. The majority of projects will have influence over the compiler they're using, and how it's configured. Even quoting the gcc manpage:

-fno-rtti

Disable generation of information about every class with virtual functions for use by the C++ run-time type identification features
(dynamic_cast and typeid). If you don't use those parts of the language, you can save some space by using this flag. Note that exception
handling uses the same information, but G++ generates it as needed. The dynamic_cast operator can still be used for casts that do not require
run-time type information, i.e. casts to "void *" or to unambiguous base classes.

What this tells me is that if I'm not using RTTI, I can disable it. That's like saying, if you're not using Boost, you don't have to link to it. I don't have to plan for the case where someone is compiling with -fno-rtti. Plus, the compiler will fail loud and clear in this case.

It costs extra memory / Can be slow

Whenever I'm tempted to use RTTI, that means I need to access some kind of type information or trait of my class. If I implement a solution that does not use RTTI, this usually means I will have to add some fields to my classes to store this information, so the memory argument is kind of void (I'll give an example of this further down).

A dynamic_cast can be slow, indeed. There's usually ways to avoid having to use it speed-critical situations, though. And I don't quite see the alternative. This SO answer suggests using an enum, defined in the base class, to store the type. That only works if you know all your derived classes a-priori. That's quite a big "if"!

From that answer, it seems also that the cost of RTTI is not clear, either. Different people measure different stuff.

Elegant polymorphic designs will make RTTI unnecessary

This is the kind of advice I take seriously. In this case, I simply can't come up with good non-RTTI solutions that cover my RTTI use case. Let me provide an example:

Say I'm writing a library to handle graphs of some kind of objects. I want to allow users to generate their own types when using my library (so the enum method is not available). I have a base class for my node:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();
};

Now, my nodes can be of different types. How about these:

class red_node : virtual public node_base
{
  public:
    red_node();
    virtual ~red_node();

    void get_redness();
};

class yellow_node : virtual public node_base
{
  public:
    yellow_node();
    virtual ~yellow_node();

    void set_yellowness(int);
};

Hell, why not even one of these:

class orange_node : public red_node, public yellow_node
{
  public:
    orange_node();
    virtual ~orange_node();

    void poke();
    void poke_adjacent_oranges();
};

The last function is interesting. Here's a way to write it:

void orange_node::poke_adjacent_oranges()
{
    auto adj_nodes = get_adjacent_nodes();
    foreach(auto node, adj_nodes) {
        // In this case, typeid() and static_cast might be faster
        std::shared_ptr<orange_node> o_node = dynamic_cast<orange_node>(node);
        if (o_node) {
             o_node->poke();
        }
    }
}

This all seems clear and clean. I don't have to define attributes or methods where I don't need them, the base node class can stay lean and mean. Without RTTI, where do I start? Maybe I can add a node_type attribute to the base class:

class node_base
{
  public:
    node_base();
    virtual ~node_base();

    std::vector< std::shared_ptr<node_base> > get_adjacent_nodes();

  private:
    std::string my_type;
};

Is std::string a good idea for a type? Maybe not, but what else can I use? Make up a number and hope no one else is using it yet? Also, in the case of my orange_node, what if I want to use the methods from red_node and yellow_node? Would I have to store multiple types per node? That seems complicated.

Conclusion

This examples doesn't seem overly complex or unusual (I'm working on something similar in my day job, where the nodes represent actual hardware that gets controlled through the software, and which do very different thing depending on what they are). Yet I wouldn't know a clean way of doing this with templates or other methods.
Please note that I'm trying to understand the problem, not defend my example. My reading of pages such as the SO answer I linked above and this page on Wikibooks seem to suggest I'm misusing RTTI, but I would like to learn why.

So, back to my original question: Why is 'pure polymorphism' preferable over using RTTI?

Best Answer

An interface describes what one needs to know in order to interact in a given situation in code. Once you extend the interface with "your entire type hierarchy", your interface "surface area" becomes huge, which makes reasoning about it harder.

As an example, your "poke adjacent oranges" means that I, as a 3rd party, cannot emulate being an orange! You privately declared an orange type, then use RTTI to make your code behave special when interacting with that type. If I want to "be orange", I must be within your private garden.

Now everyone who couples with "orangeness" couples with your entire orange type, and implicitly with your entire private garden, instead of with a defined interface.

While at first glance this looks like a great way to extend the limited interface without having to change all clients (adding am_I_orange), what tends to happen instead is it ossifies the code base, and prevents further extension. The special orangeness becomes inherent to the functioning of the system, and prevents you from creating a "tangerine" replacement for orange that is implemented differently and maybe removes a dependency or solves some other problem elegantly.

This does mean your interface has to be sufficient to solve your problem. From that perspective, why do you need to only poke oranges, and if so why was orangeness unavailable in the interface? If you need some fuzzy set of tags that can be added ad-hoc, you could add that to your type:

class node_base {
  public:
    bool has_tag(tag_name);

This provides a similar massive broadening of your interface from narrowly specified to broad tag-based. Except instead of doing it through RTTI and implementation details (aka, "how are you implemented? With the orange type? Ok you pass."), it does so with something easily emulated through a completely different implementation.

This can even be extended to dynamic methods, if you need that. "Do you support being Foo'd with arguments Baz, Tom and Alice? Ok, Fooing you." In a big sense, this is less intrusive than a dynamic cast to get at the fact the other object is a type you know.

Now tangerine objects can have the orange tag and play along, while being implementation-decoupled.

It can still lead to a huge mess, but it is at least a mess of messages and data, not implementation hierarchies.

Abstraction is a game of decoupling and hiding irrelevancies. It makes code easier to reason about locally. RTTI is boring a hole straight through the abstraction into implementation details. This can make solving a problem easier, but it has the cost of locking you into one specific implementation really easily.

The "typename" keyword

The answer is: We decide how the compiler should parse this. If t::x is a dependent name, then we need to prefix it by typename to tell the compiler to parse it in a certain way. The Standard says at (14.6/2):

A name used in a template declaration or definition and that is dependent on a template-parameter is assumed not to name a type unless the applicable name lookup finds a type name or the name is qualified by the keyword typename.

There are many names for which typename is not necessary, because the compiler can, with the applicable name lookup in the template definition, figure out how to parse a construct itself - for example with T *f;, when T is a type template parameter. But for t::x * f; to be a declaration, it must be written as typename t::x *f;. If you omit the keyword and the name is taken to be a non-type, but when instantiation finds it denotes a type, the usual error messages are emitted by the compiler. Sometimes, the error consequently is given at definition time:

// t::x is taken as non-type, but as an expression the following misses an
// operator between the two names or a semicolon separating them.
t::x f;

The syntax allows typename only before qualified names - it is therefor taken as granted that unqualified names are always known to refer to types if they do so.

A similar gotcha exists for names that denote templates, as hinted at by the introductory text.

The "template" keyword

Remember the initial quote above and how the Standard requires special handling for templates as well? Let's take the following innocent-looking example:

boost::function< int() > f;

It might look obvious to a human reader. Not so for the compiler. Imagine the following arbitrary definition of boost::function and f:

namespace boost { int function = 0; }
int main() { 
  int f = 0;
  boost::function< int() > f; 
}

That's actually a valid expression! It uses the less-than operator to compare boost::function against zero (int()), and then uses the greater-than operator to compare the resulting bool against f. However as you might well know, boost::function in real life is a template, so the compiler knows (14.2/3):

After name lookup (3.4) finds that a name is a template-name, if this name is followed by a <, the < is always taken as the beginning of a template-argument-list and never as a name followed by the less-than operator.

Now we are back to the same problem as with typename. What if we can't know yet whether the name is a template when parsing the code? We will need to insert template immediately before the template name, as specified by 14.2/4. This looks like:

t::template f<int>(); // call a function template

Template names can not only occur after a :: but also after a -> or . in a class member access. You need to insert the keyword there too:

this->template f<int>(); // call a function template

Dependencies

For the people that have thick Standardese books on their shelf and that want to know what exactly I was talking about, I'll talk a bit about how this is specified in the Standard.

In template declarations some constructs have different meanings depending on what template arguments you use to instantiate the template: Expressions may have different types or values, variables may have different types or function calls might end up calling different functions. Such constructs are generally said to depend on template parameters.

The Standard defines precisely the rules by whether a construct is dependent or not. It separates them into logically different groups: One catches types, another catches expressions. Expressions may depend by their value and/or their type. So we have, with typical examples appended:

Dependent types (e.g: a type template parameter T)
Value-dependent expressions (e.g: a non-type template parameter N)
Type-dependent expressions (e.g: a cast to a type template parameter (T)0)

Most of the rules are intuitive and are built up recursively: For example, a type constructed as T[N] is a dependent type if N is a value-dependent expression or T is a dependent type. The details of this can be read in section (14.6.2/1) for dependent types, (14.6.2.2) for type-dependent expressions and (14.6.2.3) for value-dependent expressions.

Dependent names

The Standard is a bit unclear about what exactly is a dependent name. On a simple read (you know, the principle of least surprise), all it defines as a dependent name is the special case for function names below. But since clearly T::x also needs to be looked up in the instantiation context, it also needs to be a dependent name (fortunately, as of mid C++14 the committee has started to look into how to fix this confusing definition).

To avoid this problem, I have resorted to a simple interpretation of the Standard text. Of all the constructs that denote dependent types or expressions, a subset of them represent names. Those names are therefore "dependent names". A name can take different forms - the Standard says:

A name is a use of an identifier (2.11), operator-function-id (13.5), conversion-function-id (12.3.2), or template-id (14.2) that denotes an entity or label (6.6.4, 6.1)

An identifier is just a plain sequence of characters / digits, while the next two are the operator + and operator type form. The last form is template-name <argument list>. All these are names, and by conventional use in the Standard, a name can also include qualifiers that say what namespace or class a name should be looked up in.

A value dependent expression 1 + N is not a name, but N is. The subset of all dependent constructs that are names is called dependent name. Function names, however, may have different meaning in different instantiations of a template, but unfortunately are not caught by this general rule.

Dependent function names

Not primarily a concern of this article, but still worth mentioning: Function names are an exception that are handled separately. An identifier function name is dependent not by itself, but by the type dependent argument expressions used in a call. In the example f((T)0), f is a dependent name. In the Standard, this is specified at (14.6.2/1).

Additional notes and examples

In enough cases we need both of typename and template. Your code should look like the following

template <typename T, typename Tail>
struct UnionNode : public Tail {
    // ...
    template<typename U> struct inUnion {
        typedef typename Tail::template inUnion<U> dummy;
    };
    // ...
};

The keyword template doesn't always have to appear in the last part of a name. It can appear in the middle before a class name that's used as a scope, like in the following example

typename t::template iterator<int>::value_type v;

In some cases, the keywords are forbidden, as detailed below

On the name of a dependent base class you are not allowed to write typename. It's assumed that the name given is a class type name. This is true for both names in the base-class list and the constructor initializer list:
```
 template <typename T>
 struct derive_from_Has_type : /* typename */ SomeBase<T>::type 
 { };
```

In using-declarations it's not possible to use template after the last ::, and the C++ committee said not to work on a solution.

 template <typename T>
 struct derive_from_Has_type : SomeBase<T> {
    using SomeBase<T>::template type; // error
    using typename SomeBase<T>::type; // typename *is* allowed
 };

C++ – Why is “using namespace std;” considered bad practice

This is not related to performance at all. But consider this: you are using two libraries called Foo and Bar:

using namespace foo;
using namespace bar;

Everything works fine, and you can call Blah() from Foo and Quux() from Bar without problems. But one day you upgrade to a new version of Foo 2.0, which now offers a function called Quux(). Now you've got a conflict: Both Foo 2.0 and Bar import Quux() into your global namespace. This is going to take some effort to fix, especially if the function parameters happen to match.

If you had used foo::Blah() and bar::Quux(), then the introduction of foo::Quux() would have been a non-event.