Caveat: It is not necessary to put the implementation in the header file, see the alternative solution at the end of this answer.
Anyway, the reason your code is failing is that, when instantiating a template, the compiler creates a new class with the given template argument. For example:
template<typename T>
struct Foo
{
    T bar;
    void doSomething(T param) {/* do stuff using T */}
};
// somewhere in a .cpp
Foo<int> f;
When reading this line, the compiler will create a new class (let's call it FooInt), which is equivalent to the following:
struct FooInt
{
    int bar;
    void doSomething(int param) {/* do stuff using int */}
};
Consequently, the compiler needs to have access to the implementation of the methods, to instantiate them with the template argument (in this case int). If these implementations were not in the header, they wouldn't be accessible, and therefore the compiler wouldn't be able to instantiate the template.
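As a minimal sketch of that point, the snippet below reuses the Foo template with a trivial body filled in for illustration. The helper name useTwoInstantiations is hypothetical. It shows that each template argument produces a separate class, and that the body of doSomething has to be visible wherever a new instantiation is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Same shape as the Foo template above; the body is an assumption for illustration.
template <typename T>
struct Foo
{
    T bar;
    void doSomething(T param) { bar = param; }
};

// Each distinct template argument yields a distinct, unrelated class type.
static_assert(!std::is_same<Foo<int>, Foo<std::string>>::value,
              "Foo<int> and Foo<std::string> are different classes");

// Hypothetical helper: it requests two instantiations, so the compiler must
// see the body of doSomething here to generate code for int and std::string.
inline std::size_t useTwoInstantiations()
{
    Foo<int> a{};
    a.doSomething(42);
    assert(a.bar == 42);

    Foo<std::string> b{};
    b.doSomething("hello");
    assert(b.bar == "hello");

    return static_cast<std::size_t>(a.bar) + b.bar.size();
}
```

If the body of doSomething lived only in some other .cpp file, this translation unit would typically compile but fail at link time, because nothing would have generated Foo<int>::doSomething or Foo<std::string>::doSomething.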
A common solution to this is to write the template declaration in a header file, then implement the class in an implementation file (for example .tpp), and include this implementation file at the end of the header.
Foo.h
template <typename T>
struct Foo
{
    void doSomething(T param);
};
#include "Foo.tpp"
Foo.tpp
template <typename T>
void Foo<T>::doSomething(T param)
{
    // implementation
}
This way, implementation is still separated from declaration, but is accessible to the compiler.
Alternative solution
Another solution is to keep the implementation separated, and explicitly instantiate all the template instances you'll need:
Foo.h
// no implementation
template <typename T> struct Foo { ... };
Foo.cpp
// implementation of Foo's methods
// explicit instantiations
template class Foo<int>;
template class Foo<float>;
// You will only be able to use Foo with int or float
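For a fuller picture, here is a single-file sketch of the explicit-instantiation layout, with comments marking where each part would live. The return type and body of doSomething, and the helper useFoo, are assumptions added so the example can be checked. They are not part of the original Foo.

```cpp
#include <cassert>

// ---- Foo.h (sketch): declaration only, no member bodies ----
template <typename T>
struct Foo
{
    T doSomething(T param);  // returns T here purely for illustration
};

// ---- Foo.cpp (sketch): member definitions plus explicit instantiations ----
template <typename T>
T Foo<T>::doSomething(T param)
{
    return param + param;  // placeholder body, assumed for this sketch
}

// Ask the compiler to emit code for exactly these two instantiations.
template struct Foo<int>;
template struct Foo<float>;

// ---- user.cpp (sketch): would include only Foo.h ----
inline int useFoo()
{
    Foo<int> f;
    int result = f.doSomething(21);
    assert(result == 42);
    return result;
}
```

In the real multi-file layout, using Foo<double> from user.cpp would usually compile and then fail at link time, since no translation unit instantiated it.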
If my explanation isn't clear enough, you can have a look at the C++ Super-FAQ on this subject.
typename and class are interchangeable in the basic case of specifying a template:
template<class T>
class Foo
{
};
and
template<typename T>
class Foo
{
};
are equivalent.
Having said that, there are specific cases where there is a difference between typename and class.
The first one is in the case of dependent types. typename is used to declare when you are referencing a nested type that depends on another template parameter, such as the typedef in this example:
template<typename param_t>
class Foo
{
    typedef typename param_t::baz sub_t;
};
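To see the dependent-type case in use, here is a small sketch. HasBaz, the member function twice, and the helper useDependentType are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical argument type that exposes a nested type named baz.
struct HasBaz
{
    typedef long baz;
};

template <typename param_t>
class Foo
{
public:
    // param_t::baz depends on param_t, so typename tells the compiler
    // that it names a type rather than a static member or value.
    typedef typename param_t::baz sub_t;

    sub_t twice(sub_t value) const { return value * 2; }
};

// With param_t = HasBaz, sub_t resolves to long.
static_assert(std::is_same<Foo<HasBaz>::sub_t, long>::value,
              "sub_t resolves to HasBaz::baz");

inline long useDependentType()
{
    Foo<HasBaz> foo;
    long result = foo.twice(21L);
    assert(result == 42L);
    return result;
}
```

Writing class instead of typename in front of param_t::baz would not compile, which is the difference this case is about.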
The second one you actually show in your question, though you might not realize it:
template < template < typename, typename > class Container, typename Type >
When specifying a template template parameter, the class keyword MUST be used as above -- it is not interchangeable with typename in this case (note: since C++17 both keywords are allowed in this case).
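A minimal sketch of a template template parameter in use follows. The class name Holder and the helper useTemplateTemplate are hypothetical. It relies on std::vector and std::list each taking an element type and an allocator type, which matches the two-parameter shape in the declaration above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

// Container must itself be a template taking two type parameters,
// which matches std::vector<T, Alloc> and std::list<T, Alloc>.
template <template <typename, typename> class Container, typename Type>
class Holder  // hypothetical name, for illustration only
{
public:
    void add(const Type& value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }

private:
    Container<Type, std::allocator<Type>> items_;
};

inline std::size_t useTemplateTemplate()
{
    Holder<std::vector, int> v;   // pass the template itself, not std::vector<int>
    v.add(1);
    v.add(2);

    Holder<std::list, double> l;
    l.add(3.5);

    assert(v.size() == 2);
    assert(l.size() == 1);
    return v.size() + l.size();
}
```

The sketch keeps the class keyword before Container so that it also compiles in pre-C++17 modes.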
You also must use class (not typename) when explicitly instantiating a class template:
template class Foo<int>;
I'm sure that there are other cases that I've missed, but the bottom line is: these two keywords are not equivalent, and these are some common cases where you need to use one or the other.
A common example is sorting.
In C, qsort takes a pointer to a comparison function. Generally speaking, there will be one copy of the qsort code, which is not inlined. It will make a call through the pointer to the comparison routine -- this of course is also not inlined.
In C++, std::sort is a template, and it can take a functor object as comparator. There is a different copy of std::sort for each different type used as a comparator. Assuming you use a functor class with overloaded operator(), then the call to the comparator can easily be inlined into this copy of std::sort.
So, templates give you more inlining because there are more copies of the sort code, each of which can inline a different comparator. Inlining is quite a good optimization, and sort routines do a lot of comparisons, so you can often measure std::sort running faster than an equivalent qsort. The cost of this is the chance of much larger code -- if your program uses a lot of different comparators then you get a lot of different copies of the sort routine, each with a different comparator baked into it.
In principle there's no reason why a C implementation can't inline qsort into the place it is called. Then if it was called with the name of the function, the optimizer could in theory observe that at the point it is used, the function pointer must still point to that same function. Then it can inline the call to the function, and the result would be similar to the result with std::sort. But in practice, compilers tend not to take the first step, inlining qsort. That's because (a) it's large, and (b) it's in a different translation unit, usually compiled into some library that your program is linked against, and (c) to do it this way, you'd have an inlined copy of qsort for every call to it, not just a copy for every different comparator. So it would be even more bloated than the C++, unless the implementation could also find a way to common up the code in cases where qsort is called in different places with the same comparator.
So, general-purpose functions like qsort in C tend to have some overheads on account of calls through function pointers, or other indirection[*]. Templates in C++ are a common way of keeping the source code generic, but ensuring that it compiles to a special-purpose function (or several such functions). The special-purpose code hopefully is faster.
It's worth noting that templates are not by any means just about performance. std::sort is itself more general-purpose than qsort in some ways. For example qsort only sorts arrays, whereas std::sort can sort anything that provides a random-access iterator. It can for example sort a deque, which under the covers is several disjoint arrays allocated separately. So the use of templates doesn't necessarily provide any performance benefit, it might be done for other reasons. It just happens that templates do affect performance.
[*] another example with sorting - qsort takes an integer parameter saying how big each element of the array is, and when it moves elements it therefore must call memcpy or similar with the value of this variable. std::sort knows at compile-time the exact type of the elements, and hence the exact size. It can inline a copy constructor call that in turn might translate to instructions to copy that number of bytes. As with the inlined comparator, it's often possible to copy exactly 4 (or 8, or 16, or whatever) bytes faster than you'd get by calling a routine that copies a variable number of bytes, passing it the value 4 (or 8, or 16, or whatever). As before, if you called qsort with a literal value for the size, and that call to qsort was inlined, then the compiler could perform the exact same optimization in C. But in practice you don't see that.
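To make the comparison concrete, here is a small sketch of the two styles side by side. The names compareInts, LessInt, sortWithQsort, sortWithStdSort and sortDeque are hypothetical, chosen for this example. It only shows how the comparator reaches each sort routine. Whether a given compiler actually inlines the functor call depends on the compiler and its optimization settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <deque>
#include <vector>

// C-style comparator: qsort reaches it through a function pointer and
// only knows the element size as a run-time value.
inline int compareInts(const void* lhs, const void* rhs)
{
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);  // avoids the overflow risk of a - b
}

// Functor comparator: its type becomes a template argument of std::sort,
// so the call to operator() is visible inside that instantiation.
struct LessInt
{
    bool operator()(int a, int b) const { return a < b; }
};

inline std::vector<int> sortWithQsort(std::vector<int> values)
{
    std::qsort(values.data(), values.size(), sizeof(int), compareInts);
    return values;
}

inline std::vector<int> sortWithStdSort(std::vector<int> values)
{
    std::sort(values.begin(), values.end(), LessInt());
    return values;
}

// std::sort only needs random-access iterators, so a deque works too,
// even though its storage is not one contiguous array.
inline std::deque<int> sortDeque(std::deque<int> values)
{
    std::sort(values.begin(), values.end(), LessInt());
    return values;
}
```

Both vector versions produce the same ordering. The difference discussed above lies in the generated code, not in the result.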