C++ – Why would one replace default new and delete operators

cc++-faqdelete-operatornew-operatoroperator-overloading

Why ~~should~~ would one replace the default operator new and delete with a custom new and delete operators?

This is in continuation of Overloading new and delete in the immensely illuminating C++ FAQ:
Operator overloading.

An followup entry to this FAQ is:
How should I write ISO C++ standard conformant custom new and delete operators?

_{Note: The answer is based on lessons from Scott Meyers' More Effective C++.

_(Note: This is meant to be an entry to [Stack Overflow's C++ FAQ](https://stackoverflow.com/questions/tagged/c++-faq). If you want to critique the idea of providing an FAQ in this form, then [the posting on meta that started all this](https://meta.stackexchange.com/questions/68647/setting-up-a-faq-for-the-c-tag) would be the place to do that. Answers to that question are monitored in the [C++ chatroom](https://chat.stackoverflow.com/rooms/10/c-lounge), where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)_}

Best Answer

One may try to replace new and delete operators for a number of reasons, namely:

To Detect Usage Errors:

There are a number of ways in which incorrect usage of new and delete may lead to the dreaded beasts of Undefined Behavior & Memory leaks. Respective examples of each are:
Using more than one delete on newed memory & not calling delete on memory allocated using new.
An overloaded operator new can keep a list of allocated addresses and the overloaded operator delete can remove addresses from the list, then it is easy to detect such usage errors.

Similarly, a variety of programming mistakes can lead to data overruns(writing beyond the end of an allocated block) and underruns(writing prior to the beginning of an allocated block).
An Overloaded operator new can over-allocate blocks and put known byte patterns ("signatures") before and after the memory made available to clients. The overloaded operator deletes can check to see if the signatures are still intact. Thus by checking if these signatures are not intact it is possible to determine that an overrun or under-run occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer, thus helping in providing a good diagnostic information.

To Improve Efficiency(speed & memory):

The new and delete operators work reasonably well for everybody, but optimally for nobody. This behavior arises from the fact that they are designed for general purpose use only. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. Eventually, the operator new and operator delete that ship with compilers take a middle-of-the-road strategy.

If you have a good understanding of your program's dynamic memory usage patterns, you can often find that custom versions of operator new and operator delete outperform (faster in performance, or require less memory up to 50%)the default ones. Of course, unless you are sure of what you are doing it is not a good idea to do this(don't even try this if you don't understand the intricacies involved).

To Collect Usage Statistics:

Before thinking of replacing new and delete for improving efficiency as mentioned in #2, You should gather information about how your application/program uses dynamic allocation. You may want to collect information about:
Distribution of allocation blocks,
Distribution of lifetimes,
Order of allocations(FIFO or LIFO or random),
Understanding usage patterns changes over a period of time,maximum amount of dynamic memory used etc.

Also, sometimes you may need to collect usage information such as:
Count the number of dynamically objects of a class,
Restrict the number of objects being created using dynamic allocation etc.

All, this information can be collected by replacing the custom new and delete and adding the diagnostic collection mechanism in the overloaded new and delete.

To compensate for suboptimal memory alignment in `new`:

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints can lead to hardware exceptions at run-time. Other architectures are more forgiving, and may allow it to work though reducing the performance.The operator new that ship with some compilers don't guarantee eight-byte alignment for dynamic allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance & can be a good reason to replace new and delete operators.

To cluster related objects near one another:

If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. custom Placement versions of new and delete can make it possible to achieve such clustering.

To obtain unconventional behavior:

Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer.
For example: You might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.

Related Solutions

C++ – a smart pointer and when should I use one

UPDATE

This answer is rather old, and so describes what was 'good' at the time, which was smart pointers provided by the Boost library. Since C++11, the standard library has provided sufficient smart pointers types, and so you should favour the use of std::unique_ptr, std::shared_ptr and std::weak_ptr.

There was also std::auto_ptr. It was very much like a scoped pointer, except that it also had the "special" dangerous ability to be copied — which also unexpectedly transfers ownership.
It was deprecated in C++11 and removed in C++17, so you shouldn't use it.

std::auto_ptr<MyObject> p1 (new MyObject());
std::auto_ptr<MyObject> p2 = p1; // Copy and transfer ownership. 
                                 // p1 gets set to empty!
p2->DoSomething(); // Works.
p1->DoSomething(); // Oh oh. Hopefully raises some NULL pointer exception.

OLD ANSWER

A smart pointer is a class that wraps a 'raw' (or 'bare') C++ pointer, to manage the lifetime of the object being pointed to. There is no single smart pointer type, but all of them try to abstract a raw pointer in a practical way.

Smart pointers should be preferred over raw pointers. If you feel you need to use pointers (first consider if you really do), you would normally want to use a smart pointer as this can alleviate many of the problems with raw pointers, mainly forgetting to delete the object and leaking memory.

With raw pointers, the programmer has to explicitly destroy the object when it is no longer useful.

// Need to create the object to achieve some goal
MyObject* ptr = new MyObject(); 
ptr->DoSomething(); // Use the object in some way
delete ptr; // Destroy the object. Done with it.
// Wait, what if DoSomething() raises an exception...?

A smart pointer by comparison defines a policy as to when the object is destroyed. You still have to create the object, but you no longer have to worry about destroying it.

SomeSmartPtr<MyObject> ptr(new MyObject());
ptr->DoSomething(); // Use the object in some way.

// Destruction of the object happens, depending 
// on the policy the smart pointer class uses.

// Destruction would happen even if DoSomething() 
// raises an exception

The simplest policy in use involves the scope of the smart pointer wrapper object, such as implemented by boost::scoped_ptr or std::unique_ptr.

void f()
{
    {
       std::unique_ptr<MyObject> ptr(new MyObject());
       ptr->DoSomethingUseful();
    } // ptr goes out of scope -- 
      // the MyObject is automatically destroyed.

    // ptr->Oops(); // Compile error: "ptr" not defined
                    // since it is no longer in scope.
}

Note that std::unique_ptr instances cannot be copied. This prevents the pointer from being deleted multiple times (incorrectly). You can, however, pass references to it around to other functions you call.

std::unique_ptrs are useful when you want to tie the lifetime of the object to a particular block of code, or if you embedded it as member data inside another object, the lifetime of that other object. The object exists until the containing block of code is exited, or until the containing object is itself destroyed.

A more complex smart pointer policy involves reference counting the pointer. This does allow the pointer to be copied. When the last "reference" to the object is destroyed, the object is deleted. This policy is implemented by boost::shared_ptr and std::shared_ptr.

void f()
{
    typedef std::shared_ptr<MyObject> MyObjectPtr; // nice short alias
    MyObjectPtr p1; // Empty

    {
        MyObjectPtr p2(new MyObject());
        // There is now one "reference" to the created object
        p1 = p2; // Copy the pointer.
        // There are now two references to the object.
    } // p2 is destroyed, leaving one reference to the object.
} // p1 is destroyed, leaving a reference count of zero. 
  // The object is deleted.

Reference counted pointers are very useful when the lifetime of your object is much more complicated, and is not tied directly to a particular section of code or to another object.

There is one drawback to reference counted pointers — the possibility of creating a dangling reference:

// Create the smart pointer on the heap
MyObjectPtr* pp = new MyObjectPtr(new MyObject())
// Hmm, we forgot to destroy the smart pointer,
// because of that, the object is never destroyed!

Another possibility is creating circular references:

struct Owner {
   std::shared_ptr<Owner> other;
};

std::shared_ptr<Owner> p1 (new Owner());
std::shared_ptr<Owner> p2 (new Owner());
p1->other = p2; // p1 references p2
p2->other = p1; // p2 references p1

// Oops, the reference count of of p1 and p2 never goes to zero!
// The objects are never destroyed!

To work around this problem, both Boost and C++11 have defined a weak_ptr to define a weak (uncounted) reference to a shared_ptr.

C++ – Where and why do I have to put the “template” and “typename” keywords

(See here also for my C++11 answer)

In order to parse a C++ program, the compiler needs to know whether certain names are types or not. The following example demonstrates that:

t * f;

How should this be parsed? For many languages a compiler doesn't need to know the meaning of a name in order to parse and basically know what action a line of code does. In C++, the above however can yield vastly different interpretations depending on what t means. If it's a type, then it will be a declaration of a pointer f. However if it's not a type, it will be a multiplication. So the C++ Standard says at paragraph (3/7):

Some names denote types or templates. In general, whenever a name is encountered it is necessary to determine whether that name denotes one of these entities before continuing to parse the program that contains it. The process that determines this is called name lookup.

How will the compiler find out what a name t::x refers to, if t refers to a template type parameter? x could be a static int data member that could be multiplied or could equally well be a nested class or typedef that could yield to a declaration. If a name has this property - that it can't be looked up until the actual template arguments are known - then it's called a dependent name (it "depends" on the template parameters).

You might recommend to just wait till the user instantiates the template:

Let's wait until the user instantiates the template, and then later find out the real meaning of t::x * f;.

This will work and actually is allowed by the Standard as a possible implementation approach. These compilers basically copy the template's text into an internal buffer, and only when an instantiation is needed, they parse the template and possibly detect errors in the definition. But instead of bothering the template's users (poor colleagues!) with errors made by a template's author, other implementations choose to check templates early on and give errors in the definition as soon as possible, before an instantiation even takes place.

So there has to be a way to tell the compiler that certain names are types and that certain names aren't.

The "typename" keyword

The answer is: We decide how the compiler should parse this. If t::x is a dependent name, then we need to prefix it by typename to tell the compiler to parse it in a certain way. The Standard says at (14.6/2):

A name used in a template declaration or definition and that is dependent on a template-parameter is assumed not to name a type unless the applicable name lookup finds a type name or the name is qualified by the keyword typename.

There are many names for which typename is not necessary, because the compiler can, with the applicable name lookup in the template definition, figure out how to parse a construct itself - for example with T *f;, when T is a type template parameter. But for t::x * f; to be a declaration, it must be written as typename t::x *f;. If you omit the keyword and the name is taken to be a non-type, but when instantiation finds it denotes a type, the usual error messages are emitted by the compiler. Sometimes, the error consequently is given at definition time:

// t::x is taken as non-type, but as an expression the following misses an
// operator between the two names or a semicolon separating them.
t::x f;

The syntax allows typename only before qualified names - it is therefor taken as granted that unqualified names are always known to refer to types if they do so.

A similar gotcha exists for names that denote templates, as hinted at by the introductory text.

The "template" keyword

Remember the initial quote above and how the Standard requires special handling for templates as well? Let's take the following innocent-looking example:

boost::function< int() > f;

It might look obvious to a human reader. Not so for the compiler. Imagine the following arbitrary definition of boost::function and f:

namespace boost { int function = 0; }
int main() { 
  int f = 0;
  boost::function< int() > f; 
}

That's actually a valid expression! It uses the less-than operator to compare boost::function against zero (int()), and then uses the greater-than operator to compare the resulting bool against f. However as you might well know, boost::function in real life is a template, so the compiler knows (14.2/3):

After name lookup (3.4) finds that a name is a template-name, if this name is followed by a <, the < is always taken as the beginning of a template-argument-list and never as a name followed by the less-than operator.

Now we are back to the same problem as with typename. What if we can't know yet whether the name is a template when parsing the code? We will need to insert template immediately before the template name, as specified by 14.2/4. This looks like:

t::template f<int>(); // call a function template

Template names can not only occur after a :: but also after a -> or . in a class member access. You need to insert the keyword there too:

this->template f<int>(); // call a function template

Dependencies

For the people that have thick Standardese books on their shelf and that want to know what exactly I was talking about, I'll talk a bit about how this is specified in the Standard.

In template declarations some constructs have different meanings depending on what template arguments you use to instantiate the template: Expressions may have different types or values, variables may have different types or function calls might end up calling different functions. Such constructs are generally said to depend on template parameters.

The Standard defines precisely the rules by whether a construct is dependent or not. It separates them into logically different groups: One catches types, another catches expressions. Expressions may depend by their value and/or their type. So we have, with typical examples appended:

Dependent types (e.g: a type template parameter T)
Value-dependent expressions (e.g: a non-type template parameter N)
Type-dependent expressions (e.g: a cast to a type template parameter (T)0)

Most of the rules are intuitive and are built up recursively: For example, a type constructed as T[N] is a dependent type if N is a value-dependent expression or T is a dependent type. The details of this can be read in section (14.6.2/1) for dependent types, (14.6.2.2) for type-dependent expressions and (14.6.2.3) for value-dependent expressions.

Dependent names

The Standard is a bit unclear about what exactly is a dependent name. On a simple read (you know, the principle of least surprise), all it defines as a dependent name is the special case for function names below. But since clearly T::x also needs to be looked up in the instantiation context, it also needs to be a dependent name (fortunately, as of mid C++14 the committee has started to look into how to fix this confusing definition).

To avoid this problem, I have resorted to a simple interpretation of the Standard text. Of all the constructs that denote dependent types or expressions, a subset of them represent names. Those names are therefore "dependent names". A name can take different forms - the Standard says:

A name is a use of an identifier (2.11), operator-function-id (13.5), conversion-function-id (12.3.2), or template-id (14.2) that denotes an entity or label (6.6.4, 6.1)

An identifier is just a plain sequence of characters / digits, while the next two are the operator + and operator type form. The last form is template-name <argument list>. All these are names, and by conventional use in the Standard, a name can also include qualifiers that say what namespace or class a name should be looked up in.

A value dependent expression 1 + N is not a name, but N is. The subset of all dependent constructs that are names is called dependent name. Function names, however, may have different meaning in different instantiations of a template, but unfortunately are not caught by this general rule.

Dependent function names

Not primarily a concern of this article, but still worth mentioning: Function names are an exception that are handled separately. An identifier function name is dependent not by itself, but by the type dependent argument expressions used in a call. In the example f((T)0), f is a dependent name. In the Standard, this is specified at (14.6.2/1).

Additional notes and examples

In enough cases we need both of typename and template. Your code should look like the following

template <typename T, typename Tail>
struct UnionNode : public Tail {
    // ...
    template<typename U> struct inUnion {
        typedef typename Tail::template inUnion<U> dummy;
    };
    // ...
};

The keyword template doesn't always have to appear in the last part of a name. It can appear in the middle before a class name that's used as a scope, like in the following example

typename t::template iterator<int>::value_type v;

In some cases, the keywords are forbidden, as detailed below

On the name of a dependent base class you are not allowed to write typename. It's assumed that the name given is a class type name. This is true for both names in the base-class list and the constructor initializer list:
```
 template <typename T>
 struct derive_from_Has_type : /* typename */ SomeBase<T>::type 
 { };
```

In using-declarations it's not possible to use template after the last ::, and the C++ committee said not to work on a solution.

 template <typename T>
 struct derive_from_Has_type : SomeBase<T> {
    using SomeBase<T>::template type; // error
    using typename SomeBase<T>::type; // typename *is* allowed
 };