C++ Functions – When to Use Pointers for Collections

c++ coding-style functions pointers

In C++ I frequently see these two signatures used seemingly interchangeably:

void fill_array(Array<Type>* array_to_fill);
Array<Type>* filled_array();

I imagine there is a subtle difference, but I don't know what it is.
Could someone explain when I might prefer one form over the other?

Best Answer

The first kind of signature is usually preferable.

The difference is that the second signature requires the array to be created inside the function. In particular, the second signature effectively requires the array to outlive the scope in which it was created. So what we're really comparing are these two snippets:

void foo1() {
    Array<Type>* array = new Array<Type>();  // one way to allocate memory and call the constructor
    fill_array(array);
    do_stuff_with_array(array);
    delete array;  // call the destructor and free the memory
}

void foo2() {
    Array<Type>* array = filled_array();  // allocation and construction happen elsewhere
    do_stuff_with_array(array);
    delete array;  // only correct if filled_array() allocated with plain new
}

The second version is potentially problematic for a few reasons:

  1. It's error-prone. Functions that return pointers or references to something they created are very easy to get wrong, resulting in either undefined behavior or a completely unnecessary performance loss. With raw pointers, it's easy to invoke undefined behavior by returning a pointer to a local variable, which is no longer valid once the function has returned. If the array were passed around as a regular object or a reference instead, you might suffer an expensive copy when filled_array() returns (the details of when this may or may not happen are complicated; see Stack Overflow for all the gory details).

  2. You don't know how filled_array() allocated the memory for the array, so in principle you don't know how to deallocate it correctly. You may be able to get away with assuming it was allocated "normally" (with new, so that delete is the right way to release it), but if you don't control the allocation yourself, you can't know for sure. It's possible some custom allocator was used, and it's also possible the pointer was saved somewhere so that a totally different function can do the deallocation at a specific time later in the pipeline (I believe this is common in C libraries). A function that accepts a pointer as an argument could theoretically save the pointer and free it later as well, but that's far less likely.

  3. Memory/object reuse. What if I already have memory allocated for an Array, or an actual Array, when it comes time to call filled_array()? Unfortunately, filled_array() controls both the memory allocation and the value generation logic, so it's going to allocate more memory whether or not you need it. If you have many functions like this in a row, you're potentially wasting a huge amount of time and memory on allocations that could be completely skipped if you instead accepted pointers or references to memory controlled by client code. Or more concisely: Avoid writing functions that decide how memory should be managed and do something else with that same memory. Single responsibility principle and all that.
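
To make point 3 concrete, here is a minimal sketch. It assumes std::vector<int> as a stand-in for Array<Type>; the extra size parameter, the fill values, and the helper reuse_demo() are invented purely for illustration and are not part of the original signatures. The out-parameter version lets the caller keep a single buffer alive across calls, while the returning version has to create a new array on every call:

```cpp
#include <cstddef>
#include <vector>

// Assumption: std::vector<int> plays the role of Array<Type>.
using IntArray = std::vector<int>;

// Out-parameter form: the caller owns the storage, the function only fills it.
void fill_array(IntArray& out, std::size_t n) {
    out.clear();  // removes the elements but keeps the allocated capacity
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i));
    }
}

// Returning form: the function has to create a new array on every call.
IntArray filled_array(std::size_t n) {
    IntArray result;
    fill_array(result, n);
    return result;
}

// Hypothetical helper: returns true if repeated fills reused the caller's
// buffer instead of reallocating it.
bool reuse_demo() {
    IntArray buffer;
    buffer.reserve(1024);                // one allocation, chosen by the caller
    const int* storage = buffer.data();  // remember where the storage lives
    for (std::size_t round = 0; round < 100; ++round) {
        fill_array(buffer, 1000);        // 1000 elements fit in the reserved capacity
        if (buffer.data() != storage) {
            return false;                // the buffer was reallocated
        }
    }
    return true;
}
```

In this sketch the loop in reuse_demo() performs a single allocation for all 100 rounds. Calling filled_array(1000) in the same loop would typically allocate at least once per round.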

Of course, you should be passing the Array around by reference rather than pointer. And you should be using RAII objects (whether that means "just an Array" or a smart pointer to an Array) as much as possible so that all the allocation and deallocation is managed for you. But these arguments for creating the object at the correct scope still apply, since switching to references and RAII objects alone may only change correctness bugs into performance "bugs" (some of which move semantics can't automagically fix).
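
As a rough sketch of what the reference and RAII variants might look like, again assuming std::vector<int> in place of Array<Type>, an illustrative size parameter, a placeholder fill value, and C++14 for std::make_unique:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Assumption: std::vector<int> plays the role of Array<Type>.
using IntArray = std::vector<int>;

// Reference out-parameter: it cannot be null, and ownership stays with the caller.
void fill_array(IntArray& out, std::size_t n) {
    out.assign(n, 42);  // placeholder values, for illustration only
}

// If the function really has to create the object, a smart pointer records
// who owns it and how it gets destroyed.
std::unique_ptr<IntArray> filled_array(std::size_t n) {
    auto result = std::make_unique<IntArray>();
    fill_array(*result, n);
    return result;
}

std::size_t foo1() {
    IntArray array;  // created in the scope that uses it
    fill_array(array, 8);
    return array.size();
}   // array's destructor releases the memory here

std::size_t foo2() {
    std::unique_ptr<IntArray> array = filled_array(8);
    return array->size();
}   // unique_ptr deletes the array here, with no manual cleanup
```

Neither version can leak or free the array incorrectly, but foo2() still pays for an extra heap allocation that foo1() avoids, which is the kind of performance "bug" that RAII alone doesn't remove.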