Go – Pointers vs. values in parameters and return values


In Go there are various ways to return a struct value or slice thereof. For individual ones I've seen:

type MyStruct struct {
    Val int
}

func myfunc() MyStruct {
    return MyStruct{Val: 1}
}

func myfunc() *MyStruct {
    return &MyStruct{}
}

func myfunc(s *MyStruct) {
    s.Val = 1
}

I understand the differences between these. The first returns a copy of the struct, the second returns a pointer to a struct value created within the function, and the third expects an existing struct to be passed in and overwrites its field.

I've seen all of these patterns used in various contexts, and I'm wondering what the best practices are. When would you use which? For instance, the first could be fine for small structs (because the copying overhead is minimal) and the second for bigger ones. The third could suit cases where you want to be extremely memory efficient, because you can easily reuse a single struct instance between calls. Are there any best practices for when to use which?

Similarly, the same question regarding slices:

func myfunc() []MyStruct {
    return []MyStruct{ MyStruct{Val: 1} }
}

func myfunc() []*MyStruct {
    return []*MyStruct{ &MyStruct{Val: 1} }
}

func myfunc(s *[]MyStruct) {
    *s = []MyStruct{ MyStruct{Val: 1} }
}

func myfunc(s *[]*MyStruct) {
    *s = []*MyStruct{ &MyStruct{Val: 1} }
}

Again: what are the best practices here? I know slices are always pointers, so returning a pointer to a slice isn't useful. However, should I return a slice of struct values or a slice of pointers to structs, or should I pass in a pointer to a slice as an argument (a pattern used in the Go App Engine API)?

Best Answer

tl;dr:

Methods using receiver pointers are common; the rule of thumb for receivers is, "If in doubt, use a pointer."
Slices, maps, channels, strings, function values, and interface values are implemented with pointers internally, and a pointer to them is often redundant.
Elsewhere, use pointers for big structs or structs you'll have to change, and otherwise pass values, because getting things changed by surprise via a pointer is confusing.

One case where you should often use a pointer:

Receivers are pointers more often than other arguments. It's not unusual for methods to modify the thing they're called on, or for named types to be large structs, so the guidance is to default to pointers except in rare cases.
- Jeff Hodges' copyfighter tool automatically searches for non-tiny receivers passed by value.
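To make the receiver guidance concrete, here is a minimal sketch with two hypothetical types, Counter and BrokenCounter (names invented for illustration). The only difference is the receiver kind on the mutating method: with a value receiver, Inc increments a copy and the caller never sees the change.

```go
package main

import "fmt"

// Counter is a hypothetical type used only for illustration.
type Counter struct {
	hits int
}

// Inc has a pointer receiver, so it modifies the Counter it is called on.
func (c *Counter) Inc() { c.hits++ }

// Hits only reads, but it also uses a pointer receiver: keeping a type's
// receivers consistent is the usual advice.
func (c *Counter) Hits() int { return c.hits }

// BrokenCounter is the same idea with value receivers. Inc increments a
// copy, so the change is lost when the method returns.
type BrokenCounter struct {
	hits int
}

func (c BrokenCounter) Inc()      { c.hits++ }
func (c BrokenCounter) Hits() int { return c.hits }

func main() {
	var c Counter
	c.Inc() // c is addressable, so Go calls (&c).Inc() automatically
	c.Inc()

	var b BrokenCounter
	b.Inc()
	b.Inc()

	fmt.Println(c.Hits(), b.Hits()) // 2 0
}
```

The compiler accepts BrokenCounter without complaint, which is part of why "if in doubt, use a pointer" is the safer default for receivers.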

Some situations where you don't need pointers:

Code review guidelines suggest passing small structs like type Point struct { latitude, longitude float64 }, and maybe even things a bit bigger, as values, unless the function you're calling needs to be able to modify them in place.
- Value semantics avoid aliasing situations where an assignment over here changes a value over there by surprise.
- It's not Go-y to sacrifice clean semantics for a little speed, and sometimes passing small structs by value is actually more efficient, because it avoids cache misses or heap allocations.
- So, Go Wiki's code review comments page suggests passing by value when structs are small and likely to stay that way.
- If the "large" cutoff seems vague, it is; arguably many structs are in a range where either a pointer or a value is OK. As a lower bound, the code review comments suggest slices (three machine words) are reasonable to use as value receivers. As something nearer an upper bound, bytes.Replace takes 10 words' worth of args (three slices and an int). You can find situations where copying even large structs turns out to be a performance win, but the rule of thumb is not to.
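A small sketch of the value-semantics point, reusing the Point struct from the guideline above; Shifted is a hypothetical helper. Because Point is passed and returned by value, the function cannot change the caller's copy by surprise.

```go
package main

import "fmt"

// Point mirrors the small struct from the code review guideline:
// two float64 fields, 16 bytes, cheap to copy.
type Point struct {
	latitude, longitude float64
}

// Shifted receives a copy of p, modifies the copy, and returns it.
// The caller's Point is never touched.
func Shifted(p Point, dLat, dLon float64) Point {
	p.latitude += dLat
	p.longitude += dLon
	return p
}

func main() {
	home := Point{latitude: 52.5, longitude: 13.25}
	moved := Shifted(home, 1, -0.5)
	fmt.Println(home, moved) // {52.5 13.25} {53.5 12.75}
}
```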
For slices, you don't need to pass a pointer to change elements of the array. io.Reader.Read(p []byte) changes the bytes of p, for instance. It's arguably a special case of "treat little structs like values," since internally you're passing around a little structure called a slice header (see Russ Cox (rsc)'s explanation). Similarly, you don't need a pointer to modify a map or communicate on a channel.
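A minimal sketch of modifying data through a slice or map that was passed by value. fillWithX and addOne are hypothetical helpers; fillWithX is only loosely shaped like io.Reader.Read and is not an io.Reader implementation.

```go
package main

import "fmt"

// fillWithX writes into the caller's backing array through the slice
// header it received by value; no *[]byte is needed.
func fillWithX(p []byte) int {
	for i := range p {
		p[i] = 'x'
	}
	return len(p)
}

// addOne modifies the caller's map; a map value already refers to
// shared underlying storage.
func addOne(m map[string]int, key string) {
	m[key]++
}

func main() {
	buf := make([]byte, 3)
	n := fillWithX(buf)
	fmt.Println(n, string(buf)) // 3 xxx

	counts := map[string]int{}
	addOne(counts, "go")
	addOne(counts, "go")
	fmt.Println(counts["go"]) // 2
}
```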
For slices you'll reslice (change the start/length/capacity of), built-in functions like append accept a slice value and return a new one. I'd imitate that; it avoids aliasing, returning a new slice helps call attention to the fact that a new array might be allocated, and it's familiar to callers.
- It's not always practical to follow that pattern. Some tools like database interfaces or serializers need to append to a slice whose type isn't known at compile time. They sometimes accept a pointer to a slice in an interface{} parameter.
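The two slice conventions above can be sketched side by side. Both helpers are hypothetical: appendSquares follows the append style of taking a slice value and returning the result, while appendDecoded imitates, in a much reduced form, a serializer that learns the element type only at run time and therefore needs a pointer to the slice inside an interface{}. It uses the standard reflect package and just appends one zero value.

```go
package main

import (
	"fmt"
	"reflect"
)

type MyStruct struct {
	Val int
}

// appendSquares follows the append convention: it takes a slice value and
// returns the possibly reallocated result, which the caller must assign.
func appendSquares(dst []int, n int) []int {
	for i := 1; i <= n; i++ {
		dst = append(dst, i*i)
	}
	return dst
}

// appendDecoded stands in for a serializer that knows the element type only
// at run time. It needs a pointer to the slice so the caller sees the new
// length; here it simply appends one zero value of the element type.
func appendDecoded(dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Slice {
		return fmt.Errorf("appendDecoded: want pointer to slice, got %T", dst)
	}
	s := v.Elem() // the caller's slice variable, settable via the pointer
	s.Set(reflect.Append(s, reflect.Zero(s.Type().Elem())))
	return nil
}

func main() {
	var squares []int
	squares = appendSquares(squares, 3)
	fmt.Println(squares) // [1 4 9]

	var items []MyStruct
	if err := appendDecoded(&items); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(items)) // 1

	// Passing the slice itself is rejected: the helper could not make the
	// caller's variable longer.
	fmt.Println(appendDecoded(items) != nil) // true
}
```

The append style keeps the type checker involved and makes reallocation visible at the call site; the pointer-in-interface{} style trades both away for generality.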
Maps, channels, strings, and function and interface values, like slices, are internally references or structures that contain references already, so if you're just trying to avoid getting the underlying data copied, you don't need to pass pointers to them. (rsc wrote a separate post on how interface values are stored).
- You still may need to pass pointers in the rarer case that you want to modify the caller's variable: flag.StringVar takes a *string for that reason, for example.
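A minimal sketch of the flag.StringVar case: a string passed by value could not be updated, so StringVar takes the address of the caller's variable and Parse writes into it. The parseName wrapper and the flag name are assumptions for illustration; flag.NewFlagSet, StringVar, and Parse are the standard library calls.

```go
package main

import (
	"flag"
	"fmt"
)

// parseName is a hypothetical wrapper around a FlagSet.
func parseName(args []string) (string, error) {
	var name string
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	// StringVar stores the default now and the parsed value later, both
	// written through &name into this function's variable.
	fs.StringVar(&name, "name", "world", "who to greet")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	name, err := parseName([]string{"-name=gopher"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("hello,", name) // hello, gopher
}
```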

Where you use pointers:

Consider whether your function should be a method on whichever struct you need a pointer to. People expect a lot of methods on x to modify x, so making the modified struct the receiver may help to minimize surprise. There are guidelines on when receivers should be pointers.
Functions that have effects on their non-receiver params should make that clear in the godoc, or better yet, the godoc and the name (like reader.WriteTo(writer)).
You mention accepting a pointer to avoid allocations by allowing reuse; changing APIs for the sake of memory reuse is an optimization I'd delay until it's clear the allocations have a nontrivial cost, and then I'd look for a way that doesn't force the trickier API on all users:
1. For avoiding allocations, Go's escape analysis is your friend. You can sometimes help it avoid heap allocations by making types that can be initialized with a trivial constructor, a plain literal, or a useful zero value like bytes.Buffer.
2. Consider a Reset() method to put an object back in a blank state, like some stdlib types offer. Users who don't care or can't save an allocation don't have to call it.
3. Consider writing modify-in-place methods and create-from-scratch functions as matching pairs, for convenience: existingUser.LoadFromJSON(json []byte) error could be wrapped by NewUserFromJSON(json []byte) (*User, error). Again, it pushes the choice between laziness and pinching allocations to the individual caller.
4. Callers seeking to recycle memory can let sync.Pool handle some details. If a particular allocation creates a lot of memory pressure, you're confident you know when the alloc is no longer used, and you don't have a better optimization available, sync.Pool can help. (CloudFlare published a useful (pre-sync.Pool) blog post about recycling.)
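Items 2 and 3 can be combined in one sketch. User and its fields are hypothetical, and the method names follow the example in item 3; the point is only the shape of the API: one in-place method for callers who want to reuse memory, plus a convenience constructor for everyone else.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical type for illustration.
type User struct {
	Name  string   `json:"name"`
	Roles []string `json:"roles"`
}

// Reset returns u to a blank state but keeps the Roles backing array, so a
// reused User can avoid reallocating it. (Callers still holding the old
// Roles slice would see it overwritten on the next load.)
func (u *User) Reset() {
	u.Name = ""
	u.Roles = u.Roles[:0]
}

// LoadFromJSON modifies u in place. json.Unmarshal decodes a JSON array by
// truncating the slice and appending, so existing capacity is reused.
func (u *User) LoadFromJSON(data []byte) error {
	u.Reset()
	return json.Unmarshal(data, u)
}

// NewUserFromJSON is the create-from-scratch convenience wrapper.
func NewUserFromJSON(data []byte) (*User, error) {
	u := &User{}
	if err := u.LoadFromJSON(data); err != nil {
		return nil, err
	}
	return u, nil
}

func main() {
	// Casual callers: a fresh User per document.
	u, err := NewUserFromJSON([]byte(`{"name":"ann","roles":["admin"]}`))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(u.Name, u.Roles) // ann [admin]

	// Allocation-conscious callers: one User reused across documents.
	var reused User
	for _, doc := range []string{`{"name":"bob"}`, `{"name":"cy","roles":["dev"]}`} {
		if err := reused.LoadFromJSON([]byte(doc)); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(reused.Name, len(reused.Roles))
	}
}
```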

Finally, on whether your slices should be of pointers: slices of values can be useful, and save you allocations and cache misses. There can be blockers:

The API to create your items might force pointers on you, e.g. you have to call NewFoo() *Foo rather than let Go initialize with the zero value.
The desired lifetimes of the items might not all be the same. The whole slice is freed at once; if 99% of the items are no longer useful but you have pointers to the other 1%, all of the array remains allocated.
Moving values around might cause you performance or correctness problems, making pointers more attractive. Notably, append copies items when it grows the underlying array. Pointers to elements that you took before the append still point into the old array afterward, copying can be slower for huge structs, and values such as a sync.Mutex must not be copied after first use. Insert/delete in the middle and sorting similarly move items around.
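The append pitfall can be demonstrated in a few lines. Item is a hypothetical type, and the capacity is set to 1 so the append is guaranteed to reallocate. With a slice of values, the earlier pointer refers to the old array; with a slice of pointers, append copies only the pointers, so the earlier pointer still reaches the element stored in the slice.

```go
package main

import "fmt"

type Item struct {
	Count int
}

// withValues takes a pointer to an element of a []Item, then appends.
func withValues() (inSlice, viaPointer int) {
	items := make([]Item, 1, 1) // len 1, cap 1: the next append must reallocate
	p := &items[0]              // points into the current backing array
	items = append(items, Item{})
	p.Count = 42 // writes to the old array, not to the new items[0]
	return items[0].Count, p.Count
}

// withPointers does the same with a []*Item. append copies the pointers,
// so p still refers to the Item that items[0] points to.
func withPointers() (inSlice, viaPointer int) {
	items := []*Item{{}} // cap 1 here too
	p := items[0]
	items = append(items, &Item{})
	p.Count = 42
	return items[0].Count, p.Count
}

func main() {
	fmt.Println(withValues())   // 0 42
	fmt.Println(withPointers()) // 42 42
}
```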

Broadly, value slices can make sense if either you get all of your items in place up front and don't move them (e.g., no more appends after initial setup), or if you do keep moving them around but you're sure that's OK (no/careful use of pointers to items, items are small enough to copy efficiently, etc.). Sometimes you have to think about or measure the specifics of your situation, but that's a rough guide.
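A sketch of the "get all items in place up front, then don't move them" case, again with a hypothetical Item type: the slice is allocated once at its final size and filled in place, and pointers to elements are taken only after setup is done.

```go
package main

import "fmt"

type Item struct {
	ID    int
	Count int
}

// buildItems allocates the whole value slice once, at its final size, and
// initializes elements in place (no per-item heap allocation).
func buildItems(n int) []Item {
	items := make([]Item, n)
	for i := range items {
		items[i].ID = i
	}
	return items
}

func main() {
	items := buildItems(3)

	// Setup is finished and nothing appends afterward, so &items[i] stays
	// valid for as long as the slice is in use.
	byID := make(map[int]*Item, len(items))
	for i := range items {
		byID[items[i].ID] = &items[i]
	}
	byID[2].Count++
	fmt.Println(items[2].Count) // 1
}
```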

Related Solutions

C++ – What are the differences between a pointer variable and a reference variable in C++

A pointer can be re-assigned:

int x = 5;
int y = 6;
int *p;
p = &x;
p = &y;
*p = 10;
assert(x == 5);
assert(y == 10);

A reference cannot be re-bound, and must be bound at initialization:

int x = 5;
int y = 6;
int &q; // error
int &r = x;

A pointer variable has its own identity: a distinct, visible memory address that can be taken with the unary & operator and a certain amount of space that can be measured with the sizeof operator. Using those operators on a reference returns a value corresponding to whatever the reference is bound to; the reference’s own address and size are invisible. Since the reference assumes the identity of the original variable in this way, it is convenient to think of a reference as another name for the same variable.
```
int x = 0;
int &r = x;
int *p = &x;
int *p2 = &r;

assert(p == p2); // &x == &r
assert(&p != &p2);
```

You can have arbitrarily nested pointers to pointers offering extra levels of indirection. References only offer one level of indirection.

int x = 0;
int y = 0;
int *p = &x;
int *q = &y;
int **pp = &p;

**pp = 2;
pp = &q; // *pp is now q
**pp = 4;

assert(y == 4);
assert(x == 2);

A pointer can be assigned nullptr, whereas a reference must be bound to an existing object. If you try hard enough, you can bind a reference to nullptr, but this is undefined and will not behave consistently.

/* the code below is undefined; your compiler may optimise it
 * differently, emit warnings, or outright refuse to compile it */

int &r = *static_cast<int *>(nullptr);

// prints "null" under GCC 10
std::cout
    << (&r != nullptr
        ? "not null" : "null")
    << std::endl;

bool f(int &r) { return &r != nullptr; }

// prints "not null" under GCC 10
std::cout
    << (f(*static_cast<int *>(nullptr))
        ? "not null" : "null")
    << std::endl;

You can, however, have a reference to a pointer whose value is nullptr.

Pointers can iterate over an array: you can use ++ to go to the next item that a pointer is pointing to, and + 4 to move four elements ahead (to the 5th element, if the pointer points at the first). This works regardless of the size of the object the pointer points to.
A pointer needs to be dereferenced with * to access the memory location it points to, whereas a reference can be used directly. A pointer to a class/struct uses -> to access its members, whereas a reference uses the . operator.
References cannot be put into an array, whereas pointers can be (Mentioned by user @litb)
Const references can be bound to temporaries. Pointers cannot (not without some indirection):
```
const int &x = int(12); // legal C++
int *y = &int(12); // illegal to take the address of a temporary.
```
This makes const & more convenient to use in argument lists and so forth.

C++ – What is a smart pointer and when should I use one

UPDATE

This answer is rather old, and so describes what was 'good' at the time, which was smart pointers provided by the Boost library. Since C++11, the standard library has provided sufficient smart pointer types, so you should favour std::unique_ptr, std::shared_ptr and std::weak_ptr.

There was also std::auto_ptr. It was very much like a scoped pointer, except that it also had the "special" dangerous ability to be copied — which also unexpectedly transfers ownership.
It was deprecated in C++11 and removed in C++17, so you shouldn't use it.

std::auto_ptr<MyObject> p1 (new MyObject());
std::auto_ptr<MyObject> p2 = p1; // Copy and transfer ownership. 
                                 // p1 gets set to empty!
p2->DoSomething(); // Works.
p1->DoSomething(); // Uh oh. p1 is now null: undefined behavior (often a crash).

OLD ANSWER

A smart pointer is a class that wraps a 'raw' (or 'bare') C++ pointer, to manage the lifetime of the object being pointed to. There is no single smart pointer type, but all of them try to abstract a raw pointer in a practical way.

Smart pointers should be preferred over raw pointers. If you feel you need to use pointers (first consider if you really do), you would normally want to use a smart pointer as this can alleviate many of the problems with raw pointers, mainly forgetting to delete the object and leaking memory.

With raw pointers, the programmer has to explicitly destroy the object when it is no longer useful.

// Need to create the object to achieve some goal
MyObject* ptr = new MyObject(); 
ptr->DoSomething(); // Use the object in some way
delete ptr; // Destroy the object. Done with it.
// Wait, what if DoSomething() raises an exception...?

A smart pointer by comparison defines a policy as to when the object is destroyed. You still have to create the object, but you no longer have to worry about destroying it.

SomeSmartPtr<MyObject> ptr(new MyObject());
ptr->DoSomething(); // Use the object in some way.

// Destruction of the object happens, depending 
// on the policy the smart pointer class uses.

// Destruction would happen even if DoSomething() 
// raises an exception

The simplest policy in use involves the scope of the smart pointer wrapper object, such as implemented by boost::scoped_ptr or std::unique_ptr.

void f()
{
    {
       std::unique_ptr<MyObject> ptr(new MyObject());
       ptr->DoSomethingUseful();
    } // ptr goes out of scope -- 
      // the MyObject is automatically destroyed.

    // ptr->Oops(); // Compile error: "ptr" not defined
                    // since it is no longer in scope.
}

Note that std::unique_ptr instances cannot be copied. This prevents the pointer from being deleted multiple times (incorrectly). You can, however, pass references to it around to other functions you call.

std::unique_ptrs are useful when you want to tie the lifetime of the object to a particular block of code or, if you embed it as member data inside another object, to the lifetime of that other object. The object exists until the containing block of code is exited, or until the containing object is itself destroyed.

A more complex smart pointer policy involves reference counting the pointer. This does allow the pointer to be copied. When the last "reference" to the object is destroyed, the object is deleted. This policy is implemented by boost::shared_ptr and std::shared_ptr.

void f()
{
    typedef std::shared_ptr<MyObject> MyObjectPtr; // nice short alias
    MyObjectPtr p1; // Empty

    {
        MyObjectPtr p2(new MyObject());
        // There is now one "reference" to the created object
        p1 = p2; // Copy the pointer.
        // There are now two references to the object.
    } // p2 is destroyed, leaving one reference to the object.
} // p1 is destroyed, leaving a reference count of zero. 
  // The object is deleted.

Reference counted pointers are very useful when the lifetime of your object is much more complicated, and is not tied directly to a particular section of code or to another object.

Reference-counted pointers have drawbacks, though. One is the possibility of leaking the object because the smart pointer itself is never destroyed:

// Create the smart pointer on the heap
MyObjectPtr* pp = new MyObjectPtr(new MyObject());
// Hmm, we forgot to destroy the smart pointer,
// because of that, the object is never destroyed!

Another possibility is creating circular references:

struct Owner {
   std::shared_ptr<Owner> other;
};

std::shared_ptr<Owner> p1 (new Owner());
std::shared_ptr<Owner> p2 (new Owner());
p1->other = p2; // p1 references p2
p2->other = p1; // p2 references p1

// Oops, the reference counts of p1 and p2 never go to zero!
// The objects are never destroyed!

To work around this problem, both Boost and C++11 have defined a weak_ptr to define a weak (uncounted) reference to a shared_ptr.
