C++ – Operator Precedence vs Order of Evaluation

coperator precedence

The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.

Let us take a simple example:

int a=1;  // Line 1
a = a++ + ++a;  // Line 2
printf("%d",a);  // Line 3

Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:

Between evaluation of the left and right operands of the && (logical
AND), || (logical OR), and comma
operators. For example, in the
expression *p++ != 0 && *q++ != 0, all
side effects of the sub-expression
*p++ != 0 are completed before any attempt to access q.

Between the evaluation of the first operand of the ternary
"question-mark" operator and the
second or third operand. For example,
in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after
the first *p++, meaning it has already
been incremented by the time the
second instance is executed.

At the end of a full expression. This category includes expression
statements (such as the assignment
a=b;), return statements, the
controlling expressions of if, switch,
while, or do-while statements, and all
three expressions in a for statement.

Before a function is entered in a function call. The order in which
the arguments are evaluated is not
specified, but this sequence point
means that all of their side effects
are complete before the function is
entered. In the expression f(i++) + g(j++) + h(k++),
f is called with a
parameter of the original value of i,
but i is incremented before entering
the body of f. Similarly, j and k are
updated before entering g and h
respectively. However, it is not
specified in which order f(), g(), h()
are executed, nor in which order i, j,
k are incremented. The values of j and
k in the body of f are therefore
undefined.3 Note that a function
call f(a,b,c) is not a use of the
comma operator and the order of
evaluation for a, b, and c is
unspecified.

At a function return, after the return value is copied into the
calling context. (This sequence point
is only specified in the C++ standard;
it is present only implicitly in
C.)

At the end of an initializer; for example, after the evaluation of 5
in the declaration int a = 5;.

Thus, going by Point # 3:

At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.

Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behaviour is tightly coupled with Sequence Points.

Now let us take another example:

int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5

Now its evident that Line 5 will make the variable result store 1.

Now the expression x<y<z in Line 5 can be evaluated as either:

x<(y<z) or (x<y)<z. In the first case the value of result will be 0 and in the second case result will be 1. But we know, when the Operator Precedence is Equal/Same – Associativity comes into play, hence, is evaluated as (x<y)<z.

This is what is said in this MSDN Article:

The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.

Now, about the above article:

It mentions "Expressions with higher-precedence operators are evaluated first."

It may sound incorrect. But, I think the article is not saying something wrong if we consider that () is also an operator x<y<z is same as (x<y)<z. My reasoning is if associativity does not come into play, then the complete expressions evaluation would become ambiguous since < is not a Sequence Point.

Also, another link I found says this on Operator Precedence and Associativity:

This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.

So taking, the second example of int result=x<y<z, we can see here that there are in all 3 expressions, x, y and z, since, the simplest form of an expression consists of a single literal constant or object. Hence the result of the expressions x, y, z would be there rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.

Now, doesn't Associativity come into play since now we have 2 expressions to be evaluated, either 10<1 or 1<2 and since the precedence of operator is same, they are evaluated from left to right?

Taking this last example as my argument:

int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );

Now in the above example, since the comma operator has same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.

In SO/IEC 9899:201x under J.1 Unspecified behavior it mentions:

The order in which subexpressions are evaluated and the order in which side effects
take place, except as specified for the function-call (), &&, ||, ?:, and comma
operators (6.5).

Now I would like to know, would it be wrong to say:

Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.

I would like to be corrected if any mistakes were made in something I said in my question.
The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?

Best Answer

Yes, the MSDN article is in error, at least with respect to standard C and C++¹.

Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.

It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.

Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:

 push(z);    // Evaluates its argument and pushes value on stack
 push(y);
 push(x);
 test_less();  // compares TOS to TOS(1), pushes result on stack
 test_less();

This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.

Summary: Order of evaluation is independent of associativity.

Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:

push(z);
push(y);
push(x);
mul();
add();

Summary: Order of evaluation is independent of precedence.

When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:

push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);

This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.

Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea is fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.

Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.

Summary: Order of evaluation is independent of apparent dependencies

Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.

Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:

load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
    load variable[i]
    add_with_carry 0
    store variable[i]
}

If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.

This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.

Conclusion

Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.

¹ I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.

Common operators to overload

Most of the work in overloading operators is boiler-plate code. That is little wonder, since operators are merely syntactic sugar, their actual work could be done by (and often is forwarded to) plain functions. But it is important that you get this boiler-plate code right. If you fail, either your operator’s code won’t compile or your users’ code won’t compile or your users’ code will behave surprisingly.

Assignment Operator

There's a lot to be said about assignment. However, most of it has already been said in GMan's famous Copy-And-Swap FAQ, so I'll skip most of it here, only listing the perfect assignment operator for reference:

X& X::operator=(X rhs)
{
  swap(rhs);
  return *this;
}

Bitshift Operators (used for Stream I/O)

The bitshift operators << and >>, although still used in hardware interfacing for the bit-manipulation functions they inherit from C, have become more prevalent as overloaded stream input and output operators in most applications. For guidance overloading as bit-manipulation operators, see the section below on Binary Arithmetic Operators. For implementing your own custom format and parsing logic when your object is used with iostreams, continue.

The stream operators, among the most commonly overloaded operators, are binary infix operators for which the syntax specifies no restriction on whether they should be members or non-members. Since they change their left argument (they alter the stream’s state), they should, according to the rules of thumb, be implemented as members of their left operand’s type. However, their left operands are streams from the standard library, and while most of the stream output and input operators defined by the standard library are indeed defined as members of the stream classes, when you implement output and input operations for your own types, you cannot change the standard library’s stream types. That’s why you need to implement these operators for your own types as non-member functions. The canonical forms of the two are these:

std::ostream& operator<<(std::ostream& os, const T& obj)
{
  // write obj to stream

  return os;
}

std::istream& operator>>(std::istream& is, T& obj)
{
  // read obj from stream

  if( /* no valid object of T found in stream */ )
    is.setstate(std::ios::failbit);

  return is;
}

When implementing operator>>, manually setting the stream’s state is only necessary when the reading itself succeeded, but the result is not what would be expected.

Function call operator

The function call operator, used to create function objects, also known as functors, must be defined as a member function, so it always has the implicit this argument of member functions. Other than this, it can be overloaded to take any number of additional arguments, including zero.

Here's an example of the syntax:

class foo {
public:
    // Overloaded call operator
    int operator()(const std::string& y) {
        // ...
    }
};

Usage:

foo f;
int a = f("hello");

Throughout the C++ standard library, function objects are always copied. Your own function objects should therefore be cheap to copy. If a function object absolutely needs to use data which is expensive to copy, it is better to store that data elsewhere and have the function object refer to it.

Comparison operators

The binary infix comparison operators should, according to the rules of thumb, be implemented as non-member functions¹. The unary prefix negation ! should (according to the same rules) be implemented as a member function. (but it is usually not a good idea to overload it.)

The standard library’s algorithms (e.g. std::sort()) and types (e.g. std::map) will always only expect operator< to be present. However, the users of your type will expect all the other operators to be present, too, so if you define operator<, be sure to follow the third fundamental rule of operator overloading and also define all the other boolean comparison operators. The canonical way to implement them is this:

inline bool operator==(const X& lhs, const X& rhs){ /* do actual comparison */ }
inline bool operator!=(const X& lhs, const X& rhs){return !operator==(lhs,rhs);}
inline bool operator< (const X& lhs, const X& rhs){ /* do actual comparison */ }
inline bool operator> (const X& lhs, const X& rhs){return  operator< (rhs,lhs);}
inline bool operator<=(const X& lhs, const X& rhs){return !operator> (lhs,rhs);}
inline bool operator>=(const X& lhs, const X& rhs){return !operator< (lhs,rhs);}

The important thing to note here is that only two of these operators actually do anything, the others are just forwarding their arguments to either of these two to do the actual work.

The syntax for overloading the remaining binary boolean operators (||, &&) follows the rules of the comparison operators. However, it is very unlikely that you would find a reasonable use case for these².

¹ _{As with all rules of thumb, sometimes there might be reasons to break this one, too. If so, do not forget that the left-hand operand of the binary comparison operators, which for member functions will be *this, needs to be const, too. So a comparison operator implemented as a member function would have to have this signature:}

bool operator<(const X& rhs) const { /* do actual comparison with *this */ }

_{(Note the const at the end.)}

² _{It should be noted that the built-in version of || and && use shortcut semantics. While the user defined ones (because they are syntactic sugar for method calls) do not use shortcut semantics. User will expect these operators to have shortcut semantics, and their code may depend on it, Therefore it is highly advised NEVER to define them.}

Arithmetic Operators

Unary arithmetic operators

The unary increment and decrement operators come in both prefix and postfix flavor. To tell one from the other, the postfix variants take an additional dummy int argument. If you overload increment or decrement, be sure to always implement both prefix and postfix versions. Here is the canonical implementation of increment, decrement follows the same rules:

class X {
  X& operator++()
  {
    // do actual increment
    return *this;
  }
  X operator++(int)
  {
    X tmp(*this);
    operator++();
    return tmp;
  }
};

Note that the postfix variant is implemented in terms of prefix. Also note that postfix does an extra copy.²

Overloading unary minus and plus is not very common and probably best avoided. If needed, they should probably be overloaded as member functions.

² _{Also note that the postfix variant does more work and is therefore less efficient to use than the prefix variant. This is a good reason to generally prefer prefix increment over postfix increment. While compilers can usually optimize away the additional work of postfix increment for built-in types, they might not be able to do the same for user-defined types (which could be something as innocently looking as a list iterator). Once you got used to do i++, it becomes very hard to remember to do ++i instead when i is not of a built-in type (plus you'd have to change code when changing a type), so it is better to make a habit of always using prefix increment, unless postfix is explicitly needed.}

Binary arithmetic operators

For the binary arithmetic operators, do not forget to obey the third basic rule operator overloading: If you provide +, also provide +=, if you provide -, do not omit -=, etc. Andrew Koenig is said to have been the first to observe that the compound assignment operators can be used as a base for their non-compound counterparts. That is, operator + is implemented in terms of +=, - is implemented in terms of -= etc.

According to our rules of thumb, + and its companions should be non-members, while their compound assignment counterparts (+= etc.), changing their left argument, should be a member. Here is the exemplary code for += and +; the other binary arithmetic operators should be implemented in the same way:

class X {
  X& operator+=(const X& rhs)
  {
    // actual addition of rhs to *this
    return *this;
  }
};
inline X operator+(X lhs, const X& rhs)
{
  lhs += rhs;
  return lhs;
}

operator+= returns its result per reference, while operator+ returns a copy of its result. Of course, returning a reference is usually more efficient than returning a copy, but in the case of operator+, there is no way around the copying. When you write a + b, you expect the result to be a new value, which is why operator+ has to return a new value.³ Also note that operator+ takes its left operand by copy rather than by const reference. The reason for this is the same as the reason giving for operator= taking its argument per copy.

The bit manipulation operators ~ & | ^ << >> should be implemented in the same way as the arithmetic operators. However, (except for overloading << and >> for output and input) there are very few reasonable use cases for overloading these.

³ _{Again, the lesson to be taken from this is that a += b is, in general, more efficient than a + b and should be preferred if possible.}

Array Subscripting

The array subscript operator is a binary operator which must be implemented as a class member. It is used for container-like types that allow access to their data elements by a key. The canonical form of providing these is this:

class X {
        value_type& operator[](index_type idx);
  const value_type& operator[](index_type idx) const;
  // ...
};

Unless you do not want users of your class to be able to change data elements returned by operator[] (in which case you can omit the non-const variant), you should always provide both variants of the operator.

If value_type is known to refer to a built-in type, the const variant of the operator should better return a copy instead of a const reference:

class X {
  value_type& operator[](index_type idx);
  value_type  operator[](index_type idx) const;
  // ...
};

Operators for Pointer-like Types

For defining your own iterators or smart pointers, you have to overload the unary prefix dereference operator * and the binary infix pointer member access operator ->:

class my_ptr {
        value_type& operator*();
  const value_type& operator*() const;
        value_type* operator->();
  const value_type* operator->() const;
};

Note that these, too, will almost always need both a const and a non-const version. For the -> operator, if value_type is of class (or struct or union) type, another operator->() is called recursively, until an operator->() returns a value of non-class type.

The unary address-of operator should never be overloaded.

For operator->*() see this question. It's rarely used and thus rarely ever overloaded. In fact, even iterators do not overload it.

Continue to Conversion Operators