C++ – c/c++ left shift unsigned vs signed

bit-shiftc

I have this code.

#include <iostream>

int main()
{
    unsigned long int i = 1U << 31;
    std::cout << i << std::endl;
    unsigned long int uwantsum = 1 << 31;
    std::cout << uwantsum << std::endl;
    return 0;
}

It prints out.

2147483648
18446744071562067968

on Arch Linux 64 bit, gcc, ivy bridge architecture.

The first result makes sense, but I don't understand where the second number came from. 1 represented as a 4byte int signed or unsigned is

00000000000000000000000000000001

When you shift it 31 times to the left, you end up with

10000000000000000000000000000000

no? I know shifting left for positive numbers is essentially 2^k where k is how many times you shift it, assuming it still fits within bounds. Why is it I get such a bizarre number?

Best Answer

Presumably you're interested in why this: unsigned long int uwantsum = 1 << 31; produces a "strange" value.

The problem is pretty simple: 1 is a plain int, so the shift is done on a plain int, and only after it's complete is the result converted to unsigned long.

In this case, however, 1<<31 overflows the range of a 32-bit signed int, so the result is undefined¹. After conversion to unsigned, the result remains undefined.

That said, in most typical cases, what's likely to happen is that 1<<31 will give a bit pattern of 10000000000000000000000000000000. When viewed as a signed 2's complement² number, this is -2147483648. Since that's negative, when it's converted to a 64-bit type, it'll be sign extended, so the top 32 bits will be filled with copies of what's in bit 31. That gives: 1111111111111111111111111111111110000000000000000000000000000000 (33 1-bits followed by 31 0-bits).

If we then treat that as an unsigned 64-bit number, we get 18446744071562067968.

§5.8/2: ^{The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.}
In theory, the computer could use 1's complement or signed magnitude for signed numbers--but 2's complement is currently much more common than either of those. If it did use one of those, we'd expect a different final result.

The Operators

>> is the arithmetic (or signed) right shift operator.
>>> is the logical (or unsigned) right shift operator.
<< is the left shift operator, and meets the needs of both logical and arithmetic shifts.

All of these operators can be applied to integer values (int, long, possibly short and byte or char). In some languages, applying the shift operators to any datatype smaller than int automatically resizes the operand to be an int.

Note that <<< is not an operator, because it would be redundant.

Also note that C and C++ do not distinguish between the right shift operators. They provide only the >> operator, and the right-shifting behavior is implementation defined for signed types. The rest of the answer uses the C# / Java operators.

(In all mainstream C and C++ implementations including GCC and Clang/LLVM, >> on signed types is arithmetic. Some code assumes this, but it isn't something the standard guarantees. It's not undefined, though; the standard requires implementations to define it one way or another. However, left shifts of negative signed numbers is undefined behaviour (signed integer overflow). So unless you need arithmetic right shift, it's usually a good idea to do your bit-shifting with unsigned types.)

Left shift (<<)

Integers are stored, in memory, as a series of bits. For example, the number 6 stored as a 32-bit int would be:

00000000 00000000 00000000 00000110

Shifting this bit pattern to the left one position (6 << 1) would result in the number 12:

00000000 00000000 00000000 00001100

As you can see, the digits have shifted to the left by one position, and the last digit on the right is filled with a zero. You might also note that shifting left is equivalent to multiplication by powers of 2. So 6 << 1 is equivalent to 6 * 2, and 6 << 3 is equivalent to 6 * 8. A good optimizing compiler will replace multiplications with shifts when possible.

Non-circular shifting

Please note that these are not circular shifts. Shifting this value to the left by one position (3,758,096,384 << 1):

11100000 00000000 00000000 00000000

results in 3,221,225,472:

11000000 00000000 00000000 00000000

The digit that gets shifted "off the end" is lost. It does not wrap around.

Logical right shift (>>>)

A logical right shift is the converse to the left shift. Rather than moving bits to the left, they simply move to the right. For example, shifting the number 12:

00000000 00000000 00000000 00001100

to the right by one position (12 >>> 1) will get back our original 6:

00000000 00000000 00000000 00000110

So we see that shifting to the right is equivalent to division by powers of 2.

Lost bits are gone

However, a shift cannot reclaim "lost" bits. For example, if we shift this pattern:

00111000 00000000 00000000 00000110

to the left 4 positions (939,524,102 << 4), we get 2,147,483,744:

10000000 00000000 00000000 01100000

and then shifting back ((939,524,102 << 4) >>> 4) we get 134,217,734:

00001000 00000000 00000000 00000110

We cannot get back our original value once we have lost bits.

Arithmetic right shift (>>)

The arithmetic right shift is exactly like the logical right shift, except instead of padding with zero, it pads with the most significant bit. This is because the most significant bit is the sign bit, or the bit that distinguishes positive and negative numbers. By padding with the most significant bit, the arithmetic right shift is sign-preserving.

For example, if we interpret this bit pattern as a negative number:

10000000 00000000 00000000 01100000

we have the number -2,147,483,552. Shifting this to the right 4 positions with the arithmetic shift (-2,147,483,552 >> 4) would give us:

11111000 00000000 00000000 00000110

or the number -134,217,722.

So we see that we have preserved the sign of our negative numbers by using the arithmetic right shift, rather than the logical right shift. And once again, we see that we are performing division by powers of 2.

C++ – How to detect unsigned integer multiply overflow

I see you're using unsigned integers. By definition, in C (I don't know about C++), unsigned arithmetic does not overflow ... so, at least for C, your point is moot :)

With signed integers, once there has been overflow, undefined behaviour (UB) has occurred and your program can do anything (for example: render tests inconclusive).

#include <limits.h>

int a = <something>;
int x = <something>;
a += x;              /* UB */
if (a < 0) {         /* Unreliable test */
  /* ... */
}

To create a conforming program, you need to test for overflow before generating said overflow. The method can be used with unsigned integers too:

// For addition
#include <limits.h>

int a = <something>;
int x = <something>;
if ((x > 0) && (a > INT_MAX - x)) /* `a + x` would overflow */;
if ((x < 0) && (a < INT_MIN - x)) /* `a + x` would underflow */;

// For subtraction
#include <limits.h>
int a = <something>;
int x = <something>;
if ((x < 0) && (a > INT_MAX + x)) /* `a - x` would overflow */;
if ((x > 0) && (a < INT_MIN + x)) /* `a - x` would underflow */;

// For multiplication
#include <limits.h>

int a = <something>;
int x = <something>;
// There may be a need to check for -1 for two's complement machines.
// If one number is -1 and another is INT_MIN, multiplying them we get abs(INT_MIN) which is 1 higher than INT_MAX
if ((a == -1) && (x == INT_MIN)) /* `a * x` can overflow */
if ((x == -1) && (a == INT_MIN)) /* `a * x` (or `a / x`) can overflow */
// general case
if (a > INT_MAX / x) /* `a * x` would overflow */;
if ((a < INT_MIN / x)) /* `a * x` would underflow */;

For division (except for the INT_MIN and -1 special case), there isn't any possibility of going over INT_MIN or INT_MAX.