How do you set, clear, and toggle a bit?
Tags: bit-manipulation, bitwise-operators, c
Related Solutions
Counting the set bits of an integer is known as the 'Hamming weight', 'popcount' or 'sideways addition'.
Some CPUs have a single built-in instruction to do it, and others have parallel instructions which act on bit vectors. Instructions like x86's popcnt (on CPUs where it's supported) will almost certainly be fastest for a single integer. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed; hardware popcount is normally fast if it exists at all).
The 'best' algorithm really depends on which CPU you are on and what your usage pattern is.
Your compiler may know how to do something that's good for the specific CPU you're compiling for, e.g. C++20 std::popcount(), or C++ std::bitset<32>::count(), as a portable way to access builtin / intrinsic functions (see another answer on this question). But your compiler's choice of fallback for target CPUs that don't have hardware popcnt might not be optimal for your use-case. Or your language (e.g. C) might not expose any portable function that could use a CPU-specific popcount when there is one.
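For example, a minimal C++ sketch of those two portable options (the function names here are illustrative, not standard):

#include <bit>       // C++20, for std::popcount
#include <bitset>
#include <cstdint>

int count_cpp20(uint32_t x)  { return std::popcount(x); }  // compiles to popcnt where the target supports it
int count_bitset(uint32_t x) { return static_cast<int>(std::bitset<32>(x).count()); }  // portable pre-C++20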
Portable algorithms that don't need (or benefit from) any HW support
A pre-populated table lookup method can be very fast if your CPU has a large cache and you are doing lots of these operations in a tight loop. However, it can suffer from the expense of a 'cache miss', where the CPU has to fetch some of the table from main memory. (Look up each byte separately to keep the table small.) If you want popcount for a contiguous range of numbers, only the low byte changes within each group of 256 numbers, making this very good.
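A hypothetical sketch of that byte-at-a-time lookup (the table and function names are made up for illustration; the table is filled once at startup):

#include <cstdint>

static unsigned char popcount_table[256];   // popcount_table[0] is zero-initialized

void init_popcount_table() {
    for (int i = 1; i < 256; i++)
        popcount_table[i] = (unsigned char)((i & 1) + popcount_table[i / 2]);  // reuse the already-computed half
}

int popcount_lut(uint32_t x) {
    return popcount_table[x & 0xFF] + popcount_table[(x >> 8) & 0xFF]
         + popcount_table[(x >> 16) & 0xFF] + popcount_table[x >> 24];
}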
If you know that your bytes will be mostly 0's or mostly 1's then there are efficient algorithms for these scenarios, e.g. clearing the lowest set bit with a bithack in a loop until the value becomes zero.
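That loop is Kernighan's classic trick; each x &= x - 1 clears exactly one set bit, so the loop runs once per set bit:

#include <cstdint>

int popcount_sparse(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;   // clears the lowest set bit
        count++;
    }
    return count;
}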
I believe a very good general-purpose algorithm is the following, known as the 'parallel' or 'variable-precision SWAR' algorithm. I have expressed this in a C-like pseudo language; you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):
GCC 10 and Clang 10.0 can recognize this pattern / idiom and compile it to a hardware popcnt or equivalent instruction when available, giving you the best of both worlds. (https://godbolt.org/z/qGdh1dvKK)
int numberOfSetBits(uint32_t i)
{
    // Java: use int, and use >>> instead of >>. Or use Integer.bitCount()
    // C or C++: use uint32_t
    i = i - ((i >> 1) & 0x55555555);                 // add pairs of bits
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);  // quads
    i = (i + (i >> 4)) & 0x0F0F0F0F;                 // groups of 8
    return (i * 0x01010101) >> 24;                   // horizontal sum of bytes
}
For JavaScript: coerce to integer with |0 for performance; change the first line to i = (i|0) - ((i >> 1) & 0x55555555);
This has the best worst-case behaviour of any of the algorithms discussed, so it will efficiently deal with any usage pattern or values you throw at it. (Its performance is not data-dependent on normal CPUs where all integer operations including multiply are constant-time. It doesn't get any faster with "simple" inputs, but it's still pretty decent.)
References:
- https://graphics.stanford.edu/~seander/bithacks.html
- https://en.wikipedia.org/wiki/Hamming_weight
- http://gurmeet.net/puzzles/fast-bit-counting-routines/
- http://aggregate.ee.engr.uky.edu/MAGIC/#Population%20Count%20(Ones%20Count)
How this SWAR bithack works:
i = i - ((i >> 1) & 0x55555555);
The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. This effectively does 16 separate additions in 2-bit accumulators (SWAR = SIMD Within A Register), like (i & 0x55555555) + ((i >> 1) & 0x55555555). It works because a 2-bit field holding bits a and b has the value 2a+b, and subtracting the shifted-down high bit a leaves a+b, the count of set bits in that field.
The next step adds the odd/even pairs of those 16x 2-bit accumulators, producing 8x 4-bit sums. The i - ... optimization isn't possible this time, so it just masks before / after shifting. Using the same 0x33... constant both times, instead of 0xccc... before shifting, is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately.
The final shift-and-add step of (i + (i >> 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4 (if all 4 of the corresponding input bits were set). 4+4 = 8, which still fits in 4 bits, so carry between nibble elements is impossible in i + (i >> 4).
So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. Continuing on with the same pattern for 2 more steps can widen to 2x 16-bit then 1x 32-bit counts. But there is a more efficient way on machines with fast hardware multiply:
Once we have few enough "elements" (in this case byte elements), a multiply with a magic constant can sum all the elements into the top element. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x<<8) + (x<<16) + (x<<24). Our 8-bit elements are wide enough (and holding small enough counts) that the partial sums never carry between bytes, so the top byte ends up holding the total.
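For example, with all 32 bits set, the previous step leaves i = 0x08080808 (each byte holding a count of 8). Then 0x08080808 * 0x01010101 = 0x20181008 (mod 2^32), and shifting right by 24 extracts the top byte: 0x20 = 32, the correct total.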
A 64-bit version of this can do 8x 8-bit elements in a 64-bit integer with a 0x0101010101010101 multiplier, and extract the high byte with >>56. So it doesn't take any extra steps, just wider constants. This is what GCC uses for __builtin_popcountll on x86 systems when the hardware popcnt instruction isn't enabled. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations.
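A sketch of that 64-bit version, which is just the same steps with wider constants (the function name is illustrative):

#include <cstdint>

int popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                        // pairs
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);  // quads
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                        // bytes
    return (int)((x * 0x0101010101010101ULL) >> 56);                   // top byte holds the total
}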
With full SIMD for wider vectors (e.g. counting a whole array)
This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later.)
However, the best way to use vector instructions for popcount is usually by using a variable-shuffle to do a table-lookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16 entry table held in a vector register).
On Intel CPUs, the hardware 64-bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.
- https://github.com/WojciechMula/sse-popcount state-of-the-art x86 SIMD popcount for SSSE3, AVX2, AVX512BW, AVX512VBMI, or AVX512 VPOPCNT. Using Harley-Seal across vectors to defer popcount within an element. (Also ARM NEON)
- Counting 1 bits (population count) on large data using AVX-512 or AVX-2
- related: https://github.com/mklarqvist/positional-popcount - separate counts for each bit-position of multiple 8, 16, 32, or 64-bit integers. (Again, x86 SIMD including AVX-512, which is really good at this, with vpternlogd making Harley-Seal very good.)
The bit shifting operators do exactly what their name implies. They shift bits. Here's a brief (or not-so-brief) introduction to the different shift operators.
The Operators
- >> is the arithmetic (or signed) right shift operator.
- >>> is the logical (or unsigned) right shift operator.
- << is the left shift operator, and meets the needs of both logical and arithmetic shifts.
All of these operators can be applied to integer values (int, long, possibly short and byte or char). In some languages, applying the shift operators to any datatype smaller than int automatically resizes the operand to be an int.
Note that <<< is not an operator, because it would be redundant.
Also note that C and C++ do not distinguish between the right shift operators. They provide only the >> operator, and the right-shifting behavior is implementation-defined for signed types. The rest of the answer uses the C# / Java operators.
(In all mainstream C and C++ implementations, including GCC and Clang/LLVM, >> on signed types is arithmetic. Some code assumes this, but it isn't something the standard guarantees. It's not undefined, though; the standard requires implementations to define it one way or another. However, left shifts of negative signed numbers are undefined behaviour (signed integer overflow). So unless you need arithmetic right shift, it's usually a good idea to do your bit-shifting with unsigned types.)
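For instance, a minimal C++ sketch of getting each behaviour explicitly (the function names are illustrative; the signed version assumes the usual arithmetic-shift implementation):

#include <cstdint>

int32_t shift_arithmetic(int32_t x, int n) { return x >> n; }  // implementation-defined, arithmetic on mainstream compilers
int32_t shift_logical(int32_t x, int n) {
    return (int32_t)((uint32_t)x >> n);  // cast to unsigned first: always shifts in zeros
}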
Left shift (<<)
Integers are stored, in memory, as a series of bits. For example, the number 6 stored as a 32-bit int would be:
00000000 00000000 00000000 00000110
Shifting this bit pattern to the left one position (6 << 1) would result in the number 12:
00000000 00000000 00000000 00001100
As you can see, the digits have shifted to the left by one position, and the last digit on the right is filled with a zero. You might also note that shifting left is equivalent to multiplication by powers of 2. So 6 << 1 is equivalent to 6 * 2, and 6 << 3 is equivalent to 6 * 8. A good optimizing compiler will replace multiplications with shifts when possible.
Non-circular shifting
Please note that these are not circular shifts. Shifting this value to the left by one position (3,758,096,384 << 1):
11100000 00000000 00000000 00000000
results in 3,221,225,472:
11000000 00000000 00000000 00000000
The digit that gets shifted "off the end" is lost. It does not wrap around.
Logical right shift (>>>)
A logical right shift is the converse to the left shift. Rather than moving bits to the left, they simply move to the right. For example, shifting the number 12:
00000000 00000000 00000000 00001100
to the right by one position (12 >>> 1) will get back our original 6:
00000000 00000000 00000000 00000110
So we see that shifting to the right is equivalent to division by powers of 2.
Lost bits are gone
However, a shift cannot reclaim "lost" bits. For example, if we shift this pattern:
00111000 00000000 00000000 00000110
to the left 4 positions (939,524,102 << 4), we get 2,147,483,744:
10000000 00000000 00000000 01100000
and then shifting back ((939,524,102 << 4) >>> 4) we get 134,217,734:
00001000 00000000 00000000 00000110
We cannot get back our original value once we have lost bits.
Arithmetic right shift (>>)
The arithmetic right shift is exactly like the logical right shift, except instead of padding with zero, it pads with the most significant bit. This is because the most significant bit is the sign bit, or the bit that distinguishes positive and negative numbers. By padding with the most significant bit, the arithmetic right shift is sign-preserving.
For example, if we interpret this bit pattern as a negative number:
10000000 00000000 00000000 01100000
we have the number -2,147,483,552. Shifting this to the right 4 positions with the arithmetic shift (-2,147,483,552 >> 4) would give us:
11111000 00000000 00000000 00000110
or the number -134,217,722.
So we see that we have preserved the sign of our negative numbers by using the arithmetic right shift, rather than the logical right shift. And once again, we see that we are performing division by powers of 2.
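A quick C++ sketch that checks the worked examples above (the signed case relies on the usual two's-complement, arithmetic-shift behaviour, which is implementation-defined but universal in practice):

#include <cassert>
#include <cstdint>

int main() {
    assert((6 << 1) == 12);                        // left shift multiplies by 2
    assert((3758096384u << 1) == 3221225472u);     // the top bit shifted out is lost
    assert((939524102u << 4 >> 4) == 134217734u);  // lost bits can't be reclaimed
    assert((-2147483552 >> 4) == -134217722);      // arithmetic shift preserves the sign
}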
Best Answer
Setting a bit
Use the bitwise OR operator (|) to set a bit:
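number |= 1UL << n;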
That will set the nth bit of number. n should be zero if you want to set the 1st bit, and so on up to n-1 if you want to set the nth bit.
Use 1ULL if number is wider than unsigned long; promotion of 1UL << n doesn't happen until after evaluating 1UL << n, where it's undefined behaviour to shift by more than the width of a long. The same applies to all the rest of the examples.
Clearing a bit
Use the bitwise AND operator (&) to clear a bit:
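number &= ~(1UL << n);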
That will clear the nth bit of number. You must invert the bit string with the bitwise NOT operator (~), then AND it.
Toggling a bit
The XOR operator (^) can be used to toggle a bit:
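number ^= 1UL << n;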
That will toggle the nth bit of number.
Checking a bit
You didn't ask for this, but I might as well add it.
To check a bit, shift the number n to the right, then bitwise AND it:
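bit = (number >> n) & 1U;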
That will put the value of the nth bit of number into the variable bit.
Changing the nth bit to x
Setting the nth bit to either 1 or 0 can be achieved with the following on a 2's complement C++ implementation:
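number ^= (-x ^ number) & (1UL << n);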
Bit n will be set if x is 1, and cleared if x is 0. If x has some other value, you get garbage. x = !!x will booleanize it to 0 or 1.
To make this independent of 2's complement negation behaviour (where -1 has all bits set, unlike on a 1's complement or sign/magnitude C++ implementation), use unsigned negation:
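number ^= (-(unsigned long)x ^ number) & (1UL << n);

or

unsigned long newbit = !!x;    // also booleanize x to force 0 or 1
number ^= (-newbit ^ number) & (1UL << n);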
It's generally a good idea to use unsigned types for portable bit manipulation.
or
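number = (number & ~(1UL << n)) | (x << n);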
(number & ~(1UL << n)) will clear the nth bit, and (x << n) will set the nth bit to x.
It's also generally a good idea not to copy/paste code in general, and so many people use preprocessor macros (like the community wiki answer further down) or some sort of encapsulation.