Setting a bit
Use the bitwise OR operator (|
) to set a bit.
number |= 1UL << n;
That will set the n
th bit of number
. n
should be zero, if you want to set the 1
st bit and so on upto n-1
, if you want to set the n
th bit.
Use 1ULL
if number
is wider than unsigned long
; promotion of 1UL << n
doesn't happen until after evaluating 1UL << n
where it's undefined behaviour to shift by more than the width of a long
. The same applies to all the rest of the examples.
Clearing a bit
Use the bitwise AND operator (&
) to clear a bit.
number &= ~(1UL << n);
That will clear the n
th bit of number
. You must invert the bit string with the bitwise NOT operator (~
), then AND it.
Toggling a bit
The XOR operator (^
) can be used to toggle a bit.
number ^= 1UL << n;
That will toggle the n
th bit of number
.
Checking a bit
You didn't ask for this, but I might as well add it.
To check a bit, shift the number n to the right, then bitwise AND it:
bit = (number >> n) & 1U;
That will put the value of the n
th bit of number
into the variable bit
.
Changing the nth bit to x
Setting the n
th bit to either 1
or 0
can be achieved with the following on a 2's complement C++ implementation:
number ^= (-x ^ number) & (1UL << n);
Bit n
will be set if x
is 1
, and cleared if x
is 0
. If x
has some other value, you get garbage. x = !!x
will booleanize it to 0 or 1.
To make this independent of 2's complement negation behaviour (where -1
has all bits set, unlike on a 1's complement or sign/magnitude C++ implementation), use unsigned negation.
number ^= (-(unsigned long)x ^ number) & (1UL << n);
or
unsigned long newbit = !!x; // Also booleanize to force 0 or 1
number ^= (-newbit ^ number) & (1UL << n);
It's generally a good idea to use unsigned types for portable bit manipulation.
or
number = (number & ~(1UL << n)) | (x << n);
(number & ~(1UL << n))
will clear the n
th bit and (x << n)
will set the n
th bit to x
.
It's also generally a good idea to not to copy/paste code in general and so many people use preprocessor macros (like the community wiki answer further down) or some sort of encapsulation.
You are running into the old problem with floating point numbers that not all numbers can be represented exactly. The command line is just showing you the full floating point form from memory.
With floating point representation, your rounded version is the same number. Since computers are binary, they store floating point numbers as an integer and then divide it by a power of two so 13.95 will be represented in a similar fashion to 125650429603636838/(2**53).
Double precision numbers have 53 bits (16 digits) of precision and regular floats have 24 bits (8 digits) of precision. The floating point type in Python uses double precision to store the values.
For example,
>>> 125650429603636838/(2**53)
13.949999999999999
>>> 234042163/(2**24)
13.949999988079071
>>> a = 13.946
>>> print(a)
13.946
>>> print("%.2f" % a)
13.95
>>> round(a,2)
13.949999999999999
>>> print("%.2f" % round(a, 2))
13.95
>>> print("{:.2f}".format(a))
13.95
>>> print("{:.2f}".format(round(a, 2)))
13.95
>>> print("{:.15f}".format(round(a, 2)))
13.949999999999999
If you are after only two decimal places (to display a currency value, for example), then you have a couple of better choices:
- Use integers and store values in cents, not dollars and then divide by 100 to convert to dollars.
- Or use a fixed point number like decimal.
Best Answer
First, a paper you should consider reading, if you want to understand floating point foibles better: "What Every Computer Scientist Should Know About Floating Point Arithmetic," http://www.validlab.com/goldberg/paper.pdf
And now to some meat.
The following code is bare bones, and attempts to produce an IEEE-754 single precision float from an
unsigned int
in the range 0 < value < 224. That's the format you're most likely to encounter on modern hardware, and it's the format you seem to reference in your original question.IEEE-754 single-precision floats are divided into three fields: A single sign bit, 8 bits of exponent, and 23 bits of significand (sometimes called a mantissa). IEEE-754 uses a hidden 1 significand, meaning that the significand is actually 24 bits total. The bits are packed left to right, with the sign bit in bit 31, exponent in bits 30 .. 23, and the significand in bits 22 .. 0. The following diagram from Wikipedia illustrates:
The exponent has a bias of 127, meaning that the actual exponent associated with the floating point number is 127 less than the value stored in the exponent field. An exponent of 0 therefore would be encoded as 127.
(Note: The full Wikipedia article may be interesting to you. Ref: http://en.wikipedia.org/wiki/Single_precision_floating-point_format )
Therefore, the IEEE-754 number 0x40000000 is interpreted as follows:
So the value is 1.0 x 21 = 2.0.
To convert an
unsigned int
in the limited range given above, then, to something in IEEE-754 format, you might use a function like the one below. It takes the following steps:reinterpret_cast
, converts the resulting bit-pattern to afloat
. This part is an ugly hack, because it uses a type-punned pointer. You could also do this by abusing aunion
. Some platforms provide an intrinsic operation (such as_itof
) to make this reinterpretation less ugly.There are much faster ways to do this; this one is meant to be pedagogically useful, if not super efficient:
You can make this process more efficient using functions that detect the leading 1 in a number. (These sometimes go by names like
clz
for "count leading zeros", ornorm
for "normalize".)You can also extend this to signed numbers by recording the sign, taking the absolute value of the integer, performing the steps above, and then putting the sign into bit 31 of the number.
For integers >= 224, the entire integer does not fit into the significand field of the 32-bit float format. This is why you need to "round": You lose LSBs in order to make the value fit. Thus, multiple integers will end up mapping to the same floating point pattern. The exact mapping depends on the rounding mode (round toward -Inf, round toward +Inf, round toward zero, round toward nearest even). But the fact of the matter is you can't shove 24 bits into fewer than 24 bits without some loss.
You can see this in terms of the code above. It works by aligning the leading 1 to the hidden 1 position. If a value was >= 224, the code would need to shift right, not left, and that necessarily shifts LSBs away. Rounding modes just tell you how to handle the bits shifted away.