Information: A 16-bit register of the following format is used to store a floating-point number. The mantissa (M) is a normalized signed-magnitude fraction, the exponent (E) is expressed in excess-64 form, and the base of the system is 2.
Working out the bit allocation: the sign takes 1 bit, the exponent (E) is allotted 7 bits, and the mantissa (M) is allotted 8 bits.
Therefore, the largest number that can be represented in this format is:
| 0 | 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 |
i.e. every bit is filled with 1, and since we are looking for the largest number, the sign bit is 0.
What is the value of the largest number that can be represented in base 10?
We will use the following formula: \$(-1)^S \times 1.M \times 2^{E-B}\$, i.e. implicit normalization with biasing.
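Plugging the all-ones pattern into that formula gives the largest value. A quick sketch in Python, assuming the 1/7/8-bit split described above (variable names are my own):

```python
# Decode the all-ones 16-bit pattern: sign = 0, E = 1111111, M = 11111111
S = 0
E = 0b1111111      # 127, stored in excess-64 form
M = 0b11111111     # 255, the eight fraction bits

BIAS = 64
# (-1)^S * 1.M * 2^(E - B): the implicit leading 1 plus M/2^8 as the fraction
value = (-1) ** S * (1 + M / 2 ** 8) * 2 ** (E - BIAS)
print(value)       # 1.99609375 * 2**63, about 1.84e19
```

The exponent field holds 127, the bias is 64, so the true exponent is 127 − 64 = 63.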
I don't understand the exponent part of the number.
How did we get the exponent as \$2^{127-64}\$? Why are we subtracting the bias 64 from the exponent 127?
Can someone explain, with a proper derivation, how we arrived at \$2^{127-64}\$? Please explain it as if to a naive person.
I am missing something very obvious!
Waiting for explanation!
Best Answer
The exponent is biased so that the format can represent fractional numbers between 0 and 1 just as readily as large ones. It's a way to extend the format's range downward, toward small magnitudes. It turns out that values from 0 to 1 are quite important in most floating-point calculation, more important than representing bigger magnitudes, so sacrificing half the upper range is a reasonable trade-off.
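To make that concrete, here is a sketch (Python, assuming the 1/7/8-bit excess-64 layout from the question; the `encode` helper is hypothetical) of how a fraction below 1 gets a negative true exponent but still stores a non-negative exponent field:

```python
import math

BIAS = 64

def encode(x):
    """Encode positive x in the 1/7/8 excess-64 format (normalized values only)."""
    frac, exp = math.frexp(x)        # x = frac * 2**exp, with 0.5 <= frac < 1
    e_true = exp - 1                 # renormalize to 1.M * 2**e_true
    mantissa = round((frac * 2 - 1) * 2 ** 8)   # eight fraction bits
    e_stored = e_true + BIAS         # excess-64: negative exponents store as positives
    return e_stored, mantissa

# 0.15625 = 1.01b * 2**-3: true exponent -3, stored exponent -3 + 64 = 61
print(encode(0.15625))
```

The bias turns the whole true-exponent range −64…+63 into the unsigned stored range 0…127.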
But there's another, more important reason for using bias (as opposed to 2’s complement) that I'll get to later, a reason that goes back to the very beginnings of floating point.
Anyway, in this format you basically have these key values and ranges:

- The bias is 64, so the stored exponent \$E\$ runs from 0 to 127 and the true exponent \$E-64\$ runs from \$-64\$ to \$+63\$.
- Largest magnitude: \$E = 127\$, mantissa all ones: \$1.11111111_2 \times 2^{63} = (2 - 2^{-8}) \times 2^{63} \approx 1.84 \times 10^{19}\$.
- Smallest normalized magnitude: \$E = 0\$, \$M = 0\$: \$1.0 \times 2^{-64}\$.
This example format is something like what IEEE754 does. IEEE754 also reserves special values for -infinity, +infinity, and not-a-number (NaN). Play around with it here: https://www.h-schmidt.net/FloatConverter/IEEE754.html
And now, the buried lede: Why use bias at all? Because it avoids needing to use 2's complement in the exponent, which would make simple greater- and less-than comparisons between float values harder.
With bias, you can do a magnitude compare with just a single integer subtract of the combined exponent and mantissa fields (the sign bit is masked off and handled separately). That's not possible if 2's complement were used for the exponent: negative exponents would look like large integer values to an integer compare, giving the wrong result.
In other words, a biased exponent yields an always-increasing integer value from zero to positive infinity. (Try it in that app I linked.)
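Here is a tiny sketch of that comparison property, again using the 1/7/8-bit format from the question (the `to_bits` packing helper is my own, hypothetical):

```python
BIAS = 64

def to_bits(e_true, mantissa):
    """Pack sign = 0, an excess-64 exponent, and an 8-bit mantissa into a 16-bit pattern."""
    return ((e_true + BIAS) << 8) | mantissa

a = to_bits(-3, 0b01000000)   # 1.01b * 2**-3 = 0.15625
b = to_bits(+2, 0b10000000)   # 1.1b  * 2**+2 = 6.0
print(a < b)                  # True: plain integer compare agrees with magnitude order

# If the exponent were 2's complement instead, -3 would store as 0b1111101 (125),
# and the smaller number would wrongly compare as larger:
a2 = ((-3 & 0b1111111) << 8) | 0b01000000
b2 = ((+2 & 0b1111111) << 8) | 0b10000000
print(a2 > b2)                # True: integer compare now gives the wrong order
```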
The side-effect of using a bias is that it complicates float-to-fixed and fixed-to-float conversion, but that is usually a rare operation and is in any event handled efficiently by the FPU.
And I mentioned a history of bias. The IBM 709 used biased exponents, way back in 1957, as did its predecessor, the 704, in 1954.