How does a computer round the last digit in a floating point representation?

floating point, numeric precision

I'm confused by how a computer rounds off the last digit in the floating point representation. For example, I'm told that if x = 1.24327789 is stored in a computer with a 6-digit capacity, then its floating point representation would be x = 0.124328 × 10^1, where clearly the last digit has been rounded.

What confuses me is how the computer can round this last digit if it doesn't have a 7-digit capacity with which to see the digit that follows the 'last' one.

I probably have a half-assed way of understanding this representation, but I really have no background in CompSci.

Best Answer

With a few odd exceptions, a floating point number is stored in binary, in a format defined by the IEEE 754 standard. The most common formats are 32 bit (single precision) and 64 bit (double precision). The 32 bit representation can store approximately 7 decimal digits, but remember that the underlying representation is in binary.

The representation of 1.24327789_10 as a single precision IEEE 754 floating point number is actually 00111111100111110010001110111011 in binary.
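You don't need an online converter to see that bit pattern. Here is a minimal sketch using Python's standard `struct` module; the helper name `float32_bits` is made up for this illustration.

```python
import struct

def float32_bits(x: float) -> str:
    """Return the IEEE 754 single precision bit pattern of x as a 32-character string."""
    # ">f" packs x into 4 bytes as a big-endian single precision float
    # (rounding the Python double to the nearest single on the way);
    # ">I" reads those same 4 bytes back as an unsigned 32-bit integer.
    (as_int,) = struct.unpack(">I", struct.pack(">f", x))
    return format(as_int, "032b")

print(float32_bits(1.24327789))  # 00111111100111110010001110111011
```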

This is made up of three parts:

  • The sign bit (0 indicating it is positive)
  • The exponent (01111111, which is 127), giving 2^(127-127), which comes out to 2^0
  • The mantissa (00111110010001110111011) which has a leading 1 implicit.

This gives us +2^0 * 1.00111110010001110111011_2, which is your number. If you look at the first few bits, 1.0011111_2, you will see that this is rather close to 1.25_10, or 1.01_2.
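The three parts can be pulled out of the bit string and recombined in a few lines. This is only a sketch for normal numbers (it ignores zeros, subnormals, infinities and NaN), and `decode_float32` is a name invented for this example.

```python
def decode_float32(bits: str):
    """Split a 32-character bit string into sign, exponent and mantissa fields.

    Sketch for normal numbers only: an exponent field of 0 or 255
    (zero, subnormal, infinity, NaN) is not handled.
    """
    sign = int(bits[0], 2)        # 1 bit
    exponent = int(bits[1:9], 2)  # 8 bits, stored with a bias of 127
    fraction = int(bits[9:], 2)   # 23 bits, the part after the implicit leading 1
    significand = 1 + fraction / 2**23
    value = (-1) ** sign * 2.0 ** (exponent - 127) * significand
    return sign, exponent, fraction, value

sign, exponent, fraction, value = decode_float32("00111111100111110010001110111011")
print(sign, exponent, format(fraction, "023b"))
print(value)  # the stored value, close to but not exactly 1.24327789
```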

On reading binary numbers past the binary (not decimal) point...

Just as 1001_2 represents 1*2^3 + 0*2^2 + 0*2^1 + 1*2^0, the value 1.011_2 represents 1*2^0 + 0*2^-1 + 1*2^-2 + 1*2^-3, or 1 + (1/4) + (1/8).
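The same positional rule is easy to check mechanically. A small sketch, using exact fractions so nothing gets rounded along the way (`binary_to_fraction` is a hypothetical helper, not a library function):

```python
from fractions import Fraction

def binary_to_fraction(text: str) -> Fraction:
    """Evaluate a binary numeral such as '1.011' exactly."""
    whole, _, frac = text.partition(".")
    value = Fraction(int(whole, 2)) if whole else Fraction(0)
    for position, bit in enumerate(frac, start=1):
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        if bit == "1":
            # The bit at position k after the binary point is worth 2^-k.
            value += Fraction(1, 2**position)
    return value

print(binary_to_fraction("1001"))   # 9
print(binary_to_fraction("1.011"))  # 11/8, i.e. 1 + 1/4 + 1/8
```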

Now, that conversion I did a bit above - I grabbed it from an IEEE 754 converter because doing it by hand is tedious - it's typically a good part of an assignment at the college level.

Rounding is actually a big deal. As described in Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic from '97, rounding issues abounded in the 70s.

The number 1.24327789 in binary is 1.0011111001000111011101011011010101100011110011111000100000111..._2

So, the 1 is assumed and the mantissa is 23 bits of that...

         1         2   |
12345678901234567890123v
0011111001000111011101011011010101100011110011111000100000111

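At the arrow you can see that the first dropped bit (bit 24) is a 1, with more nonzero bits after it, so the number should be rounded up. That gives us 00111110010001110111011_2, which is the mantissa from above. And that's how it is represented and rounded. The rounding happens during the conversion, while the extra bits are still available; only the stored result is limited to 23 bits. You should note that because this is rounded up, the stored value is slightly greater than the original: it is 1.243277907371521_10.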
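To make the rounding step concrete, here is a rough sketch that redoes it with exact arithmetic. It assumes the input lies in [1, 2) so the exponent is 2^0, it uses the usual round-to-nearest, ties-to-even rule, and it does not handle the carry that would occur if the mantissa rounded up to 2^23. The function name is made up for the example.

```python
from fractions import Fraction

def round_mantissa_23(decimal_text: str):
    """Round the fractional part of a number in [1, 2) to 23 mantissa bits.

    Returns (truncated, rounded, remainder), where remainder is the part
    that does not fit, measured in units of the last kept bit.
    """
    exact = Fraction(decimal_text)  # exact value of the decimal text
    if not 1 <= exact < 2:
        raise ValueError("this sketch only handles values in [1, 2)")
    scaled = (exact - 1) * 2**23    # shift 23 bits left of the binary point
    truncated = scaled.numerator // scaled.denominator
    remainder = scaled - truncated
    half = Fraction(1, 2)
    # Round up if more than half a unit is dropped; on an exact tie,
    # round to the neighbour whose last bit is 0.
    round_up = remainder > half or (remainder == half and truncated % 2 == 1)
    return truncated, truncated + 1 if round_up else truncated, remainder

truncated, rounded, remainder = round_mantissa_23("1.24327789")
print(format(truncated, "023b"))  # 00111110010001110111010
print(format(rounded, "023b"))    # 00111110010001110111011
print(remainder > Fraction(1, 2)) # True
```

The point for the original question is that the rounding decision is made from the full input; the 23-bit limit applies only to what gets stored afterwards.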