How to determine how many numbers are in a floating point number system

I'm trying to make floating point number systems a bit more intuitive for myself. There are a few things I am confused about, and I think the best way to clear up my confusion would be for someone to guide me through one question: How many numbers are there in a floating point number system (given the base, precision, max exponent, and min exponent)?

This is what I am thinking:

I figure that the maximum possible number (realmax) divided by the smallest possible number (realmin) would give the count of all possible positive numbers. Part of me also believes I could divide realmax by the smallest increment, machine epsilon (eps), to figure out how many numbers there are. Just looking at the difference in magnitude between eps (about 10^-16 for IEEE double precision) and realmin (about 10^-308), however, tells me this isn't true at all. I can't think of an intuition for why!

So the problem I am facing is determining the correct formulas for realmax and realmin. The answer I get using my textbook's formulas differs radically from the one I got with Wikipedia's formula.

Help!

Best Answer

The following is based on my own deduction and I have no proof of its accuracy.

Ultimately, how many bits does a floating point number consume? A computer can't possibly represent more unique numbers than it has unique bit patterns. For a 64-bit floating point number (C# double) there are 2^64 unique bit patterns. Note that some patterns give equivalent values. Quoting Wikipedia:

While the exponent (11-bits for C# double) can be positive or negative, in binary formats it is stored as an unsigned number that has a fixed "bias" added to it. Values of all 0s in this field are reserved for the zeros and subnormal numbers, values of all 1s are reserved for the infinities and NaNs.

So this means there are 2^53 combinations (1 sign bit times 2^52 patterns of the 52-bit fraction field) that represent infinite or invalid numbers, and 2^53 combinations that represent zero and subnormal numbers. I can't say one way or the other whether there are other bit combinations that will produce the same number.
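To sanity-check those two 2^53 figures, here is a minimal sketch that counts bit patterns per class. It assumes one sign bit and the layout described in the quote above; the function name `count_patterns` and its parameters are my own, not from any library.

```python
def count_patterns(exp_bits, frac_bits):
    """Count bit patterns per class for a binary format with one sign bit.

    Assumes the layout from the quote above:
    exponent field all 0s -> zeros and subnormals,
    exponent field all 1s -> infinities and NaNs.
    """
    per_exponent = 2 * 2 ** frac_bits  # sign bit times every fraction pattern
    return {
        "zero": 2,                                     # +0 and -0
        "subnormal": per_exponent - 2,                 # all-0s exponent, nonzero fraction
        "normal": per_exponent * (2 ** exp_bits - 2),  # every other exponent value
        "infinity": 2,                                 # all-1s exponent, zero fraction
        "nan": per_exponent - 2,                       # all-1s exponent, nonzero fraction
    }


double = count_patterns(exp_bits=11, frac_bits=52)
all_zeros_field = double["zero"] + double["subnormal"]
all_ones_field = double["infinity"] + double["nan"]

print(all_zeros_field == 2 ** 53)        # patterns with exponent field all 0s
print(all_ones_field == 2 ** 53)         # patterns with exponent field all 1s
print(sum(double.values()) == 2 ** 64)   # the classes cover every 64-bit pattern
```

Only the all-1s group contains patterns that are not numbers; the all-0s group holds ordinary finite values (the zeros and the subnormals).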

2^64 - 2^53 + 3 = 18,437,736,874,454,810,627 unique values. (This counts every bit pattern, with the 2^53 patterns whose exponent field is all 1s condensed to three unique values: positive infinity, negative infinity, and not-a-number. +0 and -0 are counted as two separate values here.)
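The question asks for a count in terms of base, precision, and the exponent range. A rough sketch of the usual textbook count follows, assuming numbers of the form ±d0.d1...d(p-1) × base^e with emin <= e <= emax and a nonzero leading digit d0 for normalized numbers. The names `count_floats` and `enumerate_floats` are hypothetical helpers, and zero is counted once. A brute-force enumeration of a tiny system checks the formula.

```python
from fractions import Fraction


def count_floats(base, precision, emin, emax, subnormals=False):
    """Count distinct finite values, with zero counted once."""
    # sign * leading digit (nonzero) * remaining digits * exponent values
    normalized = 2 * (base - 1) * base ** (precision - 1) * (emax - emin + 1)
    # subnormals: leading digit 0, nonzero remaining digits, exponent fixed at emin
    extra = 2 * (base ** (precision - 1) - 1) if subnormals else 0
    return normalized + extra + 1  # +1 for zero


def enumerate_floats(base, precision, emin, emax, subnormals=False):
    """Brute-force the same set with exact rational arithmetic (tiny systems only)."""
    values = {Fraction(0)}
    smallest_normal = base ** (precision - 1)  # integer significand with d0 != 0
    for e in range(emin, emax + 1):
        start = 1 if (subnormals and e == emin) else smallest_normal
        for m in range(start, base ** precision):
            v = Fraction(m) * Fraction(base) ** (e - (precision - 1))
            values.update((v, -v))
    return values


# Toy system: base 2, 3 significant bits, exponents -1..1.
print(count_floats(2, 3, -1, 1), len(enumerate_floats(2, 3, -1, 1)))

# IEEE double as base 2, precision 53 (implicit bit included), exponents -1022..1023.
finite = count_floats(2, 53, -1022, 1023, subnormals=True)
# Add back the second zero, two infinities, and one NaN to match the figure above.
print(finite + 1 + 3 == 2 ** 64 - 2 ** 53 + 3)
```

This also shows why realmax / eps does not give the count: each exponent value contributes the same number of values, so the spacing between neighbours grows with magnitude instead of staying at one fixed increment.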

Further reading: Wikipedia's "Floating point" article, section "Internal representation".
