Floating-Point – Why Are Float and Double Needed?

I was watching http://www.joelonsoftware.com/items/2011/06/27.html and laughed at the Jon Skeet joke about 0.3 not being 0.3. I have never personally had problems with floats, decimals or doubles, but then, I learned to program the 6502 very early and never needed floats in most of my programs. The only time I used them was for graphics and math, where inaccurate numbers were OK and the output went to the screen rather than being stored (in a database or file) or relied on.
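For anyone who hasn't seen the 0.3 gotcha, here is a minimal Python sketch of it. Python is only an illustrative choice; its `float` is an IEEE 754 double, the same type as `double` in C-family languages.

```python
import math
from decimal import Decimal

# 0.1, 0.2 and 0.3 have no exact binary representation, so each literal
# is stored as the nearest double, and the rounding errors show up in the sum.
total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False

# Decimal(float) shows the exact binary value the literal 0.3 really holds.
print(Decimal(0.3))

# Compare with a tolerance rather than with ==.
print(math.isclose(total, 0.3))  # True
```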

My question is: where do you typically use floats, decimals or doubles? I'd like to know where to watch out for these gotchas. For money I use longs and store values in cents; for the speed of an object in a game I add ints and divide (or bit-shift) the value to decide whether the object needs to move a pixel. (I made objects move back in the 6502 days; we had no divide instruction and no floats, but we did have shifts.)
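The two integer techniques described in the question could be sketched roughly like this. The 8 fractional bits (256 sub-pixel units per pixel) and the sample prices and speed are hypothetical choices for illustration.

```python
# Money: keep whole cents in an integer, so additions are exact.
price_cents = 1999                # $19.99
tax_cents = 160                   # $1.60
total_cents = price_cents + tax_cents
print(f"${total_cents // 100}.{total_cents % 100:02d}")  # $21.59

# Sub-pixel motion in fixed point: the low FRAC_BITS bits are fractions of a pixel.
FRAC_BITS = 8                     # assumption: 1 pixel = 256 sub-pixel units
speed = 96                        # 96/256 = 0.375 pixels per frame
position = 0                      # in sub-pixel units

pixels = []
for _ in range(8):
    position += speed
    pixels.append(position >> FRAC_BITS)  # shift instead of divide
print(pixels)                     # [0, 0, 1, 1, 1, 2, 2, 3]
```

The shift by `FRAC_BITS` plays the role of the missing divide instruction, which is exactly why power-of-two rasters were popular on the 6502.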

So I was mostly curious.

Best Answer

Because they are, for most purposes, more accurate than integers.

How so? Your own "for speed of an object in a game..." is a good example of such a case. Say you need some very fast objects, like bullets. To describe their motion with integer speed variables, you must make sure every speed fits in the range of the integer type, which means you cannot make the raster (the speed that one integer step stands for) arbitrarily fine.
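A small sketch of that range-versus-resolution trade-off, assuming speeds are stored in a signed 32-bit integer. The bullet speed of 1000 and the sample step sizes are hypothetical numbers in arbitrary units.

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer

def max_speed(step: float) -> float:
    """Fastest speed a signed 32-bit integer can hold if one unit means `step`."""
    return INT32_MAX * step

def finest_step(top_speed: float) -> float:
    """Finest raster that still lets `top_speed` fit in a signed 32-bit integer."""
    return top_speed / INT32_MAX

# Finer rasters buy resolution at the cost of range.
for step in (1e-3, 1e-6, 1e-9):
    print(f"step {step:g}: max speed {max_speed(step):.3g}")

# Conversely, the fastest object fixes how fine the raster can be.
bullet_speed = 1000.0   # hypothetical bullet speed
print(f"finest step for the bullet: {finest_step(bullet_speed):.3g}")
```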

But then you might also want to describe some very slow objects, like the hour hand of a clock. Since it is about 6 orders of magnitude slower than the bullets, the first log₂(10⁶) ≈ 20 bits of its speed are zero, which rules out short int types from the start. OK, today we have longs everywhere, which leave us with a still-comfortable 12 bits. But even then, the clock's speed will be accurate to only about four significant digits. That's not a very good clock... but it's certainly OK for a game. You just would not want to make the raster much coarser than it already is.
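The bit counting above can be checked with a few lines of arithmetic. This assumes a 32-bit long and, like the estimate in the answer, ignores the sign bit.

```python
import math

WORD_BITS = 32       # a 32-bit long, sign bit ignored as in the estimate
ratio = 1e6          # bullet speed / hour-hand speed, ~6 orders of magnitude

# With the raster sized for the bullet, the hour hand's speed has about
# log2(ratio) leading zero bits.
leading_zero_bits = math.ceil(math.log2(ratio))     # log2(1e6) ≈ 19.93 -> 20
remaining_bits = WORD_BITS - leading_zero_bits      # 12 bits left

# 12 bits distinguish 2**12 = 4096 steps, i.e. roughly 3.6 decimal digits.
steps = 2**remaining_bits
decimal_digits = remaining_bits * math.log10(2)
print(leading_zero_bits, remaining_bits, steps, round(decimal_digits, 1))  # 20 12 4096 3.6
```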

...which leads to problems if one day you want to introduce a new, even faster type of object: there is no "headroom" left.

What happens if we choose a float type instead? It is the same size, 32 bits, but now every object gets a full 24 bits of precision. That means the clock's speed is precise enough for it to drift by only a second or two per year. The bullets do not gain any precision, but they only "live" for fractions of a second anyway, so more would be utterly useless to them. And you do not get into any kind of trouble if you want to describe much faster objects (why not the speed of light? No problem) or much slower ones. You certainly won't need such things in a game, but you sometimes do in physics simulations.
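To put numbers on the float32 claims, here is a sketch that rounds a few speeds to single precision with `struct` (Python itself has no 32-bit float type). The bullet speed is hypothetical. The drift estimate assumes the clock's angle is computed as speed × elapsed time; summing many small float32 increments instead would add rounding error of its own.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single."""
    return struct.unpack("f", struct.pack("f", x))[0]

SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIT_ROUNDOFF = 2**-24   # max relative rounding error of float32

speeds = {
    "hour hand": 1 / (12 * 3600),  # revolutions per second
    "bullet": 1000.0,              # hypothetical, m/s
    "light": 299_792_458.0,        # m/s
}

rel_errors = {}
for name, speed in speeds.items():
    rel_errors[name] = abs(to_float32(speed) - speed) / speed
    print(f"{name:9s} relative error {rel_errors[name]:.1e}")

# A clock driven by a float32 speed inherits that relative error,
# so in the worst case it drifts by about:
max_drift = SECONDS_PER_YEAR * UNIT_ROUNDOFF
print(f"worst-case drift: {max_drift:.2f} s per year")  # 1.88 s per year
```

The relative error stays below 2⁻²⁴ for all three speeds, even though they span about 13 orders of magnitude.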

And with floating-point numbers you get this same relative precision everywhere, without first having to cleverly choose some non-obvious raster. That is perhaps the most important point, since having to make such choices is very error-prone.
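One way to see the "same precision always" point is to quantize a range of values both to a fixed raster and to float32 and compare relative errors. The raster of 10⁻⁶ is a hypothetical choice; all the scaled values still fit in a signed 32-bit integer.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

STEP = 1e-6   # hypothetical fixed raster: one integer unit = 1e-6

def to_fixed(x: float) -> float:
    """Quantize x to the nearest multiple of STEP (what an integer would store)."""
    return round(x / STEP) * STEP

values = [1000 / 3, 1 / 3, 1 / 3000, 1 / 3_000_000]
fixed_errors, float_errors = [], []
for v in values:
    fixed_errors.append(abs(to_fixed(v) - v) / v)
    float_errors.append(abs(to_float32(v) - v) / v)
    print(f"{v:10.3e}  fixed {fixed_errors[-1]:.1e}  float32 {float_errors[-1]:.1e}")
```

The fixed raster actually beats float32 for the largest value, but its relative error grows as values shrink, until the smallest one rounds to zero (100 % error). Float32 stays below 2⁻²⁴ across the whole range, which is the point: no raster had to be chosen in advance.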