Is qNaN easier than trapping on overflow?

Tags: cpu, exception, floating-point, performance

According to James Demmel's article "Faster Numerical Algorithms via Exception Handling", IEEE Trans. on Computers 43 (1994), 983-992, the way a floating-point processor handles overflow can matter for the speed of operations that are not normally expected to overflow. We often have a choice between a fast algorithm that can overflow on rare inputs and a slow algorithm that guards against overflow in all cases, so it is useful to be able to use this strategy:

  1. Try the fast algorithm
  2. In the rare case where overflow occurred, go back and redo the calculation with the slow algorithm
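
As a rough illustration of this strategy (my own example, not one taken from the article), consider a Euclidean norm. The function names `norm_fast`, `norm_slow` and `norm` are hypothetical, and the sketch assumes Python floats are IEEE 754 doubles, where float multiplication overflows quietly to infinity:

```python
import math

def norm_fast(xs):
    # One pass over the data, but x * x overflows to inf once |x| exceeds ~1.3e154.
    return math.sqrt(sum(x * x for x in xs))

def norm_slow(xs):
    # Scale by the largest magnitude first, so no square can overflow.
    scale = max((abs(x) for x in xs), default=0.0)
    if scale == 0.0 or math.isinf(scale):
        return scale
    total = 0.0
    for x in xs:
        r = x / scale          # |r| <= 1, so r * r cannot overflow
        total += r * r
    return scale * math.sqrt(total)

def norm(xs):
    # Step 1: try the fast algorithm.
    result = norm_fast(xs)
    # Step 2: an infinite result means something overflowed (or the input
    # already contained an infinity), so redo the work with the slow algorithm.
    if math.isinf(result):
        result = norm_slow(xs)
    return result
```

For ordinary inputs only `norm_fast` runs; the extra pass in `norm_slow` is paid only when the single check at the end fails.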

There are basically two ways to detect overflow:

  1. Trap. To be usable for this strategy, it must be possible to trap to user code. Trapping at the precise operation would incur an unreasonably large performance penalty on pipelined implementations, so we have to accept an imprecise trap, one that is reported some time after the operation that overflowed.
  2. Infinity/quiet NaN/sticky status bit. Keep going, but remember that an overflow occurred, so that an explicit check can be carried out occasionally. To be usable for this strategy, the presence of NaN operands must not cause too dramatic a slowdown. (There have been a number of processors on which NaNs slowed arithmetic operations by a couple of orders of magnitude.)
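
The second approach can be sketched in software as follows. Python does not expose the hardware's sticky status bit, so this sketch inspects the final value instead; the function names `chain` and `overflow_detected` are made up for illustration, and IEEE 754 double arithmetic is assumed:

```python
import math

def chain(x):
    # No step checks for overflow; the computation just keeps going.
    a = x * x      # overflows to inf when |x| exceeds ~1.3e154
    b = a - a      # inf - inf produces a quiet NaN; otherwise 0.0
    c = b + 1.0    # the NaN propagates silently; otherwise 1.0
    return c

def overflow_detected(value):
    # One deferred check at the end instead of one check per operation.
    return not math.isfinite(value)
```

The overflow in the first multiplication is still visible after two more operations, which is what makes the occasional explicit check sufficient.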

Modern processors like x64 typically default to the second strategy.

My question here is about the hardware implementation of floating-point traps versus NaN. Is it actually less costly to implement quiet NaN without slowdown than to provide even an imprecise trap on overflow? Or is quiet NaN widely used today for a different reason?

Best Answer

It isn't the arithmetic that slows things down; it's the exception processing, especially the processing required to clean up after an imprecise trap. A qNaN defers the exception, which keeps the pipeline full for a greater percentage of the time.

There's no inherent reason that processing NaNs should be any slower than normal numbers, because the rules are quite clear. In fact, if anything, I would expect them to be faster.