Avoiding Division by Zero Using Float Comparison

Tags: comparison, floating-point

Source code analyzers (like SonarQube) complain about float (or double) equality comparisons because equality is a tricky thing with floats. The values compared are often the results of computations with minute rounding effects, so that 0.3 - 0.2 == 0.1 often returns false although mathematically it should always be true (as tested with Python 2.7). So this complaint makes perfect sense as a warning about potentially dangerous code.
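The rounding effect is easy to demonstrate interactively; a minimal sketch in Python:

```python
# Neither 0.3, 0.2 nor 0.1 is exactly representable in binary floating
# point, so the subtraction does not land on the double nearest to 0.1.
a = 0.3 - 0.2   # approximately 0.09999999999999998
b = 0.1         # the double nearest to one tenth

print(a == b)      # False
print(abs(a - b))  # a tiny but nonzero difference (around 2.8e-17)
```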

A typical approach for such situations is to check against a margin, an epsilon, which should compensate for all rounding effects, e. g.

if abs(a - b) < epsilon then …
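In Python such a comparison might look like the following sketch; the tolerance value is an assumption that has to be chosen per application, and since Python 3.5 the standard library offers math.isclose as a ready-made (relative-tolerance) variant:

```python
import math

EPSILON = 1e-9  # illustrative tolerance; the right value depends on the application

def nearly_equal(a, b, eps=EPSILON):
    # Absolute-tolerance comparison as sketched above.
    return abs(a - b) < eps

print(nearly_equal(0.3 - 0.2, 0.1))   # True
print(math.isclose(0.3 - 0.2, 0.1))  # True (relative tolerance by default)
```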

On the other hand one can often see code which avoids a division-by-zero problem by checking the divisor for equality with zero before the division takes place:

if divisor == 0.0 then
    // do some special handling like skipping the list element,
    // return 0.0 or whatever seems appropriate, depending on context
else
    result = dividend / divisor
endif
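A runnable Python version of the pseudocode above (the fallback value of 0.0 is just one possible choice, as the comment notes; the function name is illustrative):

```python
def safe_divide(dividend, divisor, fallback=0.0):
    # Guarded division: return `fallback` instead of dividing
    # when the divisor is exactly zero.
    if divisor == 0.0:   # the exact comparison the analyzer flags
        return fallback
    return dividend / divisor

print(safe_divide(6.0, 3.0))   # 2.0
print(safe_divide(1.0, 0.0))   # 0.0, the fallback
```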

This seems to handle the div-by-zero issue but is not compliant with the source code analyzer, which still complains about the expression divisor == 0.0. At first sight this looks like a problem with the analyzer, a false positive. Float-equality checks against 0.0 should be allowed, shouldn't they?

After some consideration I thought about the case that the divisor was the result of a computation which should have resulted in 0.0 (like 0.3 - 0.2 - 0.1) but which now was something in the range of 1e-17, i. e. 0.00000000000000001.

There are two approaches for this now:

  1. The value is not exactly 0.0, hence the division can take place; the resulting value will be a "normal" floating point number (probably; consider 1e200 / 1e-200, which is inf). Let it happen; the caller has to take care of the results.

  2. The value should have been 0.0, and logically it is in this case; the computer just doesn't notice it, so whatever special handling of the zero case was intended should take place here as well.

If we vote for the second option, we could use the epsilon approach and be fine. But that would treat true non-zero values which are just very small like zero-ish values. We have no way of distinguishing the two cases.

This leads to the next consideration whether such a true non-zero value which is very close to 0.0 nevertheless should be divided by or whether it should be handled like the zero case (i. e. receive the special handling). After all, dividing by such a small value will result in very large values which will often be problematic (in graphs or similar). This is surely up to the context and cannot be answered in general.
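If, in a given context, zero-ish divisors should receive the same special handling as exact zero, the guard becomes an epsilon check; a sketch, with an arbitrary illustrative threshold:

```python
def safe_divide_eps(dividend, divisor, eps=1e-12, fallback=0.0):
    # Treat zero-ish divisors like zero; eps is entirely context-dependent.
    if abs(divisor) < eps:
        return fallback
    return dividend / divisor

print(safe_divide_eps(1.0, 1e-17))  # 0.0: the zero-ish divisor gets the special handling
print(safe_divide_eps(1.0, 0.5))    # 2.0: a normal division
```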

I also considered whether the existence of zero(-ish) values in the input was maybe not the root of the problem but just an effect in itself, i. e. maybe the root of the trouble lay deeper: Maybe an algorithm which expects a float and which is supposed to divide by it should never receive values which can become zero(-ish) in the first place.

I can think of use cases with integers where one may need to check for them being zero before dividing (e. g. an index whose difference to a reference index is used as divisor: when both become the same in some iteration, the difference is 0), but I couldn't think of a good example where a float value could become zero-ish. Maybe if such a thing occurred, it was just a logical error?

So, now my questions are:

  1. Is there a theory about the topic of float-zero-checks to avoid division-by-zero problems addressing my considerations? I found nothing on the Internet about it yet.
  2. Can someone provide a reasonable example of a context and an algorithm therein which is supposed to expect float values which can become zero and by which it should divide? And depending on that context which solution (epsilon, pure == 0.0-check, maybe a different approach) would you prefer there?

Best Answer

I side with OP's personal theory that it is not a normal practice to allow a computer program to proceed with a division-by-zero operation, or to only perform a minimal check before the division.

The exception is when you are implementing something that is too general - a programming language (such as MATLAB) where you (as the programmer) do not know the context / application / use-case / physical meaning of the mathematical operations it is asked to perform. This may be because the formula it is evaluating is provided by the customer, and you do not know the customer's use-case of that formula. In that case you use a special representation such as Inf or NaN as a placeholder.

If, however, the formula is provided as part of a statistical toolbox, then you should be able to provide an explanation when the situation arises. See the "weighted averaging when the total weight is zero" example below.


There is a way to "invert" a test for divisor underflow (i.e. quotient overflow). Mathematically, if b is not zero and abs(a) / abs(b) > abs(c), where c is the largest representable floating point value, then abs(a) > abs(c) * abs(b). In practice, however, this requires a more careful implementation than the formula suggests. You may be able to find a mathematical library function that lets you pass in (a, b) and returns whether the division will overflow, underflow, or otherwise have poor precision.
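A minimal sketch of that inverted test in Python, ignoring for simplicity that abs(c) * abs(b) may itself overflow to inf (which here still yields a correct comparison for finite a):

```python
import sys

def division_overflows(a, b):
    # Predict whether a / b would exceed the largest finite double.
    # Assumes b != 0 and a is finite.
    return abs(a) > sys.float_info.max * abs(b)

print(division_overflows(1e200, 1e-200))  # True: the quotient 1e400 is not representable
print(division_overflows(1.0, 2.0))       # False
```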


Source code analyzers look for patterns in the code; they are not sophisticated enough to decide whether someone's workaround logic is sufficient for the application's design purpose. (In fact even the average programmer may be unqualified to make that decision.) Source code analyzers are supposed to be augmented with a person qualified to make that decision.


A denominator of zero can occur in a lot of mathematical manipulations: formulas, infinite series (summation of a sequence), etc. There are many mathematical methods to calculate the result despite having denominators that approach zero (i.e. not exactly zero, but smaller than the smallest machine-representable value). This means the formulas are not to be evaluated verbatim; they are transformed using calculus methods, and for each formula there may be several alternative versions, from which one is chosen to avoid the division-by-zero issue.
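A classic instance is sin(x)/x, whose limit at x = 0 is 1: near zero the formula is replaced by its Taylor expansion instead of being evaluated verbatim. A sketch, where the switch-over threshold is an assumption:

```python
import math

def sinc(x, threshold=1e-4):
    # sin(x)/x, using the Taylor series 1 - x^2/6 + x^4/120 near zero
    # instead of dividing by a zero-ish x.
    if abs(x) < threshold:
        x2 = x * x
        return 1.0 - x2 / 6.0 + x2 * x2 / 120.0
    return math.sin(x) / x

print(sinc(0.0))  # 1.0, with no division by zero
print(sinc(1.0))  # approximately 0.8414709848
```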


Another situation arises in weighted averaging of data. If you perform a query that selects a subset of data, and when:

  1. the sum of weights for the subset of data turns out to be zero, or
  2. the subset is indeed empty, i.e. the query returns no result,

then the proper way to phrase that situation is "insufficient samples (data) for the query", etc.
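A sketch of that weighted-averaging case in Python; reporting the problem to the caller (here via an exception) is one reasonable way to phrase "insufficient samples", and the names are illustrative:

```python
def weighted_average(values, weights):
    # Weighted mean; refuse to divide when the total weight is zero
    # (empty selection, or weights that sum to zero).
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("insufficient samples (data) for the query")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

print(weighted_average([1.0, 3.0], [1.0, 1.0]))  # 2.0
# weighted_average([], []) raises ValueError instead of dividing by zero
```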


In basic trigonometry, some representations (slope) are very sensitive to division problems, whereas an alternative representation (bearing, i.e. angle) would not be sensitive. For example, to represent a line on a 2D plane, where vertical and near-vertical lines need to be represented as robustly as horizontal and near-horizontal lines, you can:

  1. Have a toggle between lines that are steep vs. those that are not. For lines steeper than 45 degrees, you would use (x / y) instead of (y / x) as the "flipped" slope of the line, so as to avoid the division by small numbers.
  2. Use an alternative representation such as a*x + b*y + c == 0 and store the parameters (a, b, c) with the requirement that (a^2 + b^2) must equal 1.0 for normal case, and 0.0 if the line is degenerate (not-a-line).

It is worth mentioning that degeneracy is unavoidable in many different contexts (and in context-specific ways). For example, if user passes in a "line" from point (x1, y1) to point (x2, y2) and asks to calculate its slope, and it happens that (x1 == x2 and y1 == y2), then there is no slope, because there is no line, because there is only a single point in the user's input.
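The (a, b, c) representation from option 2, including the degenerate single-point case, can be sketched like this (the function name is illustrative):

```python
import math

def line_through(x1, y1, x2, y2):
    # Return (a, b, c) with a*x + b*y + c == 0 and a^2 + b^2 == 1,
    # or (0.0, 0.0, 0.0) if the two points coincide (degenerate case).
    a, b = y1 - y2, x2 - x1        # normal vector of the direction (x2-x1, y2-y1)
    norm = math.hypot(a, b)
    if norm == 0.0:                # x1 == x2 and y1 == y2: not a line
        return (0.0, 0.0, 0.0)
    a, b = a / norm, b / norm
    return (a, b, -(a * x1 + b * y1))

# A vertical line is represented as robustly as any other:
a, b, c = line_through(0.0, 0.0, 0.0, 5.0)
print(abs(a * 0.0 + b * 3.0 + c) < 1e-12)  # True: (0, 3) lies on the line
print(line_through(1.0, 1.0, 1.0, 1.0))    # (0.0, 0.0, 0.0), degenerate
```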