Is the order of arguments in an arithmetic expression important for achieving the most exact result possible (speed is not necessary)?

arithmetic, floating point, numeric precision, operator precedence

In this question I am not asking about a particular language or architecture, though I understand there may be some differences between them.

In physics / engineering it is usually better to work with larger numbers than smaller ones, because of better absolute accuracy. For example, when I need to calculate the area of a circle from its diameter, I'd use the equation S = d * d / 4 * pi rather than S = pi * (d / 2) * (d / 2) (in both cases evaluating from left to right).
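As a quick check of how much the ordering matters in double-precision floating point, here is a small sketch comparing both formulas (the diameter value 3.0 is just an arbitrary example):

```python
import math

d = 3.0
s1 = d * d / 4 * math.pi          # "large numbers first"
s2 = math.pi * (d / 2) * (d / 2)  # "halve the diameter first"

# In IEEE-754 double precision the two orderings agree to within a few ulps.
print(s1, s2, abs(s1 - s2) / s1)
```

In binary floating point both versions differ at most by a rounding error or two in the last bits, unlike in fixed-precision decimal hand calculation.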

How does it look in programming languages?

Is it important to provide a "better" order of arguments? Are there any compilers that can optimize calculations for this?

Do such constructions make sense:

// finding the result of a * b / c
if (abs(c) < 1) {
    result = a / c * b;
} else {
    result = a * b / c;
}

(in this example one should also test the values of a and b, but let's assume they are large numbers)?
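This kind of reordering can indeed matter in floating point, because the intermediate product a * b can overflow even when the final quotient is representable. A minimal sketch in Python (the specific values are arbitrary assumptions chosen to trigger overflow):

```python
import math

a, b, c = 1e200, 1e200, 1e150

# Multiplying first overflows: a * b (about 1e400) exceeds the double range.
naive = a * b / c
print(naive)        # inf

# Dividing first keeps the intermediate value in range.
reordered = a / c * b
print(reordered)    # roughly 1e250
```

So the branch on abs(c) in the snippet above is a reasonable guard when the operands can be near the limits of the floating-point range.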

I know that with very large numbers I am risking an overflow. I also know there is a difference between integer numbers (which are better for addition/subtraction) and floats (which are better for multiplication/division).

There is also an issue with integer division. For example, Pascal has a common multiplication operator *: if both operands are Integer, the result is Integer; if one of them is Real (equivalent to float), the result is Real. For division there are two operators: / always yields a Real result, while div takes only Integers and yields an Integer. So in this language it is better to calculate multiplications first and divisions later, because an Integer division loses the fractional part, and it is better to lose it later than earlier.
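Pascal's div behaves like floor division in many languages, so the multiply-first advice is easy to demonstrate. A sketch using Python's // operator as a stand-in (the values 7, 6, 2 are arbitrary):

```python
a, b, c = 7, 6, 2

# Integer-dividing first throws away the fractional part too early:
early = (a // c) * b    # 7 // 2 == 3, then 3 * 6 == 18
# Multiplying first keeps full precision until the final division:
late = (a * b) // c     # 42 // 2 == 21

print(early, late)      # 18 21
```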

But for float numbers, which are stored with a mantissa and an exponent, the order of multiplications/divisions seems not to matter.

What I want to achieve is a result that is as exact as possible (speed is not necessary).

Best Answer

There are two cases to consider: multiplication (including division) and addition (including subtraction). Because floating point numbers are stored in exponential form (i.e. as m * 2^e), these operations are performed on the mantissa (m) and exponent (e) as separate values, not on the whole numbers involved.

Multiplication (basically) involves multiplying the mantissae (which are always between 1 and 2) and adding the exponents (which are integers). It follows that unless the integer addition of the exponents overflows, the absolute magnitude of the numbers makes no difference to the precision of the result.
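This can be illustrated with a small sketch: the operands below differ in magnitude by a factor of 2^1000, yet the product is still exact, because scaling by powers of two only changes the exponent field (1.5 and 1.25 are chosen as exact binary fractions):

```python
# Powers of two change only the exponent, never the mantissa.
x = 1.5 * 2**500      # a huge number
y = 1.25 * 2**-500    # a tiny number

# The exponents (+500 and -500) cancel; the mantissa product 1.875 is exact.
print(x * y)          # 1.875
```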

For addition, however, things are different. To add floating point numbers, one of the mantissae is shifted by an appropriate number of bits to make the exponents equal, and then the mantissae are added. This shifting operation loses precision in proportion to the difference in exponents. Thus, addition is more precise when the operands are closer in magnitude. In practice this means that if you are adding many floating point numbers that may vary wildly in magnitude, it is best to start with the smallest (closest to zero) and add the larger-magnitude numbers at the end. If you don't know the relative magnitudes in advance, put the numbers in an array and sort it by magnitude before summing. If you do know them, use an accumulator variable and add the smaller numbers first.
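The smallest-first advice can be sketched as follows (the data values are an arbitrary assumption: one large term and a thousand small ones):

```python
values = [1e16] + [1.0] * 1000

# Left to right, each 1.0 is completely lost: 1e16 + 1.0 rounds back to 1e16.
naive = 0.0
for v in values:
    naive += v

# Smallest first, the 1.0s accumulate to 1000.0 before meeting 1e16.
careful = 0.0
for v in sorted(values, key=abs):
    careful += v

print(naive)    # 1e+16
print(careful)  # 1.0000000000001e+16
```

For production code, compensated algorithms such as Kahan summation (or Python's math.fsum) go further and recover most of the lost precision without requiring a sort.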