Data Structures – Why Use Arbitrary-Precision Decimals Over Rational Numbers?

arithmetic · data structures · math

The basic arithmetic a computer chip can do works only on numbers (integers or floating-point) of a fixed size.

Many algorithms can be extended to work on numbers of arbitrary size (and some require it). Therefore, it is useful to have a way to store and perform arithmetic with arbitrary-size numbers.

For arbitrary-size integers, the usual approach is to take a list of bytes (each of which can store a value in the 0-255 range) and treat the resulting 'big number' (often shortened to bignum) as base-256, using each of these bytes as a 'digit' (ducentiquinquagintiquinquesexa-its?).
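
For concreteness, here is a minimal sketch of that idea in Java (the class name Base256Num and its layout are made up for illustration, not taken from any particular library): the value is stored as a little-endian array of base-256 digits, and addition propagates a carry exactly like schoolbook addition in base 10.

```java
// Minimal sketch: an unsigned bignum stored as base-256 "digits"
// in little-endian order (digits[i] is the coefficient of 256^i).
public class Base256Num {
    final int[] digits;  // each entry holds a value in the 0-255 range

    Base256Num(int[] digits) { this.digits = digits; }

    static Base256Num add(Base256Num a, Base256Num b) {
        int n = Math.max(a.digits.length, b.digits.length);
        int[] result = new int[n + 1];  // one extra slot for a final carry
        int carry = 0;
        for (int i = 0; i < n; i++) {
            int da = i < a.digits.length ? a.digits[i] : 0;
            int db = i < b.digits.length ? b.digits[i] : 0;
            int sum = da + db + carry;
            result[i] = sum & 0xFF;     // the low 8 bits are the new digit
            carry = sum >> 8;           // the rest carries into the next digit
        }
        result[n] = carry;
        return new Base256Num(result);
    }
}
```

For example, adding [255] and [1] yields [0, 1], i.e. 0 * 256^0 + 1 * 256^1 = 256.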

For non-integer numbers of arbitrary size, there are two common representations:

  • decimals, which consist of an arbitrary-size 'big number' mantissa and an exponent. The number is represented by sign (+1 or -1) * mantissa * pow(base, exponent), where base is usually 2 or sometimes 10.
  • rationals, which have an arbitrary-size 'big number' numerator and denominator. The number is represented by numerator / denominator. (Both representations are sketched in the code below.)
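
To make the two representations concrete, here is a hedged sketch using Java's BigInteger for the arbitrary-size components; the class names DecimalNum and RationalNum are invented for illustration.

```java
import java.math.BigInteger;

// Hypothetical containers (names made up for illustration) for the two
// representations described above.
class DecimalNum {
    int sign;              // +1 or -1
    BigInteger mantissa;   // arbitrary-size integer
    int base;              // usually 2, sometimes 10
    int exponent;          // arbitrary-size in principle; an int here for brevity

    // value = sign * mantissa * base^exponent; double only for display
    double approxValue() {
        return sign * mantissa.doubleValue() * Math.pow(base, exponent);
    }
}

class RationalNum {
    BigInteger numerator;
    BigInteger denominator;

    // value = numerator / denominator; double only for display
    double approxValue() {
        return numerator.doubleValue() / denominator.doubleValue();
    }
}
```

For instance, 0.375 is sign = +1, mantissa = 3, base = 2, exponent = -3 in the first form, and numerator = 3, denominator = 8 in the second; 1/3, on the other hand, has no exact representation in the first form (for base 2 or 10) but a trivial one in the second.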

In practice, I've found that many more languages and libraries use or support decimal data types than rational data types. An example of this is (SQL) data stores, which have a native DECIMAL data type but no rational one.

Why is this the case? Why are decimal data types preferred over rational ones?

Best Answer

A quibble

Arbitrary-precision decimals are actually fairly rare. For example, although you mention SQL in the question, the SQL standard doesn't require implementations to support arbitrary precision, and MS SQL Server only supports up to 38 digits of precision. Other libraries could more accurately be described as supporting variable precision rather than arbitrary precision: e.g. Java's BigDecimal operates within a MathContext which defines the precision of the results of an operation. A true arbitrary-precision implementation wouldn't require you to commit up front to the precision you will need in the future. (Yes, that means it would necessarily have to be lazy.)
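
To illustrate the "commit up front" point with the BigDecimal/MathContext behaviour mentioned above (this is standard java.math behaviour; only the demo class name is made up):

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class PrecisionDemo {
    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = BigDecimal.valueOf(3);

        // You have to choose a precision before the operation...
        MathContext mc = new MathContext(20);
        System.out.println(one.divide(three, mc));  // 0.33333333333333333333

        // ...because without one, a non-terminating expansion cannot be stored.
        try {
            System.out.println(one.divide(three));  // throws ArithmeticException
        } catch (ArithmeticException e) {
            System.out.println("No exact result: " + e.getMessage());
        }
    }
}
```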

Relevance

Why am I making this distinction? Because when you're working with fixed precision (whether limited, as in SQL Server, or variable, as in Java), a series of operations uses a predictable amount of memory. When working with rationals, on the other hand, the memory usage can grow without bound.
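
Here is a hedged sketch of that difference: summing the harmonic series 1/1 + 1/2 + ... + 1/n exactly as a rational (a numerator/denominator pair of BigIntegers, kept in lowest terms; not a standard library type) makes the denominator grow steadily, while the same sum at a fixed 16-digit BigDecimal precision stays the same size throughout.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

public class GrowthDemo {
    public static void main(String[] args) {
        // Exact rational sum 1/1 + 1/2 + ... + 1/n, kept as num/den in lowest terms
        BigInteger num = BigInteger.ZERO, den = BigInteger.ONE;
        // The same sum at a fixed 16-digit decimal precision
        BigDecimal decSum = BigDecimal.ZERO;
        MathContext mc = MathContext.DECIMAL64;

        for (int k = 1; k <= 100; k++) {
            // num/den + 1/k = (num*k + den) / (den*k), then reduce by the gcd
            num = num.multiply(BigInteger.valueOf(k)).add(den);
            den = den.multiply(BigInteger.valueOf(k));
            BigInteger g = num.gcd(den);
            num = num.divide(g);
            den = den.divide(g);

            decSum = decSum.add(BigDecimal.ONE.divide(BigDecimal.valueOf(k), mc), mc);

            if (k % 25 == 0) {
                System.out.printf("n=%3d  rational denominator: %4d bits  decimal digits: %d%n",
                        k, den.bitLength(), decSum.precision());
            }
        }
    }
}
```

The rational denominator's bit length keeps climbing with n (it is essentially lcm(1..n)), while the decimal sum never exceeds the 16 significant digits chosen in advance.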

This is an important difference which I believe goes a long way to explaining why historically language and library designers have favoured decimals over rationals. Particularly when you look at e.g. SQL, which is a product of an era when memory was much more restricted than today, it makes sense to choose a representation which allows you to bound your memory usage in advance.

Other plausible factors

I don't say that memory bounds are the only factor that influenced design decisions. There are two other plausible factors which come to mind readily:

  • The familiarity of the first generations of computer scientists with decimal tables of logarithms etc.
  • Hardware support. Back in the day, some instruction sets included instructions for operating on binary-coded decimal.