Data Structures – Most Efficient Way to Store a Numeric Range

compressiondata structuresnumbers

This question is about how many bits are required to store a range. Or put another way, for a given number of bits, what is the maximum range that can be stored and how?

Imagine we want to store a sub-range within the range 0-255.

So for example, 45-74.

We can store the example above as two unsigned bytes, but it strikes me that there must be some redundancy of information there. We know that the second value is larger than the first, so in the case that the first value is large, fewer bits are required for the second value, and in the case that the second value is large, fewer bits are required for the first.

I suspect that any compression technique would yield a marginal result, so it may be a better question to ask "what is the maximum range that could be stored in one byte?". This should be larger than what is achievable by storing the two numbers separately.

Are there any standard algorithms for doing this kind of thing?

Best Answer

Just count the number of possible ranges. There are 256 ranges with lower bound 0 (0-0, 0-1, ... 0-254, 0-255), 255 ranges with lower bound 1, ... and finally 1 range with lower bound 255 (255-255). So the total number is (256 + 255 + ... + 1) = 257 * 128 = 32,896. As this is slightly higher than 2¹⁵ = 32,768, you'll still need at least 16 bits (2 bytes) to store this information.

In general, for numbers from 0 up to n-1, the number of possible ranges is n*(n+1)/2. This is less than 256 if n is 22 or less: n = 22 gives 22*23/2 = 253 possibilities. So one byte suffices for sub-ranges of 0-21.

Another way to look at the problem is the following: storing a pair of integers in the range 0 to n-1 is almost the same as storing a subrange of 0-(n-1) plus a single bit which determines if the first number is lower or higher than the second one. (The difference comes from the case when both integers are equal, but this chance becomes increasingly smaller as n grows larger.) That's why you can only save about a single bit with this technique, and probably the main reason why it is rarely used.

Related Solutions

C# – any document which says exactly what range of numbers are .NET BigIntegers designed for

Any number that could possibly get larger than ULong.MaxValue, or smaller than Long.MinValue should be represented using BigInteger.

If NOT (Long.MinValue <= X <= ULong.MaxValue) Then BigInteger

BigInteger is for too large numbers than normal primitives can handle.

For example if your integer is outside the range of Long, you should probably use BigInteger. These cases are very rare though, and using these classes have significantly higher overhead than their primitive counterparts.

For example, long is 64 bits wide and can hold the range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,80. ulong can hold 0 to 18,446,744,073,709,551,615. If your numbers are larger or smaller than that, BigInteger is your only option

The only time I've seen them used in a real world application was a starchartting application.

Storage Schema – Representing Integer Numbers from 0 to Infinity

An approach I've used: count the number of leading 1 bits, say n. The size of the number is then 2^n bytes (including the leading 1 bits). Take the bits after the first 0 bit as an integer, and add the maximum value (plus one) that can be represented by a number using this encoding in 2^(n-1) bytes.

Thus,

                  0 = 0b00000000
                   ...
                127 = 0b01111111
                128 = 0b1000000000000000
                   ...
              16511 = 0b1011111111111111
              16512 = 0b11000000000000000000000000000000
                   ...
          536887423 = 0b11011111111111111111111111111111
          536887424 = 0b1110000000000000000000000000000000000000000000000000000000000000
                   ...
1152921505143734399 = 0b1110111111111111111111111111111111111111111111111111111111111111
1152921505143734400 = 0b111100000000000000000000000000000000000000000000 ...

This scheme allows any non-negative value to be represented in exactly one way.

(Equivalently, used the number of leading 0 bits.)

Best Answer

Related Solutions

C# – any document which says exactly what range of numbers are .NET BigIntegers designed for

Storage Schema – Representing Integer Numbers from 0 to Infinity

Related Topic