Data Structures – Most Efficient Way to Store a Numeric Range

compressiondata structuresnumbers

This question is about how many bits are required to store a range. Or put another way, for a given number of bits, what is the maximum range that can be stored and how?

Imagine we want to store a sub-range within the range 0-255.

So for example, 45-74.

We can store the example above as two unsigned bytes, but it strikes me that there must be some redundancy of information there. We know that the second value is larger than the first, so in the case that the first value is large, fewer bits are required for the second value, and in the case that the second value is large, fewer bits are required for the first.

I suspect that any compression technique would yield a marginal result, so it may be a better question to ask "what is the maximum range that could be stored in one byte?". This should be larger than what is achievable by storing the two numbers separately.

Are there any standard algorithms for doing this kind of thing?

Best Answer

Just count the number of possible ranges. There are 256 ranges with lower bound 0 (0-0, 0-1, ... 0-254, 0-255), 255 ranges with lower bound 1, ... and finally 1 range with lower bound 255 (255-255). So the total number is (256 + 255 + ... + 1) = 257 * 128 = 32,896. As this is slightly higher than 215 = 32,768, you'll still need at least 16 bits (2 bytes) to store this information.

In general, for numbers from 0 up to n-1, the number of possible ranges is n*(n+1)/2. This is less than 256 if n is 22 or less: n = 22 gives 22*23/2 = 253 possibilities. So one byte suffices for sub-ranges of 0-21.

Another way to look at the problem is the following: storing a pair of integers in the range 0 to n-1 is almost the same as storing a subrange of 0-(n-1) plus a single bit which determines if the first number is lower or higher than the second one. (The difference comes from the case when both integers are equal, but this chance becomes increasingly smaller as n grows larger.) That's why you can only save about a single bit with this technique, and probably the main reason why it is rarely used.