Data Structures – How to Store ‘Unknown’ and ‘Missing’ Values in a Variable

data structures, data types, type-systems

Consider this an "academic" question. From time to time I wonder about avoiding NULLs, and this is an example where I can't come up with a satisfactory solution.


Let's assume I store measurements where, on occasion, the measurement is known to be impossible (or missing). I would like to store that "empty" value in a variable while avoiding NULL. At other times the value could be unknown. So, given the measurements for a certain time frame, a query about a measurement within that period could return three kinds of responses:

  • The actual measurement at that time (for example, any numerical value, including 0).
  • A "missing"/"empty" value (i.e., a measurement was taken, and the value is known to be empty at that point).
  • An unknown value (i.e., no measurement was taken at that point; it could be empty, but it could also be any other value).

Important Clarification:

Assume you have a function get_measurement() that returns one of "empty", "unknown", or a value of type "integer". Having a numerical value implies that certain operations can be performed on the return value (multiplication, division, …), but applying such operations to NULLs will crash the application unless the error is caught.

I would like to be able to write code that avoids NULL checks, for example (pseudocode):

>>> value = get_measurement()  # returns `2`
>>> print(value * 2)
4

>>> value = get_measurement()  # returns `Empty()`
>>> print(value * 2)
Empty()

>>> value = get_measurement()  # returns `Unknown()`
>>> print(value * 2)
Unknown()

Note that none of the print statements caused exceptions (as no NULLs were used). The empty and unknown values would propagate as needed, and the check for whether a value is actually "unknown" or "empty" could be delayed until it is really necessary (for example, when storing or serialising the value).


Side note: my reason for wanting to avoid NULLs here is primarily that it's a brain-teaser. If I want to get stuff done I'm not opposed to using NULLs, but I have found that avoiding them can make code a lot more robust in some cases.

Best Answer

The common way to do this, at least in functional languages, is to use a discriminated union. This is a value that is exactly one of: a valid int, a value that denotes "missing", or a value that denotes "unknown". In F#, it might look something like:

type Measurement =
    | Reading of value : int
    | Missing
    | Unknown of value : RawData

A Measurement value will then be a Reading with an int value, a Missing, or an Unknown with the raw data as its value (if required).
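For readers closer to the question's Python-style pseudocode, the same three-case idea can be approximated with small frozen dataclasses. This is only a sketch, not a translation of the F# above: the `raw` field is a stand-in for the answer's RawData, and `map_reading` and `describe` are hypothetical helper names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Reading:
    value: int


@dataclass(frozen=True)
class Missing:
    pass


@dataclass(frozen=True)
class Unknown:
    raw: bytes = b""  # assumption: placeholder for the answer's RawData


# A Measurement is exactly one of the three cases.
Measurement = Union[Reading, Missing, Unknown]


def map_reading(m: Measurement, f: Callable[[int], int]) -> Measurement:
    """Apply f to a Reading; pass Missing and Unknown through unchanged."""
    if isinstance(m, Reading):
        return Reading(f(m.value))
    return m


def describe(m: Measurement) -> str:
    """Inspect the case only at the edge, e.g. when serialising."""
    if isinstance(m, Reading):
        return f"reading: {m.value}"
    if isinstance(m, Missing):
        return "missing"
    return "unknown"
```

With this shape, `map_reading(get_measurement(), lambda v: v * 2)` never needs a NULL check; the case distinction is deferred to a function like `describe`.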

However, if you aren't using a language that supports discriminated unions or an equivalent, this pattern isn't likely to be of much use to you. In that case you could, for example, use a class with an enum field that denotes which of the three cases holds the correct data.
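A minimal sketch of that class-plus-enum fallback, written in Python to mirror the question's pseudocode. The names `Kind` and `Measurement`, and the choice to overload only multiplication, are assumptions made for illustration; a fuller version would overload the other arithmetic operators in the same way.

```python
from enum import Enum


class Kind(Enum):
    READING = "reading"
    EMPTY = "empty"
    UNKNOWN = "unknown"


class Measurement:
    """A value plus an enum tag saying which of the three cases applies."""

    def __init__(self, kind: Kind, value: int = 0):
        self.kind = kind
        self.value = value  # only meaningful when kind is Kind.READING

    def __mul__(self, factor: int) -> "Measurement":
        # Readings are multiplied; empty and unknown propagate unchanged.
        if self.kind is Kind.READING:
            return Measurement(Kind.READING, self.value * factor)
        return self

    def __repr__(self) -> str:
        if self.kind is Kind.READING:
            return str(self.value)
        return "Empty()" if self.kind is Kind.EMPTY else "Unknown()"


print(Measurement(Kind.READING, 2) * 2)  # prints 4
print(Measurement(Kind.EMPTY) * 2)       # prints Empty()
print(Measurement(Kind.UNKNOWN) * 2)     # prints Unknown()
```

The trade-off compared with a real discriminated union is that nothing stops code from reading `value` on an empty or unknown measurement, so the tag has to be checked by convention rather than by the type system.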