Every so often, when programmers are complaining about null errors/exceptions, someone asks what we do without null.
I have some basic idea of the coolness of option types, but I don't have the knowledge or language skills to express it well. What is a great explanation of the following, written in a way approachable to the average programmer, that we could point that person towards?
- The undesirability of having references/pointers be nullable by default
- How option types work, including strategies to ease checking null cases, such as
- pattern matching and
- monadic comprehensions
- Alternative solutions, such as message-eating nil
- (other aspects I missed)
Best Answer
I think the succinct summary of why null is undesirable is that meaningless states should not be representable.
Suppose I'm modeling a door. It can be in one of three states: open, shut but unlocked, and shut and locked. Now I could model it along the lines of
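(I'll sketch these in F#; the particular language is incidental.)

```fsharp
// A door modeled with two independent booleans.
type Door() =
    member val isShut   = false with get, set
    member val isLocked = false with get, set
```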
and it is clear how to map my three states into these two boolean variables. But this leaves a fourth, undesired state available: `isShut==false && isLocked==true`. Because the types I have selected as my representation admit this state, I must expend mental effort to ensure that the class never gets into this state (perhaps by explicitly coding an invariant). In contrast, if I were using a language with algebraic data types or checked enumerations that lets me define
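```fsharp
// Exactly the three meaningful states, and no others.
type DoorState =
    | Open
    | ShutAndUnlocked
    | ShutAndLocked
```

then I could define

```fsharp
// A door that can only ever be in one of the three states.
type Door(state : DoorState) =
    member val State = state with get, set
```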
and there are no more worries. The type system will ensure that there are only three possible states for an instance of `Door` to be in. This is what type systems are good at: explicitly ruling out a whole class of errors at compile time.

The problem with `null` is that every reference type gets this extra state in its space that is typically undesired. A `string` variable could be any sequence of characters, or it could be this crazy extra `null` value that doesn't map into my problem domain. A `Triangle` object has three `Point`s, which themselves have `X` and `Y` values, but unfortunately the `Point`s or the `Triangle` itself might be this crazy `null` value that is meaningless to the graphing domain I'm working in. Etc.

When you do intend to model a possibly-non-existent value, then you should opt into it explicitly. If the way I intend to model people is that every `Person` has a `FirstName` and a `LastName`, but only some people have `MiddleName`s, then I would like to say something like
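```fsharp
// MiddleName is explicitly optional; the other fields always have values.
type Person =
    { FirstName  : string
      MiddleName : string option
      LastName   : string }
```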
where `string` here is assumed to be a non-nullable type. Then there are no tricky invariants to establish and no unexpected `NullReferenceException`s when trying to compute the length of someone's name. The type system ensures that any code dealing with the `MiddleName` accounts for the possibility of it being `None`, whereas any code dealing with the `FirstName` can safely assume there is a value there.

So, for example, using the type above, we could author this silly function:
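```fsharp
// Pattern matching forces us to handle the None case; nothing can blow up.
let totalNumCharsInPersonName (p : Person) =
    let middleLen =
        match p.MiddleName with
        | Some s -> s.Length
        | None   -> 0
    p.FirstName.Length + middleLen + p.LastName.Length
```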
with no worries. In contrast, in a language with nullable references for types like `string`, assuming
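```fsharp
// The same shape, but now any field might secretly be null.
type Person =
    { FirstName  : string   // maybe null
      MiddleName : string   // maybe null
      LastName   : string } // maybe null
```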
you end up authoring stuff like
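```fsharp
// Blows up with a NullReferenceException if any field is null.
let totalNumCharsInPersonName (p : Person) =
    p.FirstName.Length + p.MiddleName.Length + p.LastName.Length
```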
which blows up if the incoming `Person` object does not have the invariant of everything being non-null, or
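```fsharp
// Defensively null-check every field, every time.
let totalNumCharsInPersonName (p : Person) =
    let firstLen  = if p.FirstName  = null then 0 else p.FirstName.Length
    let middleLen = if p.MiddleName = null then 0 else p.MiddleName.Length
    let lastLen   = if p.LastName   = null then 0 else p.LastName.Length
    firstLen + middleLen + lastLen
```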
or maybe
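```fsharp
// Trust that first/last are non-null; check only the middle.
let totalNumCharsInPersonName (p : Person) =
    let middleLen = if p.MiddleName = null then 0 else p.MiddleName.Length
    p.FirstName.Length + middleLen + p.LastName.Length
```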
assuming that `p` ensures first/last are there but middle can be null, or maybe you do checks that throw different types of exceptions, or who knows what. All these crazy implementation choices and things to think about crop up because there's this stupid representable value that you don't want or need.

Null typically adds needless complexity. Complexity is the enemy of all software, and you should strive to reduce complexity whenever reasonable.
(Note well that there is more complexity to even these simple examples. Even if a `FirstName` cannot be `null`, a `string` can represent `""` (the empty string), which is probably also not a person name that we intend to model. As such, even with non-nullable strings, it still might be the case that we are "representing meaningless values". Again, you could choose to battle this either via invariants and conditional code at runtime, or by using the type system (e.g. by having a `NonEmptyString` type). The latter is perhaps ill-advised ("good" types are often "closed" over a set of common operations, and e.g. `NonEmptyString` is not closed over `.Substring(0,0)`), but it demonstrates more points in the design space. At the end of the day, in any given type system, there is some complexity it will be very good at getting rid of, and other complexity that is just intrinsically harder to get rid of. The key for this topic is that in nearly every type system, the change from "nullable references by default" to "non-nullable references by default" is nearly always a simple change that makes the type system a great deal better at battling complexity and ruling out certain types of errors and meaningless states. So it is pretty crazy that so many languages keep repeating this error again and again.)
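To make that last point concrete, a minimal sketch of such a type might look like the following (one possible design among many):

```fsharp
// Values can only be created via Create, which rejects null/empty input,
// so every NonEmptyString holds at least one character. Note the type is
// not closed under ordinary string operations: Substring(0,0) on the
// underlying value would yield "".
type NonEmptyString private (value : string) =
    member this.Value = value
    static member Create (s : string) =
        if System.String.IsNullOrEmpty s then None
        else Some (NonEmptyString s)
```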