Design – How to design for good abstractions using algebraic data type

algebraic-data-typedesignfunctional programming

Every now and then I have peaked at Haskell Tutorials and found the Algebraic data types quite interesting. I took their purpose to be to represent types that have completely separable states. Sadly, I never got to write more Haskell than tutorial level projects, and so I never had to really design programs using this pattern.

Now I am writing some Rust, and I have algebraic datatypes (enums) in the toolbox. However, I am not very confident in using them.

Let me start with an example where I am confident that such an enum is a proper choice.

enum Tree {
    Leaf(i: String),
    Branch(Tree, Tree)
}

The same example would be applicaple for an XML-like structures, etc.

With other data, I am not so confident about using enum types. Let's take a connection object

enum Connection {
    UnConnected(...),
    ConnectedConnection(....)
}

Here we would have a Connection type with two possible values, one representing the state where a connection is not yet established, the other one could represent the state of a connected connection (and wrapping a connection handle for example).

The other possibility would be to introduce 2 types for a Connection-Template and a connected connection.

Another example that I found in rust code is that in the hyper library. There is the type Response that represents a HTTP response. It is a generic type.
Response<Fresh> represents the state where headers are not yet frozen. Once upon it is mapped to a Response<Streaming>, which can be used to write the body of the response. It seems like what Hyper models here with different types (Response<Fresh> vs Response<Streaming>) could have been modeled with enum types as well.

The approach with different types allows for more safety, Response<Fresh> does not implement streaming (I think) and Response<Streamimg> does.

Do you know of guidelines and best practices that guide through modelling logic in types properly?

Best Answer

Even though you said algebraic data types, you seem to mostly be asking about sum types, so I will focus on those. Product types are more common and more easily understood.

Sum types are most easily understood not by thinking about what you're modeling, but by thinking about the code that uses it. People tend to think of sum types as representing states that are mutually exclusive, but this isn't the entire picture. To use a sum type, you should also have a use for the encompassing type. This represents an indeterminate state, where at the point of calling the function, you know you either have or need one of the term types, but you don't know which one. If you always know which one statically, you should just create separate types.

For your Tree example, that means you should have functions that actually take or return a Tree. As you traverse down the tree, you have this state where you don't know if the next node down is going to be a Leaf or a Branch, so you need a type that can be either. If you only had functions that take or return Leafs or Branches, you wouldn't need a sum type.

Your Connection example is a completely different matter. At some point, you have a connect function that takes in an Unconnected and returns a Connected. At some later point, you have a disconnect function that does the reverse. There is no point where you don't know statically if a connection is connected or not, therefore you don't need a type that can hold either.

If you have a hard time seeing appropriate situations to use sum types, my recommendation is to start out not using them. If they are appropriate, at some point you'll hit a function that can't be written otherwise, then you can add it then. Forcing a sum type when it's not needed leads to a lot of unnecessary pattern matching that could be much cleaner with separate functions.

Related Topic