Consider a single-linked list. It looks something like
data List x = Node x (List x) | End
It is natural to define a folding function such as
reduce :: (x -> y -> y) -> y -> List x -> y
In a sense, reduce f x0
replaces every Node
with f
and every End
with x0
. This is what the Prelude refers to as a fold.
Now consider a simple binary tree:
data Tree x = Leaf x | Branch (Tree x) (Tree x)
It is similarly natural to define a function such as
reduce :: (y -> y -> y) -> (x -> y) -> Tree x -> y
Notice that this reduction has a quite different character; whereas the list-based one is inherently sequential, this new tree-based one has more of a divide-and-conquer feel to it. You could even imagine throwing a few par
combinators in there. (Where would you put such a thing in the list version?)
My question: Is this function still classified as a "fold", or is it something else? (And if so, what is it?)
Basically whenever anybody talks about folding, they always talk about folding lists, which is inherently sequential. I'm wondering whether "sequential" is part of the definition of what a fold is, or whether this is merely a coincidental property of the most commonly-used example of folding.
Best Answer
A Fold for Every Occasion
We can actually come up with a generic notion of folding which can apply to a whole bunch of different types. That is, we can systematically define a
fold
function for lists, trees and more.This generic notion of
fold
corresponds to the catamorphisms @pelotom mentioned in his comment.Recursive Types
The key insight is that these
fold
functions are defined over recursive types. In particular:Both of these types are clearly recursive--
List
in theCons
case andTree
in theBranch
case.Fixed Points
Just like functions, we can rewrite these types using fixed points. Remember the definition of
fix
:We can actually write something very similar for types, except that it has to have an extra constructor wrapper:
Just like
fix
defines the fixed point of a function, this defines the fixed point of a functor. We can express all our recursive types using this newFix
type.This allows us to rewrite
List
types as follows:Essentially,
Fix
allows us to nestListContainer
s to arbitrary depths. So we could have:which correspond to
[]
,[1]
and[1,2]
respectively.Seeing that
ListContainer
is aFunctor
is easy:I think the mapping from
ListContainer
toList
is pretty natural: instead of recursing explicitly, we make the recursive part a variable. Then we just useFix
to fill that variable in as appropriate.We can write an analogous type for
Tree
as well."Unwrapping" Fixed Points
So why do we care? We can define
fold
for arbitrary types written usingFix
. In particular:Essentially, all a fold does is unwrap the "rolled" type one layer at a time, applying a function to the result each time. This "unrolling" lets us define a fold for any recursive type and neatly and naturally generalize the concept.
For the list example, it works like this:
Roll
to get either aCons
or aNil
fmap
.Nil
case,fmap (fold h) Nil = Nil
, so we just returnNil
.Cons
case, thefmap
just continues the fold over the rest of the list.fold
ending in aNil
--just like the standardfoldr
.Comparing Types
Now lets look at the types of the two fold functions. First,
foldr
:Now,
fold
specialized toListContainer
:At first, these look completely different. However, with a bit of massaging, we can show they're the same. The first two arguments to
foldr
area -> b -> b
andb
. We have a function and a constant. We can think ofb
as() -> b
. Now we have two functions_ -> b
where_
is()
anda -> b
. To make life simpler, let's curry the second function giving us(a, b) -> b
. Now we can write them as a single function usingEither
:This is true because given
f :: a -> c
andg :: b -> c
, we can always write the following:So now we can view
foldr
as:(We are always free to add parentheses around
->
like this as long as they're right-associative.)Now lets look at
ListContainer
. This type has two cases:Nil
, which carries no information andCons
, which has both ana
and ab
. Put another way,Nil
is like()
andCons
is like(a, b)
, so we can write:Clearly this is the same as what I used in
foldr
above. So now we have:So, in fact, the types are isomorphic--just different ways of writing the same thing! I think that's pretty cool.
(As a side note, if you want to know more about this sort of reasoning with types, check out The Algebra of Algebraic Data Types, a nice series of blog posts about just this.)
Back to Trees
So, we've seen how we can define a generic
fold
for types written as fixed points. We've also seen how this corresponds directly tofoldr
for lists. Now lets look at your second example, the binary tree. We have the type:we can rewrite this using
Fix
by following the rules I did above: we replace the recursive part with a type variable:Now we have a tree
fold
:Your original
foldTree
looks like this:foldTree
accepts two functions; we'll combine the into one by first currying and then usingEither
:We can also see how
Either (b, b) a
is isomorphic toTreeContainer a b
. Tree container has two cases:Branch
, containing twob
s andLeaf
, containing onea
.So these fold types are isomorphic in the same way as the list example.
Generalizing
There is a clear pattern emerging. Given a normal recursive data type, we can systematically create a non-recursive version of the type, which lets us express the type as a fixed point of a functor. This means that we can mechanically come up with
fold
functions for all these different types--in fact, we could probably automate the entire process using GHC Generics or something like that.In a sense, this means that we do not really have different
fold
functions for different types. Rather, we have a singlefold
function which is very polymorphic.More
I first fully understood these ideas from a talk given by Conal Elliott. This goes into more detail and also talks about
unfold
, which is the dual tofold
.If you want to delve into this sort of thing even more deeply, read the fantastic "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" paper. Among other things, this introduces the notions of "catamorphisms" and "anamorphisms" which correspond to folds and unfolds.
Algebras (and Coalgebras)
Also, I can't resist adding a plug for myself :P. You can see some interesting similarities between the way we use
Either
here and the way I used it when talking about algebras in another SO answer.There is in fact a deep connection between
fold
and algebras; moreover,unfold
--the aforementioned dual offold
--is connected to coalgebras, which are the dual of algebras. The important idea is that algebraic data types correspond to "initial algebras" which also define folds as outlined in the rest of my answer.You can see this connection in the general type of
fold
:The
f a -> a
term looks very familiar! Remember that an f-algebra was defined as something like:So we can think of
fold
as just:Essentially,
fold
just lets us "summarize" structures defined using the algebra.