Haskell – How does foldr work

combinatorsfoldhaskell

Can anybody explain how does foldr work?

Take these examples:

Prelude> foldr (-) 54 [10, 11]
53
Prelude> foldr (\x y -> (x+y)/2) 54 [12, 4, 10, 6]
12.0

I am confused about these executions. Any suggestions?

Best Answer

The easiest way to understand foldr is to rewrite the list you're folding over without the sugar.

[1,2,3,4,5] => 1:(2:(3:(4:(5:[]))))

now what foldr f x does is that it replaces each : with f in infix form and [] with x and evaluates the result.

For example:

sum [1,2,3] = foldr (+) 0 [1,2,3]

[1,2,3] === 1:(2:(3:[]))

sum [1,2,3] === 1+(2+(3+0)) = 6

`foldl` is:

Left associative: f ( ... (f (f (f (f z x1) x2) x3) x4) ...) xn
Tail recursive: It iterates through the list, producing the value afterwards
Lazy: Nothing is evaluated until the result is needed
Backwards: foldl (flip (:)) [] reverses a list.

`foldr` is:

Right associative: f x1 (f x2 (f x3 (f x4 ... (f xn z) ... )))
Recursive into an argument: Each iteration applies f to the next value and the result of folding the rest of the list.
Lazy: Nothing is evaluated until the result is needed
Forwards: foldr (:) [] returns a list unchanged.

There's a slightly subtle point here that trips people up sometimes: Because foldl is backwards each application of f is added to the outside of the result; and because it is lazy, nothing is evaluated until the result is required. This means that to compute any part of the result, Haskell first iterates through the entire list constructing an expression of nested function applications, then evaluates the outermost function, evaluating its arguments as needed. If f always uses its first argument, this means Haskell has to recurse all the way down to the innermost term, then work backwards computing each application of f.

This is obviously a far cry from the efficient tail-recursion most functional programmers know and love!

In fact, even though foldl is technically tail-recursive, because the entire result expression is built before evaluating anything, foldl can cause a stack overflow!

On the other hand, consider foldr. It's also lazy, but because it runs forwards, each application of f is added to the inside of the result. So, to compute the result, Haskell constructs a single function application, the second argument of which is the rest of the folded list. If f is lazy in its second argument--a data constructor, for instance--the result will be incrementally lazy, with each step of the fold computed only when some part of the result that needs it is evaluated.

So we can see why foldr sometimes works on infinite lists when foldl doesn't: The former can lazily convert an infinite list into another lazy infinite data structure, whereas the latter must inspect the entire list to generate any part of the result. On the other hand, foldr with a function that needs both arguments immediately, such as (+), works (or rather, doesn't work) much like foldl, building a huge expression before evaluating it.

So the two important points to note are these:

foldr can transform one lazy recursive data structure into another.
Otherwise, lazy folds will crash with a stack overflow on large or infinite lists.

You may have noticed that it sounds like foldr can do everything foldl can, plus more. This is true! In fact, foldl is nearly useless!

But what if we want to produce a non-lazy result by folding over a large (but not infinite) list? For this, we want a strict fold, which the standard libraries thoughfully provide:

`foldl'` is:

Left associative: f ( ... (f (f (f (f z x1) x2) x3) x4) ...) xn
Tail recursive: It iterates through the list, producing the value afterwards
Strict: Each function application is evaluated along the way
Backwards: foldl' (flip (:)) [] reverses a list.

Because foldl' is strict, to compute the result Haskell will evaluate f at each step, instead of letting the left argument accumulate a huge, unevaluated expression. This gives us the usual, efficient tail recursion we want! In other words:

foldl' can fold large lists efficiently.
foldl' will hang in an infinite loop (not cause a stack overflow) on an infinite list.

The Haskell wiki has a page discussing this, as well.

Haskell – Abusing the algebra of algebraic data types – why does this work

Disclaimer: A lot of this doesn't really work quite right when you account for ⊥, so I'm going to blatantly disregard that for the sake of simplicity.

A few initial points:

Note that "union" is probably not the best term for A+B here--that's specifically a disjoint union of the two types, because the two sides are distinguished even if their types are the same. For what it's worth, the more common term is simply "sum type".
Singleton types are, effectively, all unit types. They behave identically under algebraic manipulations and, more importantly, the amount of information present is still preserved.
You probably want a zero type as well. Haskell provides that as Void. There are no values whose type is zero, just as there is one value whose type is one.

There's still one major operation missing here but I'll get back to that in a moment.

As you've probably noticed, Haskell tends to borrow concepts from Category Theory, and all of the above has a very straightforward interpretation as such:

Given objects A and B in Hask, their product A×B is the unique (up to isomorphism) type that allows two projections fst : A×B → A and snd : A×B → B, where given any type C and functions f : C → A, g : C → B you can define the pairing f &&& g : C → A×B such that fst ∘ (f &&& g) = f and likewise for g. Parametricity guarantees the universal properties automatically and my less-than-subtle choice of names should give you the idea. The (&&&) operator is defined in Control.Arrow, by the way.
The dual of the above is the coproduct A+B with injections inl : A → A+B and inr : B → A+B, where given any type C and functions f : A → C, g : B → C, you can define the copairing f ||| g : A+B → C such that the obvious equivalences hold. Again, parametricity guarantees most of the tricky parts automatically. In this case, the standard injections are simply Left and Right and the copairing is the function either.

Many of the properties of product and sum types can be derived from the above. Note that any singleton type is a terminal object of Hask and any empty type is an initial object.

Returning to the aforementioned missing operation, in a cartesian closed category you have exponential objects that correspond to arrows of the category. Our arrows are functions, our objects are types with kind *, and the type A -> B indeed behaves as B^A in the context of algebraic manipulation of types. If it's not obvious why this should hold, consider the type Bool -> A. With only two possible inputs, a function of that type is isomorphic to two values of type A, i.e. (A, A). For Maybe Bool -> A we have three possible inputs, and so on. Also, observe that if we rephrase the copairing definition above to use algebraic notation, we get the identity C^A × C^B = C^A+B.

As for why this all makes sense--and in particular why your use of the power series expansion is justified--note that much of the above refers to the "inhabitants" of a type (i.e., distinct values having that type) in order to demonstrate the algebraic behavior. To make that perspective explicit:

The product type (A, B) represents a value each from A and B, taken independently. So for any fixed value a :: A, there is one value of type (A, B) for each inhabitant of B. This is of course the cartesian product, and the number of inhabitants of the product type is the product of the number of inhabitants of the factors.
The sum type Either A B represents a value from either A or B, with the left and right branches distinguished. As mentioned earlier, this is a disjoint union, and the number of inhabitants of the sum type is the sum of the number of inhabitants of the summands.
The exponential type B -> A represents a mapping from values of type B to values of type A. For any fixed argument b :: B, any value of A can be assigned to it; a value of type B -> A picks one such mapping for each input, which is equivalent to a product of as many copies of A as B has inhabitants, hence the exponentiation.

While it's tempting at first to treat types as sets, that doesn't actually work very well in this context--we have disjoint union rather than the standard union of sets, there's no obvious interpretation of intersection or many other set operations, and we don't usually care about set membership (leaving that to the type checker).

On the other hand, the constructions above spend a lot of time talking about counting inhabitants, and enumerating the possible values of a type is a useful concept here. That quickly leads us to enumerative combinatorics, and if you consult the linked Wikipedia article you'll find that one of the first things it does is define "pairs" and "unions" in exactly the same sense as product and sum types by way of generating functions, then does the same for "sequences" that are identical to Haskell's lists using exactly the same technique you did.

Edit: Oh, and here's a quick bonus that I think demonstrates the point strikingly. You mentioned in a comment that for a tree type T = 1 + T^2 you can derive the identity T^6 = 1, which is clearly wrong. However, T^7 = T does hold, and a bijection between trees and seven-tuples of trees can be constructed directly, cf. Andreas Blass's "Seven Trees in One".

Edit×2: On the subject of the "derivative of a type" construction mentioned in other answers, you might also enjoy this paper from the same author which builds on the idea further, including notions of division and other interesting whatnot.

Best Answer

Related Solutions

Haskell – foldl versus foldr behavior with infinite lists

foldl is:

foldr is:

foldl' is:

Haskell – Abusing the algebra of algebraic data types – why does this work

Related Topic

`foldl` is:

`foldr` is:

`foldl'` is: