Why Haskell and Scheme Use Singly-Linked Lists

Tags: data structures, functional programming

A doubly linked list has minimal overhead (just another pointer per cell), and allows you to append to both ends and go back and forth and generally have a lot of fun.

Best Answer

Well, if you look a bit deeper, both actually include arrays in the base language as well:

  • The 5th revised Scheme Report (R5RS) includes the vector type: fixed-size, integer-indexed collections with better-than-linear-time random access.
  • The Haskell 98 Report has an array type as well.
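
For a concrete taste of the latter, here is a minimal sketch using the standard Data.Array interface (listArray and ! are part of the Haskell 98 Array library; the squares example itself is mine):

import Data.Array

-- A ten-element immutable array of squares, indexed 0 through 9.
squares :: Array Int Int
squares = listArray (0, 9) [ i * i | i <- [0 .. 9] ]

-- Random access through (!) does not walk cells one by one.
fifth :: Int
fifth = squares ! 5   -- 25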

Functional programming instruction, however, has long emphasized single-linked lists over arrays or double-linked lists. Quite likely overemphasized them, in fact. There are several reasons for this.

The first is that single-linked lists are one of the simplest and yet most useful recursive data types. A user-defined equivalent of Haskell's list type can be defined like this:

data List a           -- A list with element type `a`...
  = Empty             -- is either the empty list...
  | Cell a (List a)   -- or a pair with an `a` and the rest of the list. 

The fact that lists are a recursive data type means that functions that work on lists generally use structural recursion. In Haskell terms: you pattern match on the list constructors, and you recurse on a subpart of the list. In the two basic function definitions below, I use the variable `as` to refer to the tail of the list; note how the recursive calls "descend" down the list:

map :: (a -> b) -> List a -> List b
map f Empty = Empty
map f (Cell a as) = Cell (f a) (map f as)

filter :: (a -> Bool) -> List a -> List a
filter p Empty = Empty
filter p (Cell a as)
    | p a = Cell a (filter p as)
    | otherwise = filter p as

This technique guarantees that your function will terminate for all finite lists, and it is also a good problem-solving technique: it tends to split problems naturally into simpler, more tractable subparts.

So single-linked lists are probably the best data type for introducing students to these techniques, which are very important in functional programming.

The second reason is less a "why single-linked lists" reason than a "why not double-linked lists or arrays" one: those latter data types often call for mutation (modifiable variables), which functional programming very often shies away from. So, as it happens:

  • In an eager language like Scheme you can't make a double-linked list without using mutation.
  • In a lazy language like Haskell you can make a double-linked list without using mutation. But whenever you make a new list based off that one, you are forced to copy most if not all of the structure of the original. Whereas with single-linked lists you can write functions that use "structure sharing": new lists can reuse the cells of old lists when appropriate (see the sketch after this list).
  • Traditionally, if you used arrays in an immutable manner it meant that every time you wanted to modify the array you had to copy the whole thing. (Recent Haskell libraries like vector, however, have found techniques that greatly improve on this problem).
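
To make the structure-sharing point concrete, here is a small sketch using the List type defined above (the names xs, rest, and ys are mine, for illustration):

-- A three-element list.
xs :: List Int
xs = Cell 1 (Cell 2 (Cell 3 Empty))

-- Taking the tail allocates nothing: rest shares the
-- Cell 2 (Cell 3 Empty) suffix with xs.
rest :: List Int
rest = case xs of
  Empty     -> Empty
  Cell _ as -> as

-- Prepending allocates exactly one new cell and reuses all of xs.
ys :: List Int
ys = Cell 0 xs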

The third and final reason applies primarily to lazy languages like Haskell: lazy single-linked lists, in practice, are often more similar to iterators than to in-memory lists proper. If your code consumes the elements of a list sequentially and throws them away as it goes, the object code will only materialize the list cells and their contents as you step forward through the list.

This means that the whole list doesn't need to exist in memory at once, only the current cell. Cells before the current one can be garbage collected (which wouldn't be possible with a double-linked list), and cells after the current one don't need to be computed until you get there.
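
As a minimal sketch of this iterator-like behavior (using Haskell's built-in lists; sumTo is my own example, and the bang pattern on the accumulator just keeps it from building up thunks):

{-# LANGUAGE BangPatterns #-}

-- [0 ..] is conceptually infinite; laziness materializes cells on demand.
-- As go walks forward, cells behind it become garbage and cells ahead of
-- it haven't been computed yet, so only O(1) of the list is live at once.
sumTo :: Int -> Integer
sumTo n = go 0 (take n [0 ..])
  where
    go :: Integer -> [Integer] -> Integer
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs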

It goes even further than that. There's a technique used in several popular Haskell libraries, called fusion, where the compiler analyzes your list-processing code and spots intermediate lists that are generated, consumed sequentially, and then "thrown away." With this knowledge, the compiler can completely eliminate the memory allocation for those lists' cells. This means that a single-linked list in a Haskell source program might, after compilation, actually be turned into a loop instead of a data structure.
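
As a sketch (whether fusion actually fires depends on the compiler and its optimization flags, but GHC's foldr/build rewrite rules typically handle a pipeline like this one):

-- Both the enumeration [1 .. n] and the list produced by map (* 2) are
-- intermediate: generated, consumed sequentially, then thrown away.
-- Under list fusion, no cells for them need ever be allocated, and the
-- whole pipeline can compile down to a tight accumulating loop.
doubleSum :: Int -> Int
doubleSum n = sum (map (* 2) [1 .. n])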

Fusion is also the technique that the aforementioned vector library uses to generate efficient code for immutable arrays. The same goes for the extremely popular bytestring (byte arrays) and text (Unicode strings) libraries, which were built as replacements for Haskell's not-very-great native String type (which is the same as [Char], a single-linked list of characters). So in modern Haskell, immutable array types with fusion support are becoming very common.

List fusion is facilitated by the fact that in a single-linked list you can go forward but never backward. This brings up a very important theme in functional programming: using the "shape" of a data type to derive the "shape" of a computation. If you want to process elements sequentially, a single-linked list is a data type that, when you consume it with structural recursion, gives you that access pattern very naturally. If you want to attack a problem with a "divide and conquer" strategy, then tree data structures tend to support that very well.
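
As a small sketch of that last point (the Tree type and treeSum are mine, for illustration):

-- The shape of the data (two subtrees) dictates the shape of the
-- computation (two recursive calls): divide and conquer for free.
data Tree a = Leaf | Node (Tree a) a (Tree a)

treeSum :: Num a => Tree a -> a
treeSum Leaf         = 0
treeSum (Node l x r) = treeSum l + x + treeSum r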

A lot of people fall off the functional programming wagon early on, so they get exposure to single-linked lists but not to the more advanced underlying ideas.