Haskell Production Code – How Often is ‘seq’ Used?

haskell, programming practices, stackoverflow

I have some experience writing small tools in Haskell and I find it very intuitive to use, especially for writing filters (using interact) that process their standard input and pipe it to standard output.

Recently I tried to use one such filter on a file that was about 10 times larger than usual and I got a Stack space overflow error.

After doing some reading (e.g. here and here) I have identified two guidelines to save stack space (experienced Haskellers, please correct me if I write something that is not correct):

  1. Avoid recursive function calls that are not tail-recursive (this is valid for all functional languages that support tail-call optimization).
  2. Introduce seq to force early evaluation of sub-expressions so that expressions do not grow too large before they are reduced (this is specific to Haskell, or at least to languages using lazy evaluation).
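To make the second guideline concrete, here is a minimal sketch of my own (not the code from the tool in question); the names sumLazy and sumForced are made up for illustration. Both functions are tail-recursive, but only the second forces the accumulator at each step. Note that GHC's strictness analysis may remove the difference when compiling with optimization, so the effect is easiest to observe without -O.

```haskell
-- Hypothetical example: summing a list with an explicit accumulator.

-- Lazy accumulator: the recursive call is a tail call, but `acc + x` is
-- never forced along the way, so a chain of suspended additions (thunks)
-- can build up and is only evaluated once the list is exhausted.
sumLazy :: [Integer] -> Integer
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Forced accumulator: `seq` evaluates the new accumulator to weak head
-- normal form before the recursive call, so no chain of thunks builds up.
sumForced :: [Integer] -> Integer
sumForced = go 0
  where
    go acc []       = acc
    go acc (x : xs) =
      let acc' = acc + x
      in  acc' `seq` go acc' xs

main :: IO ()
main = do
  print (sumLazy [1 .. 1000])       -- prints 500500
  print (sumForced [1 .. 1000000])  -- prints 500000500000
```

Both versions return the same result; they differ only in when the additions are carried out.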

After introducing five or six seq calls in my code, my tool runs smoothly again (also on the larger data). However, I find that the original code was a bit more readable.

Since I am not an experienced Haskell programmer, I wanted to ask whether introducing seq in this way is common practice, and how often one will normally see seq in Haskell production code. Or are there techniques that let one avoid using seq too often and still use little stack space?

Best Answer

Unfortunately, there are cases where one has to use seq in order to get an efficient, well-behaved program for large data. So in many cases, you cannot do without it in production code. You can find more information in Real World Haskell, Chapter 25, "Profiling and optimization".

However, there are ways to avoid using seq directly, which can make code cleaner and more robust. Some ideas:

  1. Use conduit, pipes or iteratees instead of interact. Lazy IO is known to have problems with managing resources (not just memory), and iteratees are designed precisely to overcome them. (I'd suggest avoiding lazy IO altogether, no matter how large your data is; see The problem with lazy I/O.)
  2. Instead of using seq directly, use (or design your own) combinators such as foldl' or foldr', or the strict variants of library modules (such as Data.Map.Strict or Control.Monad.State.Strict) that are designed for strict computations.
  3. Use the BangPatterns extension. It lets you replace seq with strict pattern matching. Declaring strict constructor fields can also be useful in some cases.
  4. It's also possible to use Strategies to force evaluation. The Strategies library is mostly aimed at parallel computations, but it also has functions for forcing a value to WHNF (rseq) or to full normal form (rdeepseq). It also provides many utility functions for working with collections, combining strategies, etc.
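
As a rough illustration of ideas 2 and 3, here is a small sketch that uses only base. The Stats type and the functions mean and sumBang are hypothetical names chosen for this example. It combines foldl' with strict constructor fields, and separately shows a bang pattern taking the place of an explicit seq.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Strict fields: building a Stats value forces both components to WHNF,
-- which for Int and Double means they are fully evaluated.
data Stats = Stats !Int !Double

-- foldl' forces the accumulator to WHNF at every step; the strict fields
-- extend that to the running count and total inside it.
-- (Illustration only: an empty list gives 0/0, i.e. NaN.)
mean :: [Double] -> Double
mean xs = total / fromIntegral count
  where
    Stats count total = foldl' step (Stats 0 0) xs
    step (Stats n s) x = Stats (n + 1) (s + x)

-- Bang pattern: the accumulator is forced whenever `go` is entered,
-- with no explicit call to seq.
sumBang :: [Integer] -> Integer
sumBang = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = do
  print (mean [1, 2, 3, 4])       -- prints 2.5
  print (sumBang [1 .. 1000000])  -- prints 500000500000
```

Without the strict fields, foldl' would only evaluate the accumulator as far as the Stats constructor, and the count and total inside it could still accumulate thunks. That is why the two techniques are often used together.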