Scala – Pattern matching and infinite streams

lazy-evaluation, pattern-matching, scala

So, I'm working to teach myself Scala, and one of the things I've been playing with is the Stream class. I tried to use a naïve translation of the classic Haskell version of Dijkstra's solution to the Hamming number problem:

object LazyHammingBad {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (x #:: xs, y #:: ys) =>
        if (x < y) x #:: merge(xs, b)
        else if (y < x) y #:: merge(a, ys)
        else x #:: merge(xs, ys)
    }

  val numbers: Stream[BigInt] =
    1 #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}

Taking this for a spin in the interpreter led quickly to disappointment:

scala> LazyHammingBad.numbers.take(10).toList
java.lang.StackOverflowError

I decided to look to see if other people had solved the problem in Scala using the Haskell approach, and adapted this solution from Rosetta Code:

object LazyHammingGood {
  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    if (a.head < b.head) a.head #:: merge(a.tail, b)
    else if (b.head < a.head) b.head #:: merge(a, b.tail)
    else a.head #:: merge(a.tail, b.tail)

  val numbers: Stream[BigInt] = 
    1 #:: merge(numbers map {_ * 2}, 
            merge(numbers map {_ * 3}, numbers map {_ * 5}))
}

This one worked nicely, but I still wonder how I went wrong in LazyHammingBad. Does using #:: to destructure x #:: xs force the evaluation of xs for some reason? Is there any way to use pattern matching safely with infinite streams, or do you just have to use head and tail if you don't want things to blow up?

Best Answer

a match { case x #:: xs => ... } is about the same as val (x, xs) = (a.head, a.tail). So the difference between the bad version and the good one is that the bad version calls a.tail and b.tail right at the start, instead of only using them to build the tail of the resulting stream. Furthermore, when you use them to the right of #:: (not pattern matching, but building the result, as in #:: merge(a, b.tail)), you are not actually calling merge; that happens only later, when the tail of the returned Stream is accessed. So in the good version, a call to merge does not call tail at all. In the bad version, it calls tail right at the start.
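The difference can be observed directly. The following is only a minimal sketch, assuming the same Stream class as in the question (deprecated in favour of LazyList since Scala 2.13); the object TailForcingDemo and its members are hypothetical names made up for illustration.

```scala
object TailForcingDemo {
  // Flipped to true only when the stream's tail is actually evaluated.
  var tailForced = false

  // Hypothetical helper: the tail expression, which records its own evaluation.
  private def recordingTail(): Stream[Int] = {
    tailForced = true
    Stream.empty[Int]
  }

  // A fresh one-element stream; the right-hand side of #:: is passed by name.
  private def fresh(): Stream[Int] = {
    tailForced = false
    1 #:: recordingTail()
  }

  // Reading only the head leaves the tail unevaluated.
  def forcedByHead(): Boolean = {
    val s = fresh()
    val h = s.head
    tailForced
  }

  // Destructuring with #:: runs the extractor, which reads both head and tail.
  def forcedByPattern(): Boolean = {
    fresh() match {
      case x #:: xs => ()
      case _        => ()
    }
    tailForced
  }
}
```

Here TailForcingDemo.forcedByHead() should return false and TailForcingDemo.forcedByPattern() should return true, which is the behaviour that makes the pattern-matching merge recurse.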

Now consider numbers, or even a simplified version, say 1 #:: merge(numbers, anotherStream). When you call tail on that (as take(10).toList will), merge has to be evaluated. You call tail on numbers, which calls merge with numbers as a parameter, which calls tail on numbers, which calls merge, which calls tail...

By contrast, in super-lazy Haskell, pattern matching does barely any work. When you write case l of x:xs, it evaluates l just enough to know whether it is an empty list or a cons. If it is indeed a cons, x and xs are available as two thunks, functions that will give access to the content later. The closest equivalent in Scala would be to just test isEmpty.

Note also that in Scala's Stream, while the tail is lazy, the head is not. When you have a non-empty Stream, its head has to be known. This means that when you get the tail of the stream, which is itself a stream, its head (the second element of the original stream) has to be computed. This is sometimes problematic, but in your example you fail before even getting there.
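As for using pattern matching safely: one possible approach, in the spirit of testing only for emptiness, is to match with an extractor that looks at the head and never touches the tail. This is a sketch under that assumption rather than an established idiom; HeadOf and LazyHammingSketch are hypothetical names, and the extra cases for empty streams are an addition that the original merge did not have.

```scala
object LazyHammingSketch {
  // Hypothetical extractor: succeeds on a non-empty stream and exposes
  // only its head, so matching never evaluates the tail.
  object HeadOf {
    def unapply[A](s: Stream[A]): Option[A] = s.headOption
  }

  private def merge(a: Stream[BigInt], b: Stream[BigInt]): Stream[BigInt] =
    (a, b) match {
      case (HeadOf(x), HeadOf(y)) =>
        // a.tail and b.tail appear only to the right of #::, which is
        // by-name, so they are evaluated when the result's tail is.
        if (x < y) x #:: merge(a.tail, b)
        else if (y < x) y #:: merge(a, b.tail)
        else x #:: merge(a.tail, b.tail)
      case (Stream.Empty, _) => b // a is exhausted
      case _                 => a // b is exhausted
    }

  val numbers: Stream[BigInt] =
    BigInt(1) #:: merge(numbers map { _ * 2 },
      merge(numbers map { _ * 3 }, numbers map { _ * 5 }))
}
```

With this version, LazyHammingSketch.numbers.take(10).toList should produce 1, 2, 3, 4, 5, 6, 8, 9, 10, 12 without overflowing the stack, because a call to merge only inspects heads, just like the head/tail version.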

Related Solutions

Bash – How to use inverse or negative wildcards when pattern matching in a unix/linux shell

In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course):

~/foobar> shopt extglob
extglob        off
~/foobar> ls
abar  afoo  bbar  bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob  # Enables extglob
~/foobar> ls !(b*)
abar  afoo
~/foobar> ls !(a*)
bbar  bfoo
~/foobar> ls !(*foo)
abar  bbar

You can later disable extglob with

shopt -u extglob

Scala – How is pattern matching in Scala implemented at the bytecode level

The low level can be explored with a disassembler, but the short answer is that it's a bunch of if/elses where the predicate depends on the pattern:

case Sum(l,r) // instanceof check, then fetch the two arguments and assign them to two variables l and r (but see below about custom extractors)
case "hello" // equality check
case _ : Foo // instanceof check
case x => // assignment to a fresh variable
case _ => // do nothing, this is the tail else on the if/else

There's much more that you can do with patterns, such as or-patterns and combinations like "case Foo(45, x)", but generally those are just logical extensions of what I just described. Patterns can also have guards, which are additional constraints on the predicates. There are also cases where the compiler can optimize pattern matching, e.g. when there's some overlap between cases it might coalesce things a bit. Advanced patterns and optimization are an active area of work in the compiler, so don't be surprised if the bytecode improves substantially over these basic rules in current and future versions of Scala.
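To make the if/else reading concrete, here is a hand-written approximation of what a small match amounts to. It is a sketch of the idea only, not the compiler's actual output; Expr, Num, Sum and MatchDesugarSketch are hypothetical names invented for the example.

```scala
sealed trait Expr
final case class Num(n: Int) extends Expr
final case class Sum(l: Expr, r: Expr) extends Expr

object MatchDesugarSketch {
  // The ordinary pattern-matching version.
  def evalMatch(e: Expr): Int = e match {
    case Sum(l, r) => evalMatch(l) + evalMatch(r)
    case Num(n)    => n
  }

  // Roughly the same logic spelled out by hand: a type test per case,
  // a cast, then fetching the fields into local variables.
  def evalIfElse(e: Expr): Int =
    if (e.isInstanceOf[Sum]) {
      val s = e.asInstanceOf[Sum]
      val l = s.l
      val r = s.r
      evalIfElse(l) + evalIfElse(r)
    } else if (e.isInstanceOf[Num]) {
      e.asInstanceOf[Num].n
    } else {
      // What happens when no case applies.
      throw new MatchError(e)
    }
}
```

Both functions should agree on any input, for example returning 6 for Sum(Num(1), Sum(Num(2), Num(3))).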

In addition to all that, you can write your own custom extractors in addition to or instead of the default ones Scala uses for case classes. If you do, then the cost of the pattern match is the cost of whatever the extractor does. A good overview is found in http://lamp.epfl.ch/~emir/written/MatchingObjectsWithPatterns-TR.pdf
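As a small illustration of that last point, the sketch below defines a custom extractor and counts how often it runs. Even, ExtractorCostSketch and the counter are hypothetical names used only for this example.

```scala
object ExtractorCostSketch {
  // Counts how often the extractor below is invoked.
  var unapplyCalls = 0

  // Hypothetical custom extractor: matches even numbers and yields their half.
  object Even {
    def unapply(n: Int): Option[Int] = {
      unapplyCalls += 1
      if (n % 2 == 0) Some(n / 2) else None
    }
  }

  // Each attempt at the Even(...) case calls Even.unapply once,
  // whether or not the case ends up matching.
  def describe(n: Int): String = n match {
    case Even(half) => s"even, half is $half"
    case _          => "odd"
  }
}
```

Every call to describe runs Even.unapply once, so whatever work unapply does is paid on each match attempt.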
