Haskell ways to the 3n+1 problem

functional programming, haskell, programming practices

Here is a simple programming problem from SPOJ: http://www.spoj.com/problems/PROBTRES/.

Basically, you are asked to output the longest Collatz cycle length for the numbers between i and j. (The cycle length of a number $n$ is the number of terms in the sequence that takes $n$ down to 1, counting both $n$ and 1.)

I have been looking for a Haskell way to solve the problem with performance comparable to that of Java or C++ (so that it fits in the allowed run-time limit). Although a simple Java solution that memoizes the cycle lengths it has already computed will work, I haven't been successful at applying the idea to obtain a Haskell solution.

I have tried Data.Function.Memoize, as well as a home-brewed log-time memoization technique based on the idea from this post: https://stackoverflow.com/questions/3208258/memoization-in-haskell. Unfortunately, memoization actually makes the computation of cycle(n) even slower. I believe the slowdown comes from the overhead of the Haskell approach. (I ran the compiled binary rather than the interpreter.)

I also suspect that simply iterating over the numbers from i to j can be costly ($i,j\le10^6$). So I even tried precomputing everything for the range query, using the idea from http://blog.openendings.net/2013/10/range-trees-and-profiling-in-haskell.html. However, this still gives a "Time Limit Exceeded" error.

Can you suggest a neat, competitive Haskell program for this?

Best Answer

I'll answer in Scala, because my Haskell isn't as fresh, and so people will believe this is a general functional programming algorithm question. I'll stick to data structures and concepts that are readily transferable.

We can start with a function that generates a Collatz sequence, which is relatively straightforward, except that the result has to be passed along as an argument to make the function tail recursive:

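def collatz(n: Int, result: List[Int] = List()): List[Int] = {
  // Caution: 3 * n + 1 overflows Int for some starts below 10^6 (e.g. 113383);
  // switch to Long throughout if you need the full input range.
  if (n == 1) {
    1 :: result
  } else if ((n & 1) == 1) {   // n is odd
    collatz(3 * n + 1, n :: result)
  } else {                     // n is even
    collatz(n / 2, n :: result)
  }
}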

This actually puts the sequence in reverse order, but that's perfect for our next step, which is to store the lengths in a map:

def calculateLengths(sequence: List[Int], length: Int,
  lengths: Map[Int, Int]): Map[Int, Int] = sequence match {
    case Nil     => lengths
    case x :: xs => calculateLengths(xs, length + 1, lengths + ((x, length)))
}
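
Since the question asked about Haskell, here is a rough Haskell sketch of the same two steps. It assumes Data.Map.Strict from the containers package as the map, a 64-bit Int, and inputs n >= 1; the function names simply mirror the Scala ones.

```haskell
import qualified Data.Map.Strict as Map

-- Collatz sequence of n, accumulated in reverse (so 1 ends up at the head).
-- Assumes n >= 1; the accumulator keeps the helper tail recursive.
collatz :: Int -> [Int]
collatz = go []
  where
    go acc 1 = 1 : acc
    go acc n
      | odd n     = go (n : acc) (3 * n + 1)
      | otherwise = go (n : acc) (n `div` 2)

-- Walk the reversed sequence, recording each value's cycle length.
calculateLengths :: [Int] -> Int -> Map.Map Int Int -> Map.Map Int Int
calculateLengths []       _   lengths = lengths
calculateLengths (x : xs) len lengths =
  calculateLengths xs (len + 1) (Map.insert x len lengths)
```

With these definitions, `calculateLengths (collatz 22) 1 Map.empty` maps 1 to 1, 2 to 2, and so on up to 22, which gets 16.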

You would call this with the answer from the first step, the initial length, and an empty map, like calculateLengths(collatz(22), 1, Map.empty). This is how you memoize the result. Now we need to modify collatz to be able to use this:

def collatz(n: Int, lengths: Map[Int, Int], result: List[Int] = List()): (List[Int], Int) = {
  if (lengths contains n) {
    (result, lengths(n))
  } else if ((n & 1) == 1) {
    collatz(3 * n + 1, lengths, n :: result)
  } else {
    collatz(n / 2, lengths, n :: result)
  }
}

We eliminate the n == 1 check because we can just initialize the map with 1 -> 1, but calculateLengths now has to store length + 1 for each value, because the returned list no longer contains the value where the recursion stopped. collatz now also returns the memoized length at the point where it stopped recursing, which we can use to initialize calculateLengths, like:

val initialMap = Map(1 -> 1)
val (result, length) = collatz(22, initialMap)
val newMap = calculateLengths(result, length, initialMap)
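
In Haskell, the memo-aware walk and the adjusted recording step might look roughly like this. It is only a sketch: the names collatzMemo, recordLengths and Lengths are made up for illustration, and it assumes the map already contains 1 -> 1 (otherwise the walk never terminates).

```haskell
import qualified Data.Map.Strict as Map

type Lengths = Map.Map Int Int

-- Walk from n until reaching a value whose length is already known.
-- Returns the values not yet in the map (most recent first) together with
-- the known length of the value where the walk stopped.
collatzMemo :: Lengths -> Int -> ([Int], Int)
collatzMemo lengths = go []
  where
    go acc n = case Map.lookup n lengths of
      Just len -> (acc, len)
      Nothing
        | odd n     -> go (n : acc) (3 * n + 1)
        | otherwise -> go (n : acc) (n `div` 2)

-- Record lengths for the new values. Each one is a single step longer than
-- the value that follows it in the sequence, hence the (len + 1).
recordLengths :: [Int] -> Int -> Lengths -> Lengths
recordLengths []       _   lengths = lengths
recordLengths (x : xs) len lengths =
  recordLengths xs (len + 1) (Map.insert x (len + 1) lengths)
```

Starting from `Map.singleton 1 1`, walking from 22 stops at 1 with length 1, and recording the returned list assigns 2 to 2, 3 to 4, and eventually 16 to 22.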

Now that we have relatively efficient implementations of the pieces, we need a way to feed the result of the previous calculation into the next one. This is called a fold, and looks like:

def iteration(lengths: Map[Int, Int], n: Int): Map[Int, Int] = {
  val (result, length) = collatz(n, lengths)
  calculateLengths(result, length, lengths)
}

val lengths = (1 to 10).foldLeft(Map(1 -> 1))(iteration)

Now to find the actual answer, we just need to filter the map down to the keys within the given range and find the max value, giving a final result of:

def answer(start: Int, finish: Int): Int = {
  val lengths = (start to finish).foldLeft(Map(1 -> 1))(iteration)
  lengths.filterKeys(x => x >= start && x <= finish).values.max
}

In my REPL for ranges of size 1000 or so, like the example input, the answer returns pretty much instantaneously.
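
For completeness, here is a compact Haskell sketch of the whole idea. It is a variant rather than a line-by-line port: instead of building an intermediate list, it threads the map through the recursion and tracks the maximum during the fold. The names cycleLength and answer, the use of Data.Map.Strict, and the handling of start > finish are my assumptions, and it has not been tuned against the judge's time limit.

```haskell
import           Data.List       (foldl')
import qualified Data.Map.Strict as Map

type Lengths = Map.Map Int Int

-- Cycle length of n, plus the map extended with every value seen on the way.
-- Assumes the map already contains 1 -> 1.
cycleLength :: Lengths -> Int -> (Int, Lengths)
cycleLength lengths n = case Map.lookup n lengths of
  Just len -> (len, lengths)
  Nothing  ->
    let next            = if odd n then 3 * n + 1 else n `div` 2
        (len, lengths') = cycleLength lengths next
        len'            = len + 1
    in  (len', Map.insert n len' lengths')

-- Longest cycle length in the range, folding one shared map through it.
answer :: Int -> Int -> Int
answer start finish = fst (foldl' step (0, Map.singleton 1 1) [lo .. hi])
  where
    lo = min start finish
    hi = max start finish
    step (best, lengths) n =
      let (len, lengths') = cycleLength lengths n
          best'           = max best len
      in  best' `seq` lengths' `seq` (best', lengths')
```

For example, `answer 1 10` evaluates to 20, because 9 has the longest cycle in that range. The recursion in cycleLength is not tail recursive, but its depth is bounded by the cycle length, which stays in the hundreds for inputs up to $10^6$.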

Related Solutions

Haskell vs Erlang for web services

The only question I have is what is your web service doing? If the web service is truly a functional problem, then Haskell will be a better fit.

Erlang isn't necessarily a functional language. It's a procedural language with a very strong execution model for massively parallel systems. It was designed for the telecom industry, and it would definitely make an excellent fit for responding to web service requests.

See this page* for an overview of the differences between procedural and functional programming. (Apologies in advance for the ugly black on cyan page).

If your web service is doing a fair amount of pattern matching and applying rules, then Haskell is your choice. If you just want a scalable infrastructure that isn't too different from the languages you probably already know, choose Erlang.

(* link via Wayback machine. The original file has been removed)

Is return-type-(only)-polymorphism in Haskell a good thing

I actually think that return type polymorphism is one of the best features of type classes. After having used it for a while, it is sometimes hard for me to go back to OOP style modeling where I don't have it.

Consider the encoding of algebra. In Haskell we have a type class Monoid (ignoring mconcat)

class Monoid a where
   mempty :: a
   mappend :: a -> a -> a

How could we encode this as an interface in an OO language? The short answer is we can't. That's because the type of mempty is (Monoid a) => a aka, return type polymorphism. Having the ability to model algebra is incredibly useful IMO.*
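
A small sketch of what that looks like in use, relying on the Monoid class and the Sum wrapper that ship with base (not the simplified class above); the names emptyString, emptySum and combineAll are just for illustration:

```haskell
import Data.Monoid (Sum (..))

-- The same name, mempty, yields a different value depending only on the
-- type the caller asks for.
emptyString :: String
emptyString = mempty          -- ""

emptySum :: Sum Int
emptySum = mempty             -- Sum 0

-- Works for any Monoid; the requested result type selects which mempty
-- and mappend are used.
combineAll :: Monoid a => [a] -> a
combineAll = foldr mappend mempty
```

Here `combineAll ["ab", "cd"]` gives "abcd", while `combineAll (map Sum [1, 2, 3])` gives a Sum wrapping 6; nothing but the type tells the two apart.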

You start your post with the complaint about "Referential Transparency." This raises an important point: Haskell is a value oriented language. So expressions like read "3" don't have to be understood as things that compute values, they can also be understood as values. What this means is that the real issue is not return type polymorphism: it is values with polymorphic type ([] and Nothing). If the language should have these, then it really has to have polymorphic return types for consistency.

Should we be able to say [] is of type forall a. [a]? I think so. These features are very useful, and they make the language much simpler.
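
To make the point concrete, here is a minimal sketch (the binding names are made up) in which the same expression denotes different values purely because of the type requested:

```haskell
-- read is polymorphic in its result: the annotation picks the parser.
asInt :: Int
asInt = read "3"

asDouble :: Double
asDouble = read "3"

-- [] and Nothing are polymorphic values, usable at any element type.
noInts :: [Int]
noInts = []

noStrings :: [String]
noStrings = []

noName :: Maybe String
noName = Nothing
```

None of these definitions involves a function call whose result depends on hidden state; each is a value whose type is chosen at the use site.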

If Haskell had subtype polymorphism, [] could be a subtype of every [a]. The problem is that I don't know of a way of encoding that without having the type of the empty list be polymorphic. Consider how it would be done in Scala (it is shorter than doing it in the canonical statically typed OOP language, Java)

abstract class List[A]
case class Nil[A]() extends List[A]
case class Cons[A](h: A, t: List[A]) extends List[A]

Even here, Nil() is an object of type Nil[A] **

Another advantage of return type polymorphism is that it makes the Curry-Howard embedding much simpler.

Consider the following logical theorems:

 t1 = forall P. forall Q. P -> P or Q
 t2 = forall P. forall Q. P -> Q or P

We can trivially capture these as theorems in Haskell:

data Either a b = Left a | Right b
t1 :: a -> Either a b
t1 = Left
t2 :: a -> Either b a
t2 = Right

To sum up: I like return type polymorphism, and only think it breaks referential transparency if you have a limited notion of values (although this is less compelling in the ad hoc type class case). On the other hand, I do find your points about MR and type defaulting compelling.

*. In the comments ysdx points out this isn't strictly true: we could re-implement type classes by modeling the algebra as another type, as in this Java:

abstract class Monoid<M>{
   abstract M empty();
   abstract M append(M m1, M m2);
}

You then have to pass objects of this type around with you. Scala has a notion of implicit parameters which avoids some, but in my experience not all, of the overhead of explicitly managing these things. Putting your utility methods (factory methods, binary methods, etc) on a separate F-bounded type turns out to be an incredibly nice way of managing things in an OO language that has support for generics. That said, I'm not sure I would have grokked this pattern if I didn't have experience modeling things with typeclasses, and I'm not sure other people will.

It also has limitations: out of the box there is no way to get an object that implements the typeclass for an arbitrary type. You have to either pass the values explicitly, use something like Scala's implicits, or use some form of dependency injection technology. Life gets ugly. On the other hand, it is nice that you can have multiple implementations for the same type. Something can be a Monoid in multiple ways. Also, carrying around these structures separately has IMO a more mathematically modern, constructive, feel to it. So, although I still generally prefer the Haskell way of doing this, I probably overstated my case.

Typeclasses with return type polymorphism make this kind of thing easy to handle. That doesn't mean it is the best way to do it.

**. Jörg W Mittag points out this isn't really the canonical way of doing this in Scala. Instead, we would follow the standard library with something more like:

abstract class List[+A] ...  
case class Cons[A](head: A, tail: List[A]) extends List[A] ...
case object Nil extends List[Nothing] ...

This takes advantage of Scala's support for bottom types, as well as covariant type parameters. So, Nil is of type Nil not Nil[A]. At this point we are pretty far from Haskell, but it is interesting to note how Haskell represents the bottom type

undefined :: forall a. a

That is, it isn't the subtype of all types, it is polymorphically a member of all types.
Yet more return type polymorphism.
