Haskell – Mixing Erlang and Haskell

erlangfunctional programminggarbage-collectionghchaskell

If you've bought into the functional programming paradigm, the chances are that you like both Erlang and Haskell. Both have purely functional cores and other goodness such as lightweight threads that make them a good fit for a multicore world. But there are some differences too.

Erlang is a commercially proven fault-tolerant language with a mature distribution model. It has a seemingly unique feature in its ability to upgrade its version at runtime via hot code loading. (Way cool!)

Haskell, on the otherhand, has the most sophisticated type system of any mainstream language. (Where I define 'mainstream' to be any language that has a published O'Reilly book so Haskell counts.) Its straightline single threaded performance looks superior to Erlang's and its lightweight threads look even lighter too.

I am trying to put together a development platform for the rest of my coding life and was wondering whether it was possible to mix Erlang and Haskell to achieve a best of breed platform. This question has two parts:

  1. I'd like to use Erlang as a kind of fault tolerant MPI to glue GHC runtime instances together. There would be one Erlang process per GHC runtime. If "the impossible happened" and the GHC runtime died, then the Erlang process would detect that somehow and die too. Erlang's hot code loading and distribution features would just continue to work. The GHC runtime could be configured to use just one core, or all cores on the local machine, or any combination in between. Once the Erlang library was written, the rest of the Erlang level code should be purely boilerplate and automatically generated on a per application basis. (Perhaps by a Haskell DSL for example.) How does one achieve at least some of these things?
  2. I'd like Erlang and Haskell to be able to share the same garabage collector. (This is a much further out idea than 1.) Languages that run on the JVM and the CLR achieve greater mass by sharing a runtime. I understand there are technical limitations to running Erlang (hot code loading) and Haskell (higher kinded polymorphism) on either the JVM or the CLR. But what about unbundling just the garbage collector? (Sort of the start of a runtime for functional languages.) Allocation would obviously still have to be really fast, so maybe that bit needs to be statically linked in. And there should be some mechansim to distinguish the mutable heap from the immutable heap (incuding lazy write once memory) as GHC needs this. Would it be feasible to modify both HIPE and GHC so that the garbage collectors could share a heap?

Please answer with any experiences (positive or negative), ideas or suggestions. In fact, any feedback (short of straight abuse!) is welcome.

Update

Thanks for all 4 replies to date – each taught me at least one useful thing that I did not know.

Regarding the rest of coding life thing – I included it slightly tongue in cheek to spark debate, but it is actually true. There is a project that I have in mind that I intend to work on until I die, and it needs a stable platform.

In the platform I have proposed above, I would only write Haskell, as the boilerplate Erlang would be automatically generated. So how long will Haskell last? Well Lisp is still with us and doesn't look like it is going away anytime soon. Haskell is BSD3 open source and has achieved critical mass. If programming itself is still around in 50 years time, I would expect Haskell, or some continuous evolution of Haskell, will still be here.

Update 2 in response to rvirding's post

Agreed – implementing a complete "Erskell/Haslang" universal virtual machine might not be absolutely impossible, but it would certainly be very difficult indeed. Sharing just the garbage collector level as something like a VM, while still difficult, sounds an order of magnitude less difficult to me though. At the garbage collection model, functional languages must have a lot in common – the unbiquity of immutable data (including thunks) and the requirement for very fast allocation. So the fact that commonality is bundled tightly with monolithic VMs seems kind of odd.

VMs do help achieve critical mass. Just look at how 'lite' functional languages like F# and Scala have taken off. Scala may not have the absolute fault tolerance of Erlang, but it offers an escape route for the very many folks who are tied to the JVM.

While having a single heap makes
message passing very fast it
introduces a number of other problems,
mainly that doing GC becomes more
difficult as it has to be interactive
and globally non-interruptive so you
can't use the same simpler algorithms
as the per-process heap model.

Absolutely, that makes perfect sense to me. The very smart people on the GHC development team appear to be trying to solve part of the problem with a parallel "stop the world" GC.

http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf

(Obviously "stop the world" would not fly for general Erlang given its main use case.) But even in the use cases where "stop the world" is OK, their speedups do not appear to be universal. So I agree with you, it is unlikely that there is a universally best GC, which is the reason I specified in part 1. of my question that

The GHC runtime could be configured to
use just one core, or all cores on the
local machine, or any combination in
between.

In that way, for a given use case, I could, after benchmarking, choose to go the Erlang way, and run one GHC runtime (with a singlethreaded GC) plus one Erlang process per core and let Erlang copy memory between cores for good locality.

Alternatively, on a dual processor machine with 4 cores per processor with good memory bandwidth on the processor, benchmarking might suggest that I run one GHC runtime (with a parallel GC) plus one Erlang process per processor.

In both cases, if Erlang and GHC could share a heap, the sharing would probably be bound to a single OS thread running on a single core somehow. (I am getting out of my depth here, which is why I asked the question.)

I also have another agenda – benchmarking functional languages independently of GC. Often I read of results of benchmarks of OCaml v GHC v Erlang v … and wonder how much the results are confounded by the different GCs. What if choice of GC could be orthogonal to choice of functional language? How expensive is GC anyway? See this devil advocates blog post

http://john.freml.in/garbage-collection-harmful

by my Lisp friend John Fremlin, which he has, charmingly, given his post title "Automated garbage collection is rubbish". When John claims that GC is slow and hasn't really sped up that much, I would like to be able to counter with some numbers.

Best Answer

A lot of Haskell and Erlang people are interested in the model where Erlang supervises distribution, while Haskell runs the shared memory nodes in parallel doing all the number crunching/logic.

A start towards this is the haskell-erlang library: http://hackage.haskell.org/package/erlang

And we have similar efforts in Ruby land, via Hubris: http://github.com/mwotton/Hubris/tree/master

The question now is to find someone to actually push through the Erlang / Haskell interop to find out the tricky issues.