The best designed collection framework I have personally worked with, is Scala's, especially combined with the extensions from the Scalaz library.
The Haskell community currently seems to be the frontrunner when it comes to figuring out very general collection interfaces, see e.g. the Data.Traversable
or Data.Foldable
type classes, zippers, iteratees, enumeratees etc. (A lot of this is also implemented in Scalaz.)
Here's an example how to solve your problem in Scala. Note, I haven't done anything to make this efficient or beautiful, nor have I given any thought to whether this algorithm is sensible or not.
The important feature is that this code is only dependent on operations provided by the Seq
trait (zipWithIndex
, sorted
, foldRight
, filterNot
), which is very general: it only assumes that the collection has a definite size and that elements may appear multiple times. That's it.
val s = Seq(5, 1, 9, 2, 6) // our test sequence
val sWithIndex = s zipWithIndex
val rejects = sWithIndex.sorted.foldRight(
(Int.MaxValue, (
(Int.MinValue, Int.MinValue), (Int.MinValue, Int.MinValue)))) {
case ((el, i), (minDist, ((n1, i1), (n2, i2)))) =>
if(math.abs(n2 - el) <= minDist)
(math.abs(n2 - el), ((n2, i2), (el, i)))
else
(minDist, ((n1, i1), (n2, i2)))
}._2
val rejectIndices = (rejects._1._2, rejects._2._2)
val result = sWithIndex filterNot { case (_, i) =>
i == rejectIndices._1 || i == rejectIndices._2 } map (_._1)
// result = List(1, 9, 2)
This algorithm is O(n*log n), because sorted
is a generic stable sort. You could make it O(n) by using a radix sort, since the elements are known to be integers.
For me, the big advantage of the heterogeneous AST is that it forms a kind of forced, annotated switch
statement (assuming a C-like language).
For the homogeneous AST you usually end up with some kind of routine or class with a big switch
statement. You need to keep track of which child node is what yourself. "First child is the conditional, second the true-block, third the false-block." Whenever you change the code, you easily find yourself making a mental picture of your DSL syntax over and over again.
Of course you can document heavily, but a good program ought to be self-documenting as much as possible. The heterogeneous AST does just that.
Furthermore, you can easily turn a heterogeneous AST into a homogeneous one, but not the other way around. Add the tag info (which is a good idea, unless your language supports a cheap is-a
query). You can add Node(int index)
methods to return the named fields. So you lose nothing in generality by using the heterogeneous AST.
I won't mention that the heterogeneous AST is ideal for the Visitor pattern, as it is just as easy to use the Strategy pattern with the homogeneous switch
routine. It is easier to add specific functionality to the heterogeneous AST itself, though. If you want to turn it into an interpreter, all you need to do is add some kind of "eval" methods.
I would consider a homogeneous AST if there are limiting circumstances. If you need to port the compiler to a system with no OOP language available, or if you need to optimize for speed. The homogeneous AST is easier to combine with an FSM. The latter can also be an advantage if you want to have a general multi-purpose compiler that loads syntax rules on the fly. But it is easier to start out with a heterogeneous AST that will generate those tables, after the compiler has been thoroughly tested.
So, all in all, I would say that neither tree offers specific advantages in terms of "does this tree help or hinder in, say, 'semantic passes'?" The advantage of the heterogeneous AST is, in my experience, to reduce the amount of thought and concentration you have to put into coding the tedious stuff of the compiler. There is a lot of repetitiveness and bookkeeping going on, so let the computer do the work for you as much as possible, is my motto.
Best Answer
Here is something part way between an example and an analogy. You have some errands to do, so you grab a piece of paper and write:
Then you remember that you also need to buy stamps. Because of the geography of your town, you need to do that after the bank. You could copy your whole list onto a new piece of paper:
or you could scribble on the one you had:
As you thought of other errands, you might write them at the bottom of the list, but with arrows reminding yourself what order to do them in. This is a linked list. It's quicker and easier than copying the whole list around every time you add something.
Then your cell phone rings while you're at the bank "hey, I got the stamps, don't pick up any more". You just cross STAMPS off the list, you don't rewrite a whole new one without STAMPS in it.
Now you could actually implement an errands list in code (maybe an app that puts your errands in order based on your geography) and there's a reasonable chance you would actually use a linked list for that in code. You want to add and remove lots of items, order matters, but you don't want to recopy the whole list after each insertion or deletion.