Data Structures – Rules for Using Linked Lists vs Arrays

data structureslinked list

A linked list can be used when you want cheap insertion and deletion of elements and when it doesn't matter that the elements aren't next to each other in memory.

This is very abstract and I would like a concrete explanation of why a linked list should be used rather than an array. I'm not very experienced with programming, so I haven't got much (if any) real-world experience.

Best Answer

Here is something part way between an example and an analogy. You have some errands to do, so you grab a piece of paper and write:

bank
groceries
drop off drycleaning

Then you remember that you also need to buy stamps. Because of the geography of your town, you need to do that after the bank. You could copy your whole list onto a new piece of paper:

bank
stamps
groceries
drop off drycleaning

or you could scribble on the one you had:

bank ....... STAMPS
groceries
drop off drycleaning

As you thought of other errands, you might write them at the bottom of the list, but with arrows reminding yourself what order to do them in. This is a linked list. It's quicker and easier than copying the whole list around every time you add something.

Then your cell phone rings while you're at the bank "hey, I got the stamps, don't pick up any more". You just cross STAMPS off the list, you don't rewrite a whole new one without STAMPS in it.

Now you could actually implement an errands list in code (maybe an app that puts your errands in order based on your geography) and there's a reasonable chance you would actually use a linked list for that in code. You want to add and remove lots of items, order matters, but you don't want to recopy the whole list after each insertion or deletion.

Related Solutions

Data Structures – Standardized Data Structure Interface

The best designed collection framework I have personally worked with, is Scala's, especially combined with the extensions from the Scalaz library.

The Haskell community currently seems to be the frontrunner when it comes to figuring out very general collection interfaces, see e.g. the Data.Traversable or Data.Foldable type classes, zippers, iteratees, enumeratees etc. (A lot of this is also implemented in Scalaz.)

Here's an example how to solve your problem in Scala. Note, I haven't done anything to make this efficient or beautiful, nor have I given any thought to whether this algorithm is sensible or not.

The important feature is that this code is only dependent on operations provided by the Seq trait (zipWithIndex, sorted, foldRight, filterNot), which is very general: it only assumes that the collection has a definite size and that elements may appear multiple times. That's it.

val s = Seq(5, 1, 9, 2, 6) // our test sequence

val sWithIndex = s zipWithIndex

val rejects = sWithIndex.sorted.foldRight(
  (Int.MaxValue, (
    (Int.MinValue, Int.MinValue), (Int.MinValue, Int.MinValue)))) {
  case ((el, i), (minDist, ((n1, i1), (n2, i2)))) => 
    if(math.abs(n2 - el) <= minDist)
      (math.abs(n2 - el), ((n2, i2), (el, i)))
    else
      (minDist, ((n1, i1), (n2, i2)))
  }._2

val rejectIndices = (rejects._1._2, rejects._2._2)

val result = sWithIndex filterNot { case (_, i) =>
  i == rejectIndices._1 || i == rejectIndices._2 } map (_._1)
// result = List(1, 9, 2)

This algorithm is O(n*log n), because sorted is a generic stable sort. You could make it O(n) by using a radix sort, since the elements are known to be integers.

AST Representation – Homogeneous vs. Heterogeneous AST Representation

For me, the big advantage of the heterogeneous AST is that it forms a kind of forced, annotated switch statement (assuming a C-like language).

For the homogeneous AST you usually end up with some kind of routine or class with a big switch statement. You need to keep track of which child node is what yourself. "First child is the conditional, second the true-block, third the false-block." Whenever you change the code, you easily find yourself making a mental picture of your DSL syntax over and over again.

Of course you can document heavily, but a good program ought to be self-documenting as much as possible. The heterogeneous AST does just that.

Furthermore, you can easily turn a heterogeneous AST into a homogeneous one, but not the other way around. Add the tag info (which is a good idea, unless your language supports a cheap is-a query). You can add Node(int index) methods to return the named fields. So you lose nothing in generality by using the heterogeneous AST.

I won't mention that the heterogeneous AST is ideal for the Visitor pattern, as it is just as easy to use the Strategy pattern with the homogeneous switch routine. It is easier to add specific functionality to the heterogeneous AST itself, though. If you want to turn it into an interpreter, all you need to do is add some kind of "eval" methods.

I would consider a homogeneous AST if there are limiting circumstances. If you need to port the compiler to a system with no OOP language available, or if you need to optimize for speed. The homogeneous AST is easier to combine with an FSM. The latter can also be an advantage if you want to have a general multi-purpose compiler that loads syntax rules on the fly. But it is easier to start out with a heterogeneous AST that will generate those tables, after the compiler has been thoroughly tested.

So, all in all, I would say that neither tree offers specific advantages in terms of "does this tree help or hinder in, say, 'semantic passes'?" The advantage of the heterogeneous AST is, in my experience, to reduce the amount of thought and concentration you have to put into coding the tedious stuff of the compiler. There is a lot of repetitiveness and bookkeeping going on, so let the computer do the work for you as much as possible, is my motto.

Best Answer

Related Solutions

Data Structures – Standardized Data Structure Interface

AST Representation – Homogeneous vs. Heterogeneous AST Representation

Related Topic