Data Structures – Standardized Data Structure Interface

data structuresdesigndesign-patterns

I want to work with a variety of data structures (arrays, singly/doubly linked lists, sorted structures, etc.) on a plug-and-play basis. For example, I want to be able to easily swap in and out the sorted list and the array (which is re-sorted after every insertion/deletion), to test which one has better performance.

In every language I know of, the public interface to such data structures is very inconsistent. So changing the implementation requires a lot of work.

Is there a language, or a language-agnostic design pattern, that makes it easy to define a generic public interface that works with any data structure?

Of course, I understand how different performance would be (e.g., binary search in a sorted array vs linear search in a regular array). All I ask is that the performance of the data structure doesn't degrade asymptotically or by a large constant factor due to the use of the standard interface.

Example problem: iterate through the container of numbers, find the two numbers that are nearest each other, and remove the pair of them. (If several pairs are equal distance, pick an arbitrary pair).

I'd like the code to solve the problem to be unchanged as I switch in and out different containers.

Best Answer

The best designed collection framework I have personally worked with, is Scala's, especially combined with the extensions from the Scalaz library.

The Haskell community currently seems to be the frontrunner when it comes to figuring out very general collection interfaces, see e.g. the Data.Traversable or Data.Foldable type classes, zippers, iteratees, enumeratees etc. (A lot of this is also implemented in Scalaz.)

Here's an example how to solve your problem in Scala. Note, I haven't done anything to make this efficient or beautiful, nor have I given any thought to whether this algorithm is sensible or not.

The important feature is that this code is only dependent on operations provided by the Seq trait (zipWithIndex, sorted, foldRight, filterNot), which is very general: it only assumes that the collection has a definite size and that elements may appear multiple times. That's it.

val s = Seq(5, 1, 9, 2, 6) // our test sequence

val sWithIndex = s zipWithIndex

val rejects = sWithIndex.sorted.foldRight(
  (Int.MaxValue, (
    (Int.MinValue, Int.MinValue), (Int.MinValue, Int.MinValue)))) {
  case ((el, i), (minDist, ((n1, i1), (n2, i2)))) => 
    if(math.abs(n2 - el) <= minDist)
      (math.abs(n2 - el), ((n2, i2), (el, i)))
    else
      (minDist, ((n1, i1), (n2, i2)))
  }._2

val rejectIndices = (rejects._1._2, rejects._2._2)

val result = sWithIndex filterNot { case (_, i) =>
  i == rejectIndices._1 || i == rejectIndices._2 } map (_._1)
// result = List(1, 9, 2)

This algorithm is O(n*log n), because sorted is a generic stable sort. You could make it O(n) by using a radix sort, since the elements are known to be integers.

Related Topic