Tree data structure in Mathematica

wolfram-mathematica

I have used mathematica mostly as a mathematics workbench and for writing relatively small ad-hoc programs. I am however designing a system which I intend to program in Mathematica. I need to store data in a tree, and search and traverse the tree. Although I know how to implement a tree, I prefer standard, tested code. I looked at what sort of packages there are for basic datastructures at the Mathematica users wiki. I have found none, although there is a small example in the Mathematica documentation.

Now to my question(s):

  1. Is there an ( open source ) package for data structures available somewhere?

  2. What approach have you used with regard to data structures? Gradually developing your own util package?

( Not a question, just a remark. Maybe… the lack of ( lots of available ) open source packages is the reason why Mathematica doesn't have the momentum it deserves. A chicken / egg issue, I am afraid. )

Best Answer

In Mathematica, most of what you do is based on expressions. Expressions naturally have the tree structure. For depth-first traversals (which are probably most common), you can then use functions like Scan,Map, Cases etc. The difference w.r.t the more traditional languages is that there is no simple way to preserve the identity of individual node in an expression tree, since there are no pointers in Mathematica. Also, many operations on expressions that are idiomatic in Mathematica would copy the entire expression when you only need to modify it in a few places, because expressions are immutable.

Using immutable Mathematica expressions as trees still has several advantages. One is that, because they are immutable, it is easy to understand what they store by just looking at them (state and behavior are not mixed). Another is that there are efficient and generic functions such as Map, MapIndexed or Scan, that traverse them. For example, the visitor design pattern is invisible - it is just Map[f,tree,Infinity], built into the langauge. Also, there are built-in functions such as Cases, Replace, ReplaceAll, etc, which allow one to write very concise and declarative code to destructure trees, find pieces of trees with certain syntax or satisfying some condition, etc. Since trees are not restricted to only be built from lists and be built from different heads, one can effectively use this to write very concise tree-processing code. Finally, a freedom to very easily build any tree structure you want makes it much easier to perform experiments and prototype things, in the spirit of exploratory and bottom-up programming, which shortens the development cycle and ultimately leads to better designs.

That said, you can certainly implement "stateful" (mutable) tree data structure. The real reason it has not been done yet generally is, I suspect, the performance hit associated with building, modifying and traversing such a tree, since it will undergo a full symbolic evaluation process at every step (see this post for more details on that). For 2 examples of how, for example, a binary search tree can be used in Mathematica context for pretty efficient code, see my posts here (generic symbolic setting) and here (in the context of Compiled code). For general ways to construct data structures idiomatically in Mathematica, I recommend books of Roman Maeder: "Programming in Mathematica", "Mathematica programmer I&II", and especially "Computer Science in Mathematica". In the latter he has a detailed discussion of how to implement binary search tree in Mathematica. EDIT As @Simon mentioned, the talk of @Daniel Lichtblau is also a great resource, which shows how to build data structures and make them efficient.

Regarding general ways to implement data structures in Mathematica which would incorporate some state, here is a simple example extracted from my post in this Mathgroup thread - it implements a "pair" data structure.

Unprotect[pair, setFirst, getFirst, setSecond, getSecond, new, delete];
ClearAll[pair, setFirst, getFirst, setSecond, getSecond, new, delete];
Module[{first, second},
  first[_] := {};
  second[_] := {};
  pair /: new[pair[]] := pair[Unique[]];
  pair /: pair[tag_].delete[] := (first[tag] =.; second[tag] =.);
  pair /: pair[tag_].setFirst[value_] := first[tag] = value;
  pair /: pair[tag_].getFirst[] := first[tag];
  pair /: pair[tag_].setSecond[value_] := second[tag] = value;
  pair /: pair[tag_].getSecond[] := second[tag];
  Format[pair[x_Symbol]] := "pair[" <> ToString[Hash[x]] <> "]";
];
Protect[pair, setFirst, getFirst, setSecond, getSecond, new, delete]; 

Here is how you could use it:

pr = new[pair[]];
pr.setFirst[10];
pr.setSecond[20];
{pr.getFirst[], pr.getSecond[]}

{10, 20}

Creating a list of new pair objects :

pairs = Table[new[pair[]], {10}]

{"pair[430427975]", "pair[430428059]", "pair[430428060]", "pair[430428057]",
"pair[430428058]", "pair[430428063]", "pair[430428064]", "pair[430428061]", 
"pair[430428062]", "pair[430428051]"}

Setting the fields :

Module[{i},
 For[i = 1, i <= 10, i++,
  pairs[[i]].setFirst[10*i];
  pairs[[i]].setSecond[20*i];]]

Checking the fields :

#.getFirst[] & /@ pairs

{10, 20, 30, 40, 50, 60, 70, 80, 90, 100}

#.getSecond[] & /@ pairs

{20, 40, 60, 80, 100, 120, 140, 160, 180, 200} 

In the post I mentioned there is a more detailed discussion. One big problem for "objects" created in this way is that there is no automatic garbage collection for them, which may be one of the major reasons why OOP extensions implemented in top-level Mathematica itself did not really take off.

There are several OOP extensions for Mathematica, such as the classes.m package by Roman Maeder (the source is in his "Mathematica Programmer" book), the Objectica commercial package, and several others. But until Mathematica itself would provide efficient mechanisms (perhaps based on some kind of pointer or reference mechanism) for building mutable data structures (if this ever happens), there will probably be a large performance hit associated with top-level implementations of such data structures in mma. Also, since mma is based on immutability as one of the core ideas, it is not very easy to make mutable data structures fit well with other idioms of Mathematica programming.

EDIT

Here is a bare-bones stateful tree implementation along the lines of the example above:

Module[{parent, children, value},
  children[_] := {};
  value[_] := Null;
  node /: new[node[]] := node[Unique[]];
  node /: node[tag_].getChildren[] := children[tag];
  node /: node[tag_].addChild[child_node, index_] := 
        children[tag] = Insert[children[tag], child, index];
  node /: node[tag_].removeChild[index_] := 
        children[tag] = Delete[children[tag], index];
  node /: node[tag_].getChild[index_] := children[tag][[index]];
  node /: node[tag_].getValue[] := value[tag];
  node /: node[tag_].setValue[val_] := value[tag] = val;
];

Some examples of use:

In[68]:= root = new[node[]]

Out[68]= node[$7]

In[69]:= root.addChild[new[node[]], 1]

Out[69]= {node[$8]}

In[70]:= root.addChild[new[node[]], 2]

Out[70]= {node[$8], node[$9]}

In[71]:= root.getChild[1].addChild[new[node[]], 1]

Out[71]= {node[$10]}

In[72]:= root.getChild[1].getChild[1].setValue[10]

Out[72]= 10

In[73]:= root.getChild[1].getChild[1].getValue[]

Out[73]= 10

For one non-trivial example of use of this mutable tree data structure, see this post of mine. It also confronts this method with the one more heavily reusing Mathematica native data structures and functions, and illustrates well the points discussed at the beginning of this post.

Related Topic