Data Structures – Are Trees Organized by Firstchild, Nextsibling Structure?

data structures

Usually, tree data structures are organised in a way that each node contains pointers to all its children.

       +-----------------------------------------+
       |        root                             | 
       | child1            child2         child3 |
       +--+------------------+----------------+--+
          |                  |                |
+---------------+    +---------------+    +---------------+
|    node1      |    |     node2     |    |     node3     |
| child1 child2 |    | child1 child2 |    | child1 child2 |
+--+---------+--+    +--+---------+--+    +--+---------+--+
   |         |          |         |          |         |

This seems natural, but it comes with some problems. For example, when the number of child nodes varies, you need something like an array or list to manage the childs.

By using only (first)child and (next)sibling pointers instead, we get something that looks like that:

       +-------------------+
       |        root       |
       | child    sibling  +--->NULL
       +--+----------------+
          |             
+----------------+    +----------------+    +----------------+
|    node1       |    |     node2      |    |     node3      |
| child  sibling +--->| child  sibling +--->| child  sibling +--->NULL
+--+-------------+    +--+-------------+    +--+-------------+
   |                     |                     |

Oviously, this kind of structure can represent trees just as well, but it also offers some advantages. Most important is that we don't have to worry about the number of child nodes any more. When used for a parse tree, it offers a natural representation for a term like "a+b+c+d+e" without becoming a deep tree.

Do collection libraries offer tree structures like that? Do parsers use such a structure? If not, what are the reasons?

Best Answer

Trees, like lists, are "abstract data types" which can be implemented in different ways. Each way has it's advantages and disadvantages.

In the first example, the main advantage of this structure is that you can access any child in O(1). The disadvantage is that appending a child might sometimes be a little more expensive when the array has to be expanded. This cost is relatively small though. It is also one of the simplest implementation.

In the second example, the main advantage is that you always append a child in O(1). The main disadvantage is that random access to a child costs O(n). Also, it may be less interesting for huge trees for two reasons: it has a memory overhead of one object header and two pointers per node, and the nodes are randomly spread over memory which may cause a lot of swapping between the CPU cache and the memory when the tree is traversed, making this implementation less appealing for them. This is not a problem for normal trees and applications though.

One last interesting possibility which was not mentioned is to store the whole tree in a single array. This leads to more complex code, but is sometimes a very advantageous implementation in specific cases, especially for huge fixed trees, since you can spare the cost of the object header and allocate contiguous memory.