C++ Optimization – Most Efficient Implementation of a Tree in C++

c++, data structures, optimization

I need to write a tree where each element may have any number of child elements, and because of this each branch of the tree may have any length.

The tree will only receive elements at first; after that it will be used exclusively for iterating through its branches, in no specific order.

The tree will have several million elements and must be fast but also memory efficient.

My plan is to write a node class that stores an element and pointers to its children. Once the tree is fully constructed, it would be transformed into an array or some other faster structure and, if possible, loaded into the processor's cache.

Construction and search are two different problems, so can I focus on solving each one individually in the best way? Construction has to be as fast as possible, but it can use as much memory as it likes. The tree would then be transformed into a format that gives the best speed when iterating over its branches. This should preferably be an array, to avoid going back and forth between RAM and cache for each element of the tree.

So the real question is: which structure should I use to implement the tree so as to maximize insert speed, and how can I then transform it into a structure that gives me the best iteration speed and memory usage?

Best Answer

A natural way to implement an updateable tree with arbitrary numbers of children per node is to reinterpret a binary tree such that the "left hand" link points to the node's first child, and the "right hand" link points to the next child of the same parent. This takes two links per node, and requires linear list traversal to locate a particular child. However, if the order of the children doesn't matter, you can simply insert each child at the head of the child list.
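As a rough illustration of the first-child / next-sibling layout with head insertion, here is a minimal sketch. The names `BuildNode`, `BuildTree`, `add_root`, `add_child` and `count_children` are made up for this example, the `int` payload is a placeholder, and storing the nodes in a `std::deque` (so that node addresses stay valid as the tree grows) is my own assumption rather than part of the scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical build-time node: first_child is the "left-hand" link,
// next_sibling is the "right-hand" link.
struct BuildNode {
    int value = 0;                      // placeholder payload
    BuildNode* first_child = nullptr;
    BuildNode* next_sibling = nullptr;
};

// Owns every node. push/emplace at the end of a std::deque never
// invalidates references to existing elements, so raw pointers stay valid.
class BuildTree {
public:
    BuildNode* add_root(int value) {
        nodes_.emplace_back();
        nodes_.back().value = value;
        return &nodes_.back();
    }

    // O(1) insertion: the new node becomes the head of the parent's child list.
    BuildNode* add_child(BuildNode* parent, int value) {
        assert(parent != nullptr);
        nodes_.emplace_back();
        BuildNode& node = nodes_.back();
        node.value = value;
        node.next_sibling = parent->first_child;
        parent->first_child = &node;
        return &node;
    }

private:
    std::deque<BuildNode> nodes_;
};

// Locating or counting children is a linear walk along the sibling chain.
std::size_t count_children(const BuildNode* parent) {
    std::size_t n = 0;
    for (const BuildNode* c = parent->first_child; c != nullptr; c = c->next_sibling) {
        ++n;
    }
    return n;
}
```

Note that head insertion stores children in reverse order of insertion, which is only acceptable because the order of the children does not matter here.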

You can construct a read-only tree with only one link per node, by concatenating all children of a given node into a sub-vector, and dropping the right-hand link in favor of iteration. You will need to include either a boolean last_child flag or a child_count field as part of your read-only node structure; note that the child_count version will permit random access of child lists.
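A sketch of what the `child_count` variant of the read-only node might look like. `FlatNode`, `sum_children` and `nth_child` are names invented for this example, the `int` payload is a placeholder, and the 32-bit indices are an assumption that happens to fit "several million elements".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical read-only node: a single link (first_child) plus a count.
// All children of a node occupy consecutive slots of the same vector.
struct FlatNode {
    int value;                  // placeholder payload
    std::uint32_t first_child;  // index of the first child
    std::uint32_t child_count;  // number of consecutive children
};

// Iterating over a child list is a linear scan over adjacent memory.
long sum_children(const std::vector<FlatNode>& nodes, std::uint32_t parent) {
    const FlatNode& p = nodes[parent];
    long sum = 0;
    for (std::uint32_t i = 0; i < p.child_count; ++i) {
        sum += nodes[p.first_child + i].value;
    }
    return sum;
}

// child_count also gives random access to the k-th child.
const FlatNode& nth_child(const std::vector<FlatNode>& nodes,
                          std::uint32_t parent, std::uint32_t k) {
    const FlatNode& p = nodes[parent];
    assert(k < p.child_count);
    return nodes[p.first_child + k];
}
```

With a `last_child` flag instead of the count, the loop would stop on the flag and `nth_child` would have to walk the list.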

If your read-only queries frequently iterate through long child lists, this may dramatically improve your cache usage. Alternately, if your read-only queries frequently iterate in a depth-first manner, it may be more efficient to drop the left-hand link in favor of iteration, by concatenating all first-child chains into a sub-vector.
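For the depth-first alternative, one possible reading is a pre-order layout in which a node's first child, if it has one, sits in the very next slot, so only the sibling link needs to be stored. The sketch below assumes that reading; `PreorderNode`, `kNone`, the `has_child` flag and `child_values` are all names and details I made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel meaning "no further sibling" (an assumption of this sketch).
constexpr std::uint32_t kNone = 0xFFFFFFFFu;

// Hypothetical pre-order node: the first-child link is implicit.
struct PreorderNode {
    int value;                   // placeholder payload
    std::uint32_t next_sibling;  // index of the next sibling, or kNone
    bool has_child;              // if true, the first child is at index + 1
};

// Collect the values of the direct children of `parent`.
std::vector<int> child_values(const std::vector<PreorderNode>& nodes,
                              std::uint32_t parent) {
    assert(parent < nodes.size());
    std::vector<int> out;
    if (!nodes[parent].has_child) {
        return out;
    }
    // First child is adjacent; the remaining children are reached by links.
    for (std::uint32_t c = parent + 1; c != kNone; c = nodes[c].next_sibling) {
        out.push_back(nodes[c].value);
    }
    return out;
}
```

In this layout a full depth-first traversal is just a front-to-back scan of the vector, which is about as cache-friendly as it gets.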

In either case, you can use an STL vector<> to do memory-management for the read-only tree -- by doing a traversal (in the appropriate order) of your updateable tree, and using push_back() to append the read-only version of each tree node, in order. Remember that you will need to use indices rather than pointers for your links, as pushing an element may reallocate the vector<>.
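The conversion could be sketched roughly as follows for the layout where each node's children are contiguous. It assumes a breadth-first pass in which the output vector doubles as the work queue; `BuildNode`, `FlatNode`, `flatten` and the parallel `source` vector are hypothetical names, and the `int` payload is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical updateable node (first-child / next-sibling links).
struct BuildNode {
    int value;
    BuildNode* first_child;
    BuildNode* next_sibling;
};

// Hypothetical read-only node with contiguous children.
struct FlatNode {
    int value;
    std::uint32_t first_child;
    std::uint32_t child_count;
};

std::vector<FlatNode> flatten(const BuildNode* root) {
    std::vector<FlatNode> flat;
    std::vector<const BuildNode*> source;  // source[i] is the node behind flat[i]
    if (root == nullptr) {
        return flat;
    }
    flat.push_back(FlatNode{root->value, 0u, 0u});
    source.push_back(root);

    // flat.size() grows while we iterate, so the vector acts as a BFS queue.
    for (std::size_t i = 0; i < flat.size(); ++i) {
        assert(flat.size() <= 0xFFFFFFFFu);  // indices must fit in 32 bits
        const std::uint32_t first = static_cast<std::uint32_t>(flat.size());
        std::uint32_t count = 0;
        for (const BuildNode* c = source[i]->first_child; c != nullptr;
             c = c->next_sibling) {
            flat.push_back(FlatNode{c->value, 0u, 0u});
            source.push_back(c);
            ++count;
        }
        // Re-index instead of holding a reference: the push_back calls
        // above may have reallocated the vector.
        flat[i].first_child = first;
        flat[i].child_count = count;
    }
    return flat;
}
```

If the final node count is known up front, calling `reserve()` on both vectors before the loop avoids the repeated reallocations; `source` can be discarded once the conversion is done.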

Finally, minimizing the size of your read-only node structure can improve your performance, since more nodes then fit in each cache line. If your read-only node includes a link to the original data structure, you can profitably strip out any data that is not needed during a query's traversal and is only used once the query finds its target. (Come to think of it, the updateable tree might also benefit from this.)
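One way to picture that split is sketched below: a small node that holds only what a traversal reads, plus an index into separately stored data that is touched only on a hit. `HotNode`, `Payload`, `ReadOnlyTree`, `find_payload` and all field names are invented for this example, and the front-to-back scan stands in for whatever traversal your queries actually perform.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical "cold" data: only needed once a query has found its target.
struct Payload {
    std::string name;
    double measurements[8];
};

// Hypothetical "hot" node: only the fields a traversal actually reads.
struct HotNode {
    std::uint32_t key;          // what the query compares against
    std::uint32_t first_child;  // children are contiguous
    std::uint32_t child_count;
    std::uint32_t payload;      // index into ReadOnlyTree::payloads
};

struct ReadOnlyTree {
    std::vector<HotNode> nodes;
    std::vector<Payload> payloads;
};

// Visit nodes in storage order (the order does not matter here) and
// touch the payload storage only when the key matches.
const Payload* find_payload(const ReadOnlyTree& tree, std::uint32_t key) {
    for (const HotNode& n : tree.nodes) {
        if (n.key == key) {
            assert(n.payload < tree.payloads.size());
            return &tree.payloads[n.payload];
        }
    }
    return nullptr;
}
```

Whether the split pays off depends on how large the cold data is relative to the hot fields, so it is worth measuring on your own data.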