Data structures and algorithms for a directed rooted tree with inherited properties

algorithmsdata structuresgraphtrees

I need to represent a directed rooted tree in memory. The caveat is, that nodes have properties.

And those properties are inherited (but not 100%) by the child nodes, recursively.

What would be a good data structure and algorithms for storing this in memory?

For the purposes of notation:

D(V) – depth of the vertex (distance from the root).
A(V) – arity of the vertex (how many children it has).
V – Size of the tree (how many total vertices)
P – average # of properties inherited by a vertex.

The operations that need to be supported (efficiently) are:

Modifying the tree (add node/subtree; remove node/subtree, move node/subtree)

Tree modifications occur not too frequently, perhaps 1-3% of the tree changes throughout the day.
Change (add/delete/modify) a property on a node, and have all its descendents inherit that property according to inheritance rules:
- a property is inherited by default.
- A property itself has 2 attributes: "propagate" and "turned off".
- A property is inherited from parent to child if (and only if): Parent node's property has "propagate" flag on AND the child node does NOT have the same property set with "turned off" attribute.
Query out properties of a given node (this is a very frequent query, massively dominating other access patterns by frequency)
Efficiently query out properties of a set of nodes

This means that, if we query out properties of N nodes sharing same parent, at depth D, the complexity would be O((N+D)P) instead of O(DN*P)

This query is frequent but significantly less so than the single-node one. BUT, the problem is that it would frequently be run on the entire tree or a very large portion of it, and the bulk speedup matters due to V being very large.
Efficiently query out all nodes (in whole tree or given subtree) that have a specific property (either set directly or inherited) that is not turned off.

This means that it should be much faster than O(V*P).

Access pattern: less frequent than the other two, BUT needs to be very fast because it's called by user-facing UI and thus for user experience purposes needs to be responsive.

I have tried implementing 3 approaches, but all of them suffer from different issues due to the seemingly irreconcilable trade-offs:

Approach 1:
- Store the tree as a regular tree (possibly with enhancements to speed things up, such as doubly-linked edges to facilitate walking up the tree, and a hashtable to be able to find a given node in the tree in O(1).
- Store properties with the nodes they are set on
- Compute the properties inherited by nodes, by walking UP the tree from the node to find the path from root to the node, then walking back down that path and seeing which properties would propagate from root to child, then to grandchild… and so on till the node.
- PROBLEM: Computing properties on a set of nodes is slow. Dumb implementation results in O(PDN) although I have a feeling this can be improved.
- PROBLEM: Finding out which nodes inherit a specific property in the whole tree is VERY slow and I don't see a good way to optimize it.
  
  This implementation results in O(V*P) complexity, which is unacceptable.
Approach 2:
- Store the tree as a regular tree (possibly with enhancements to speed things up, such as doubly-linked edges to facilitate walking up the tree, and a hashtable to be able to find a given node in the tree in O(1).
- Store properties with the nodes they are set on, AND separately store (cache) the inherited properties of each node.
- This solution makes finding properties inherited by a node O(P) and a set of nodes O(P*N) very efficient.
- PROBLEM: Tree manipulation is slow. Basically, you need to re-compute O(PD + PN) (where D is a depth of a node where change occurred and N is amount of nodes being added/moved)
- PROBLEM: Finding out which nodes inherit a specific property in the whole tree is VERY slow and I don't see a good way to optimize it.
  
  This implementation results in O(V*P) complexity, where P is average # of properties inherited on a node and V is # of vertices on the whole tree.
Approach 3:
- Store the tree as a regular tree (possibly with enhancements to speed things up, such as doubly-linked edges to facilitate walking up the tree, and a hashtable to be able to find a given node in the tree in O(1).
- Store properties with the nodes they are set on, AND separately store (cache) the inherited properties of each node.
- Also, have a separate cache linking properties to a list of nodes they are inherited in (call it "property cache").
- This solution makes finding properties inherited by a node O(P) and a set of nodes O(P*N) very efficient.
- PROBLEM: Tree manipulation is extremely slow. Basically, you need to re-compute O(PD + PN) (where D is a depth of a node where change occurred and N is amount of nodes being added/moved) AND then also spend time refreshing properties cache that can also be O(P*N) since you have to rebuild entire cache.
- PROBLEM: Changing properties is also very slow because you also need to rebuild the entire properties cache for a changed property.

Best Answer

A regular tree would suit just fine and Approach 1 would work here. I think you should also read up a bit about the search algorithms used for a B-tree and then see how you could modify it here.

To me, this question is basically about how to modify a generalized B-Tree to handle properties with a propagation element.

Like you say:

Store the tree as a regular tree.
Store properties with the nodes they are set on.

But:

Don't double link them - You can walk down the tree during the search (and also unfold if necessary revisiting the parent node when using typical recursive methods).
Don't create a hash-table - a hash-table is just another look-up data-structure, why use it when you can modify the tree!
DON'T DO THIS: Compute the properties inherited by nodes, by walking UP the tree ...
INSTEAD DO THIS: Store the properties as you walk through the tree locating the target node. You have to visit them anyway. During the node traversal search just pass through the property if 'propagate' is on. When at the child node; don't save it if the property also exists there and is set to 'turned off'. Keep traversing until you reach your target node, storing the properties as you go (perhaps in some simple little list).
Note that the property attributes requirements for 'propagate' and 'turned off' only impact the search algorithm by requiring logic to be placed in the traversal method.

PROBLEM 1: Computing properties on a set of nodes is slow. Dumb implementation results in O(PDN) although I have a feeling this can be improved.

SOLUTION: This is full of assumptions. What is a set? I only can assume a simple case like; set of nodes are all of the direct children of their parent node & depth of 1. Given this; Locate the parent node first with all the properties saved up until that parent node (as we just discussed in the previous point). Then for each child of that node you calculate out the properties - so for each child you have all the saved properties up to that parent node + the child's properties.
EXAMPLE: Traverse from the root node to a target node. For each node get the properties where they are 'propagated' and then traverse the next node passing through those properties. For the next node; for each property passed through check they are in the current node, if they are present and are 'turned on' then overwrite the property with the current nodes value. Then do the 'propagation' check and redo going through the nodes until the target node. Then you are left with the children - apply the same attribute logic and record the properties for each node.

PROBLEM 2: Finding out which nodes inherit a specific property in the whole tree is VERY slow and I don't see a good way to optimize it.

SOLUTION: Here, you want to search for this efficiently but the creation of your tree solution doesn't have to be so efficient. And it's not too complex because you just need to find if it is 'turned off' or not - and not involve the 'propagate' property. What you need to do here is use the tree as it is supposed to be used. If a property is not 'turned off' you want to say to the parent node that a child of theirs has a specific property and then update all the way back up to the root node. This way if a property doesn't exist at all you'll know straight away when visiting the root node that it doesn't exist.
EXAMPLE: Given a child node is set with a new property and 'turned on'; update the parent node with the same property and set to 'turned off' and use this to say the property exists in one of the nodes children. Then for each parent above that parent, set the same property and attribute. Now start to traverse from the root node the property will be set on the root node indicating that a descendant has that property. Do a traversal search and go through each child where the property is set - store each node where the property is 'turned on'.

Hopefully I can improve on this answer. Help me by indicating hard to understand parts of my answer.

Best Answer

Related Solutions

C++ Data Structures – Modeling a Network of Nodes and Edges

Binary Trees – Difference Between Sibling Lists, Left-Child Right-Sibling Binary Tree, and Doubly-Chained Tree

Related Topic