Implementing a Forest Data Structure in Java

trees

Name of data structure that's tree-like with multiple root nodes

I stumbled upon someone's answer to a question regarding implementing a tree that has > 2 nodes above. I just wanted to get someone's thoughts on using a Forest Data Structure to implement a Family Genealogy Tree that consist of multiple nodes with 0-multiple children. I may use a Forest Tree, however from most representations that I searched up on and found, it looks similar to a disjointed set, but I do not want a parent node to already be pre-destined to have no children.
I hope it makes sense what I am saying.
Any advice or comments you could offer would be much appreciated.

Best Answer

I hope I have a clear enough understanding of what your problem is. If I'm off-track let me know, otherwise the following should be of some help to you.

Just to be clear: I will consider all of the below under the context of genealogy.

A tree is simply a branching structure. In genealogy, you could consider the latest child as the root and create a tree that describes the ancestry of this person. Yes, the latest child is the root, not the oldest known ancestor. This is quite counter-intuitive and probably not really what you want anyways.

Now, once you have more complicated family issues, like a man having children with two different women, you'll run into problems with trees. If you have a separate tree for each child, then the father needs to be duplicated to be able to occur in each of these trees. Unpractical to say the least.

As Neil mentioned in his comment above, it is time to switch to a graph model. Most important for this is to clear up a typical confusion: The general populace speaks of something called a "Family genealogy tree", but as a computer scientist you must realize that such a tree is no tree as per the definitions given in computer science (or biology for that matter). We have a very clear and concise mathematical definition of a tree data structure, and a "genealogy tree" does not match it.

Why is that? Simply because in computer science trees, the different branches are not able to grow back together. With families, this is possible. Quite drastically, you could have the case of a father with two childrens, which again spawn an off-spring. Moral implications aside, this results in a diamond-like view, which is not representable in a tree.

Enter the world of graphs: A graph is just a generic term of things that are connected to each other. Hence, every computer science tree is also a graph. A very special and restricted one of course. In a graph, there is no root anymore, but there are nodes with similar properties (called sources). There are also no leaves anymore, but again nodes with similar properties (called sinks).

In terms of genealogy you can limit the result graph by making it a DAG, a directed acyclic graph. It is directed, because between two nodes, you can declare one as the parent side, one as the child side. For example, you could always point from parent to child (that's one direction, the other is feasible too). If you want to be able to traverse this structure in both directions, you may even consider an undirected variant.

Acyclicity simply means, that there are no circles in your data structure. It is pretty clear that this applies to genealogy (unless you want to count in time traveling and the implied option of becoming your own father or mother). In a DAG, you can have any number of nodes and connections between them. You can however add arbitrary limitations. For example, if all connections go from parent to child, then each child node could be restricted to have a maximum fan-in of 2. That's just graph theory lingo for saying that a person cannot have more than two parents.

Hence, I suggest to take a close look at directed acyclic graphs and the underlying graph theory in general. Also note that it'll pay off. If you want to do software development you should know about graphs. They are one of the most general data structures and apply to almost everything. They are not always the best choice of course, but they will be one of the most powerful things in your arsenal.