B-Tree Benefits – Do Managed Languages Get B-Tree Benefits?

binary tree, data structures, managed-code

My understanding is that one of the key features of a B-Tree (and a B+Tree) is that the size of its nodes is designed to be some multiple of the block size of whatever media the data is stored on.

Considering that, in a memory-managed language like Java/C#, we don't really have control over how, when, and in what order data is accessed from the drive, can we still predictably benefit from the major advantage of this data structure?

Best Answer

Yes, B-Trees still make good sense in managed languages.

A few points of explanation:

If you're using the B-Tree as an on-disk data structure, then I can absolutely guarantee that disk IO will be your bottleneck, not the fact that you are using a managed language.
If you are using a B-Tree in memory, then you can still have considerable control over memory layout from a caching perspective. For example, you can use large arrays for data storage in Java/C# and store tree nodes/data in the arrays using offsets rather than having a separate object to represent each tree node.
The advantages of a data structure are largely independent of language, at least up to a constant factor. So if a B-Tree makes sense for your algorithm / access pattern, it will probably do so regardless of what language you are using.
On top of all that, it is generally the case that Java/C# can be nearly as fast as C/C++ if well optimised.
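
As a rough illustration of the "large array plus offsets" idea, here is a minimal Java sketch. The class name `FlatBTree`, the slot layout, and the tiny `MAX_KEYS` value are all assumptions made for the example. It only shows node storage and lookup, not insertion or node splitting.

```java
import java.util.Arrays;

// Sketch only: B-tree nodes packed into one int[] instead of one object per node.
// Hypothetical layout per node: [keyCount | keys[MAX_KEYS] | children[MAX_KEYS + 1]]
final class FlatBTree {
    static final int MAX_KEYS = 3;                       // assumed, tiny for readability
    static final int NODE_SIZE = 1 + MAX_KEYS + (MAX_KEYS + 1);
    static final int NO_CHILD = -1;

    private int[] slots = new int[NODE_SIZE * 16];
    private int nodeCount = 0;

    /** Allocates a node holding the given sorted keys and returns its node index. */
    int addNode(int... keys) {
        if (keys.length > MAX_KEYS) throw new IllegalArgumentException("too many keys");
        if ((nodeCount + 1) * NODE_SIZE > slots.length) {
            slots = Arrays.copyOf(slots, slots.length * 2); // grow the backing array
        }
        int base = nodeCount * NODE_SIZE;
        slots[base] = keys.length;
        System.arraycopy(keys, 0, slots, base + 1, keys.length);
        // Mark every child slot as empty until setChild is called.
        Arrays.fill(slots, base + 1 + MAX_KEYS, base + NODE_SIZE, NO_CHILD);
        return nodeCount++;
    }

    /** Links child as the position-th child of node (both are node indexes). */
    void setChild(int node, int position, int child) {
        slots[node * NODE_SIZE + 1 + MAX_KEYS + position] = child;
    }

    /** Standard B-tree lookup, following offsets instead of object references. */
    boolean contains(int root, int key) {
        int node = root;
        while (node != NO_CHILD) {
            int base = node * NODE_SIZE;
            int count = slots[base];
            int i = 0;
            while (i < count && key > slots[base + 1 + i]) i++;
            if (i < count && slots[base + 1 + i] == key) return true;
            node = slots[base + 1 + MAX_KEYS + i];       // descend into the i-th child
        }
        return false;
    }
}
```

Because all the keys of a node sit next to each other in one array, scanning a node touches contiguous memory, which is the kind of layout control the answer refers to.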

Related Solutions

PHP Data Structures – Adding Ordered Nodes to Tree in Arbitrary Order

With the caveat that each node has exactly one parent except for the root node, calling this a tree is fine.

With respect to building the tree, one way is to do the following:

keep some kind of associative lookup of existing nodes indexed by their unique ID (I don't know PHP, but I'm thinking some kind of dictionary or hash)
keep a similar lookup of orphan nodes, indexed by parent ID
when you create a node:
1. see if its parent is in the existing set
  - if it is, attach the child to the parent
  - if not, insert the new node in the orphan set
2. then see if any orphans are waiting for it (nodes in the orphan set indexed with the new node's ID)
  - if there are, attach them and remove them from the orphan set
3. finally, add the new node to the existing set
at the end, anything still in the orphan set is either a problem, or should be attached to your root node.

Assuming, of course, you have some unique ID to identify parent/child relationships.

Oh, and I seem to have used the word set to mean some kind of associative container. Note that the orphan lookup can't require unique keys, or step 2 will have problems with multiple orphans of the same node.
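
The question is about PHP, but the steps above are language-neutral, so here is a minimal sketch in Java. The names `TreeBuilder`, `Node`, `add`, and `remainingOrphans` are made up for the example, and integer IDs are an assumption. The orphan lookup maps a parent ID to a list, so several orphans can wait for the same parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: builds a tree from nodes that arrive in arbitrary order.
final class TreeBuilder {
    static final class Node {
        final int id;
        final List<Node> children = new ArrayList<>();
        Node(int id) { this.id = id; }
    }

    // Existing nodes, indexed by their unique ID.
    private final Map<Integer, Node> existing = new HashMap<>();
    // Orphans, indexed by the ID of the parent they are waiting for (one list per parent).
    private final Map<Integer, List<Node>> orphans = new HashMap<>();

    /** Adds a node; parentId is null when the node declares no parent. */
    Node add(int id, Integer parentId) {
        Node node = new Node(id);
        // 1. Attach to the parent if it already exists, otherwise park the node as an orphan.
        if (parentId != null) {
            Node parent = existing.get(parentId);
            if (parent != null) {
                parent.children.add(node);
            } else {
                orphans.computeIfAbsent(parentId, k -> new ArrayList<>()).add(node);
            }
        }
        // 2. Adopt any orphans that were waiting for this node.
        List<Node> waiting = orphans.remove(id);
        if (waiting != null) node.children.addAll(waiting);
        // 3. Register the node as existing.
        existing.put(id, node);
        return node;
    }

    /** Whatever is left here at the end is either a problem or belongs under the root. */
    Map<Integer, List<Node>> remainingOrphans() {
        return orphans;
    }
}
```

For example, calling `add(3, 1)`, `add(2, 1)`, and then `add(1, null)` leaves node 1 with two children and an empty orphan lookup.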

Data structure: sort and search effectively

1. If you rarely add and remove data

What about using the same technique as the one used in RDBMS with indexes?

In other words, you'll have the unordered set containing the data, and four ordered sets containing the keys and the pointers to the items in the data set.

Of course, this may cause performance issues if you need to frequently add and remove lots of data.
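
A minimal Java sketch of this layout might look like the following. The `Item` fields (`name`, `price`) and the class name `IndexedStore` are hypothetical, and only two of the four indexes are shown to keep it short. Each index maps a key to the items that share it, so duplicate keys are allowed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Sketch only: one unordered data list plus ordered indexes that point into it.
final class IndexedStore {
    static final class Item {
        final String name;   // hypothetical field
        final int price;     // hypothetical field
        Item(String name, int price) { this.name = name; this.price = price; }
    }

    private final List<Item> data = new ArrayList<>();                       // unordered
    private final TreeMap<String, List<Item>> byName = new TreeMap<>();     // ordered index
    private final TreeMap<Integer, List<Item>> byPrice = new TreeMap<>();   // ordered index

    /** Stores the item and records a reference to it in every index. */
    void add(Item item) {
        data.add(item);
        byName.computeIfAbsent(item.name, k -> new ArrayList<>()).add(item);
        byPrice.computeIfAbsent(item.price, k -> new ArrayList<>()).add(item);
    }

    /** Items in ascending price order, read straight from the index. */
    List<Item> sortedByPrice() {
        List<Item> out = new ArrayList<>();
        for (List<Item> bucket : byPrice.values()) out.addAll(bucket);
        return out;
    }

    /** Lookup by name through the index; returns an empty list when nothing matches. */
    List<Item> findByName(String name) {
        List<Item> found = byName.get(name);
        return found == null ? Collections.<Item>emptyList() : found;
    }
}
```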

2. If data is added or removed frequently

You can slightly modify the algorithm to reduce the performance impact of sorting the four index sets every time you add or remove an item. You may, for example, have four unordered index sets, create from them the sorted sets when needed, and invalidate those sorted sets when an element is added or removed.
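
One way to picture the "build when needed, invalidate on change" idea is the sketch below, for a single index of integer keys. The class name `LazySortedIndex` and its methods are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: an unordered index whose sorted view is built lazily and cached.
final class LazySortedIndex {
    private final List<Integer> keys = new ArrayList<>();  // unordered, cheap to append to
    private List<Integer> sortedCache = null;               // null means "invalid, rebuild"

    void add(int key) {
        keys.add(key);
        sortedCache = null;                                  // invalidate the sorted view
    }

    boolean remove(int key) {
        boolean removed = keys.remove(Integer.valueOf(key)); // removes the first occurrence
        if (removed) sortedCache = null;
        return removed;
    }

    /** Returns the sorted view, sorting only if something changed since the last call. */
    List<Integer> sorted() {
        if (sortedCache == null) {
            List<Integer> copy = new ArrayList<>(keys);
            Collections.sort(copy);
            sortedCache = Collections.unmodifiableList(copy);
        }
        return sortedCache;
    }
}
```

With this shape, a burst of adds and removes costs one sort at the next read instead of one re-sort per change.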

3. Profile

Note that profiling is important, since you can't possibly guess where the bottleneck will be. Remember that:

When you remove an item from the data set, removing four keys from four index sets is fast, since those sets are already ordered;
When you add an item, adding four keys to the index sets is not hugely slow: you just have to search each set (for example by bisection, as below) and insert the key at the appropriate position:
Let the list be:
```
 3, 7, 8, 12, 16, 22, 23, 24, 27
```
If you need to add the value 25, position yourself at the middle of the list:
```
 3, 7, 8, 12, 16, 22, 23, 24, 27
              ↑
```
Since 25 is greater than 16, go to the right:
```
 -, -, -, --, --, 22, 23, 24, 27
                         ↑
```
And again to the right:
```
 -, -, -, --, --, --, --, 24, 27
                             ↑
```
Found the position.
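
The walk-through above is a binary search for the insertion point. A minimal Java sketch, with the hypothetical names `SortedInsert`, `insertionPoint`, and `insert`, could be:

```java
import java.util.List;

// Sketch only: find where a key belongs in an already sorted list, then insert it.
final class SortedInsert {
    /** Returns the index at which key must be inserted to keep the list sorted. */
    static int insertionPoint(List<Integer> sorted, int key) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;       // middle of the remaining range
            if (sorted.get(mid) < key) {
                lo = mid + 1;                // key is larger: go to the right half
            } else {
                hi = mid;                    // key is smaller or equal: go to the left half
            }
        }
        return lo;
    }

    /** Inserts key at its sorted position. */
    static void insert(List<Integer> sorted, int key) {
        sorted.add(insertionPoint(sorted, key), key);
    }
}
```

For the list above and the value 25, the search visits 16, then 24, then 27, and places 25 between 24 and 27. Java's standard `Collections.binarySearch` can also report the insertion point when the key is absent. Note that inserting into an array-backed list still shifts the elements after the insertion point.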
