Parallel Processing Trees on GPU – Techniques and Algorithms

algorithms, gpu, parallelism, trees

I have seen a few papers on parallel/GPU processing of trees, but after briefly looking through them I wasn't able to grasp what they actually do. The closest I found to a helpful explanation was this diagram from Parallelization: Binary Tree Traversal:

[diagram from Parallelization: Binary Tree Traversal]

But I'm having a difficult time following the paper.

Could someone outline an algorithm for parallel processing of a tree? I can imagine this being possible, and the existence of papers on it suggests it is, but I can't work out what you would actually do to make it happen.

If it's any help, what I'm specifically wondering about is how to traverse a B+tree to find the matching keys.
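To make that concrete, the serial version of what I'd like to parallelize is roughly the following (a minimal sketch only; the BPlusNode layout and FindMatches are just illustrative assumptions, not code from any of the papers):

#include <cstddef>
#include <vector>

// Hypothetical in-memory B+tree node, used only for illustration.
struct BPlusNode
{
    bool isLeaf = false;
    std::vector<int> keys;            // sorted separator keys (inner) or leaf keys
    std::vector<BPlusNode*> children; // inner nodes: keys.size() + 1 children
    std::vector<int> values;          // leaves: one value per key
    BPlusNode* next = nullptr;        // leaves: link to the next leaf
};

// Serial range query: collect all values whose key lies in [lo, hi).
std::vector<int> FindMatches(const BPlusNode* node, int lo, int hi)
{
    // Descend to the leaf that could contain 'lo'.
    while (!node->isLeaf)
    {
        std::size_t i = 0;
        while (i < node->keys.size() && lo >= node->keys[i]) ++i;
        node = node->children[i];
    }

    // Scan forward through the linked leaves until we pass 'hi'.
    std::vector<int> out;
    for (; node != nullptr; node = node->next)
    {
        for (std::size_t i = 0; i < node->keys.size(); ++i)
        {
            if (node->keys[i] >= hi) return out;
            if (node->keys[i] >= lo) out.push_back(node->values[i]);
        }
    }
    return out;
}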

Update

Here is another diagram (from here) which seems to shed some light, but I'm still having difficulty understanding it.

[diagram]

Best Answer

It's very simple.

Where you would normally recurse down the left child and then the right child, instead start a task that recurses down the left, and continue down the right yourself. When you are done with the right, you may have to wait for the left to finish, but in a balanced tree that wait won't be long.

When you are at depth N in your tree, you can potentially have 2^N cores working at once.

The diagram you posted only spawns tasks at specific depths, probably because it isn't queueing tasks. It uses knowledge of the structure of the tree to avoid overloading the scheduler (a depth-limited variant is sketched after the code below).

If you have a task queue and a thread pool, the implementation can lean on them, so you don't have to worry about how many tasks get spawned.

Say you have

class Node 
{
public:
    int data; // or whatever payload the node carries

    Node * left;
    Node * right;

    template <typename Action>
    void SerialTraverse(Action action)
    {
        action(data); // Pre-order traversal: visit this node first

        if (left) left->SerialTraverse(action);
          // then traverse the left subtree
        if (right) right->SerialTraverse(action);
          // then traverse the right subtree
    }
};

#include <future>

namespace library 
{
    template<typename Task, typename Class, typename Arg>
    std::future<void> enqueue_task(Task task, Class * obj, Arg arg);
    // on some other thread, call "(obj->*task)(arg);"
    // returns a handle upon which we can wait
}
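As a rough idea of what could sit behind that declaration (a minimal sketch only, using std::async as a stand-in for a real task queue and thread pool; the answer deliberately leaves the implementation to whatever library you use):

#include <future>

namespace library 
{
    // One possible, purely illustrative implementation: hand the call
    // straight to std::async. A real thread pool would push the work onto
    // a queue serviced by a fixed set of worker threads instead.
    template<typename Task, typename Class, typename Arg>
    std::future<void> enqueue_task(Task task, Class * obj, Arg arg)
    {
        return std::async(std::launch::async,
                          [task, obj, arg] { (obj->*task)(arg); });
    }
}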

Then you can change SerialTraverse to

    template <typename Action>
    void ParallelTraverse(Action action)
    {
        action(data);
        std::future<void> fut;
        if (left) fut = library::enqueue_task(&Node::ParallelTraverse<Action>, left, action);
          // start traversing the left subtree on another thread

        if (right) right->ParallelTraverse(action);
          // traverse the right subtree on this thread, in parallel with the left

        if (fut.valid()) fut.wait();
          // wait for the left traversal to finish
    }
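To illustrate the depth-limited spawning mentioned above, a variant might look like this (a minimal sketch that reuses the Node class; std::async is used directly for brevity instead of library::enqueue_task, and the cutoff value is an arbitrary assumption):

    template <typename Action>
    void DepthLimitedTraverse(Action action, int depth = 0)
    {
        action(data); // pre-order visit, as before

        // Only spawn tasks near the top of the tree; below the cutoff, fall
        // back to serial recursion so the scheduler isn't flooded with tiny tasks.
        constexpr int kSpawnCutoff = 4; // arbitrary illustration value

        std::future<void> fut;
        if (left)
        {
            if (depth < kSpawnCutoff)
                fut = std::async(std::launch::async,
                    [n = left, action, depth] { n->DepthLimitedTraverse(action, depth + 1); });
            else
                left->SerialTraverse(action);
        }

        if (right) right->DepthLimitedTraverse(action, depth + 1);

        if (fut.valid()) fut.wait();
          // wait for the left subtree's task, if one was spawned
    }

With a cutoff of N, roughly 2^N tasks are ever in flight at once, which appears to be what the second diagram is doing.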