Does the Direction of a Huffman Tree Child Node Matter?

algorithm-analysisalgorithmshuffman encoding

So, I'm on my quest about creating a Java implementation of Huffman's algorithm for compressing/decompressing files (as you might know, ever since Why create a Huffman tree per character instead of a Node?) for a school assignment.

I now have a better understanding of how is this thing supposed to work. Wikipedia has a great-looking algorithm here that seemed to make my life way easier. Taken from http://en.wikipedia.org/wiki/Huffman_coding:

  • Create a leaf node for each symbol and add it to the priority queue.
  • While there is more than one node in the queue:
    • Remove the two nodes of highest priority (lowest probability) from the queue
    • Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
    • Add the new node to the queue.
  • The remaining node is the root node and the tree is complete.

It looks simple and great. However, it left me wondering: when I "merge" two nodes (make them children of a new internal node), does it even matter what direction (left or right) will each node be afterwards?

I still don't fully understand Huffman coding, and I'm not very sure if there is a criteria used to tell whether a node should go to the right or to the left. I assumed that, perhaps the highest-frequency node would go to the right, but I've seen some Huffman trees in the web that don't seem to follow such criteria.

For instance, Wikipedia's example image http://upload.wikimedia.org/wikipedia/commons/thumb/8/82/Huffman_tree_2.svg/625px-Huffman_tree_2.svg.png seems to put the highest ones to the right. But other images like this one http://thalia.spec.gmu.edu/~pparis/classes/notes_101/img25.gif has them all to the left. However, they're never mixed up in the same image (some to the right and others to the left).

So, does it matter? Why?

Best Answer

Left or right shouldn't matter. It's the up or down-ness of the nodes in the tree that determines how many bits are needed to encode them. I asked this same question on coursera a few weeks ago and Matthew Brown provided the same response.