Huffman Encoding – Reconstructing a Huffman Tree with Minimal Information

chuffman encoding

I'm writing a huffman encoding program in C. I'm trying to include the least amount of information in the header as possible, I know the simplest way to decompress the file in the header would be to store the frequencies of each character in the file, but for a large file with 256 characters it would take 2304 bytes ((1 byte for character + 8 bytes for long frequency) * 256), which I don't think is optimal.
I know I can reconstruct a tree from a preorder scan and an inorder scan of it, but that requires having no duplicate values. That is bad because I now have to store each node in the tree (in a huffman tree: n*2 - 1 with n being the number of unique characters), twice, having each node be a long value (which could take ((256*2 - 1) * 2) * 8 = 8176 bytes.

Is there a way I'm missing here, or are those my only options?

Thanks.

Best Answer

There are 2 separate problems, store the topography and assign the leaf nodes

Assigning the leaf nodes can be done by storing the characters in in a predefined order so it can be extracted as needed.

Storing topography can be done by having a bit vector with 2 bits per parent node in the previous layer where 1 represents a compound node and 0 represents a leaf node

so first there is 1 bit for the root which is 1 and the next 2 bits will represent the next level down

to build the tree using the node{char value; node* left, right;} setup will be:

char[] chars;//prefill with the other array
int charIndex = 0;

node root;
vector<node*> toBuild(root);

while(!toBuild.empty()){
    node n = toBuild.popFront();
    bool bit = grabBit();
    if(bit){
        n.left = new node;
        toBuild.pushBack(n.left);
    }else
        n.value = chars[charIndex++];
    bit = grabBit();
    if(bit){
        n.right = new node;
        toBuild.pushBack(n.left);
    }else
        n.value = chars[charIndex++];
}
return root;

This is 2*n bits in the topography plus the permutation which is O(log n!) at the minimum.