Finding xor sum in a tree

algorithmstrees

I was trying to solve this problem on hackerearth where you have a tree with values on each node and you have to answer queries that ask what is the xor sum between two nodes. Also, there are updates that you have to make(the value of a node can change). I solved the easy part (dfs and lca) and for each node I saved the xor sum of the values of nodes on the path from node 1 to that node (including that node).

However, there's a part of the "editorial" for this problem I can't understand.

Consider the tree is rooted at node 1. Let F(i) be the XOR sum of all
value associated with nodes in the path from root 1 to node i. Let
LCA(u, v) be the lowest common ancestor of node u and node v. Let Q(u,
v) be the XOR sum of all value associated with nodes in the single
path from node u to node v. So Q(u, v) is answer for each query of
type 2.

We have: Q(u, v) = F(u) XOR F(v) XOR A[LCA(u, v)] (why?) where A[i]
denoting value associated with node i.

I don't understand where they got this formula:

Q(u, v) = F(u) XOR F(v) XOR A[LCA(u, v)]

What is the thought process behind this formula?

Best Answer

The key to deriving that formula is seeing how the various paths overlap and remembering that any number XOR itself equals 0. In particular, Q(x, y) XOR Q(x, y) will also equal 0.

The Quick Derivation

F(u) is the path from the root node to u. F(v) is the path from the root node to v. Because any number XOR itself equals 0, F(u) XOR F(v) will "cancel out" all the nodes above the lowest common ancestor of u and v, LCA(u, v). So F(u) XOR F(v) = Q(LCA(u, v), u) XOR Q(LCA(u, v), v). This happens to be the XOR sum of all nodes on the unique path from u to v, except for LCA(u, v) itself, since it's currently counted twice. So if we simply count it a third time, then we get the sum of all the nodes in Q(u, v). Thus Q(u, v) = F(u) XOR F(v) XOR LCA(u, v).

I suspect this "proof" leaves out too many details for you or contains a few steps you're not already convinced of, so here's a much longer version:

The Thorough Derivation

We're assuming that our graph is a valid tree, and there is a unique single path between u and v. This ensures u and v have a single "lowest common ancestor", LCA(u, v), which I will refer to as a. I will call the root node r. Keep in mind that a may be the same node as u, v, a, or even all three. The argument I'm about to show should still work even in those degenerate cases.

So there are unique paths (possibly of length zero) between these four nodes, like this:

Following the "editorial", let Q(x, y) denote the XOR sum of the nodes on the unique path from x to y.

We're interested in Q(u, v). Clearly, the unique path from u to v passes through a. However, it would be wrong to say that Q(u, v) = Q(u, a) XOR Q(a, v) because that would count the value of a twice. Fortunately, because XOR'ing a value by itself results in zero, counting a three times is the same as counting it only once. So the correct statement is:

Q(u, v) = Q(u, a) XOR Q(a, v) XOR A[a]

Next, F(i) is defined as Q(r, i). So we can follow the same logic as above and say:

F(u) = Q(r, u) = Q(r, a) XOR Q(a, u) XOR A[a]
F(v) = Q(r, v) = Q(r, a) XOR Q(a, v) XOR A[a]

Now, what is F(u) XOR F(v)? Because the XOR operation is commutative and associative, we can safely rearrange a series of XORs like we would a series of +s, which lets us do this:

F(u) XOR F(v) = (Q(r, a) XOR Q(a, u) XOR A[a]) XOR (Q(r, a) XOR Q(a, v) XOR A[a])
              = (Q(r, a) XOR Q(r, a)) XOR (Q(a, u) XOR Q(a, v)) XOR (A[a] XOR A[a])
              = 0 XOR (Q(a, u) XOR Q(a, v)) XOR 0
              = Q(a, u) XOR Q(a, v)

More intuitively: Whenever you XOR the XOR sums of two overlapping paths, the parts that overlap simply disappear, and we're left only with the XOR sums of the non-overlapping parts of the paths.

Finally, recall that Q(u, v) = Q(u, a) XOR Q(a, v) XOR A[a]. We now know that Q(a, u) XOR Q(a, v) is equal to F(u) XOR F(v), which means:

Q(u, v) = Q(u, a) XOR Q(a, v) XOR A[a]
        = F(u) XOR F(v) XOR A[a]

And since I defined a as LCA(u, v), we're done.

Best Answer

The Quick Derivation

The Thorough Derivation

Related Solutions

Does the Direction of a Huffman Tree Child Node Matter?

Huffman Encoding – How to Discriminate Between Two Nodes with Identical Frequencies

Related Topic