Today, I read a paper by Michael A. Bender & Martı́n Farach-Colton on one of the most fundamental algorithmic problem on trees, The LCA algorithm.
Here’s what I learned from it.
The Least Common Ancestor of two nodes A and B in a tree is a shared ancestor of A and B that is located farthest from the root. The LCA problem is a pretty interesting one because fast algorithms for the LCA problem can be used to solve other algorithmic problems. Harel and Tarjan showed that LCA queries can be answered in constant time by doing some linear preprocessing of the tree. Unfortunately, it is also well understood that the actual algorithm is too complicated to be implemented efficiently. The hypothesis made by algorithm designers is that the LCA problem still does not have an implementable optimal solution. Thus, concluding that it’s better to have a solution of LCA problem that doesn’t rely on preprocessing. This notion is been argued in the paper.
In this paper, the authors present the re-engineered PRAM algorithm, which might not be a promising solution at first. But, when PRAM complications are removed we are left with an extremely simple algorithm.
We start with RMQ , Range Minimum Query problem, which might seem pretty different from LCA but is closely linked. According to Wikipedia,
Range minimum query (RMQ) solves the problem of finding the minimal value in a sub-array of an array of comparable objects.
Now, we introduce some notations to simplify the description of algorithms. If an algorithm has preprocessing time f(n) & query time g(n), we will say that the algorithm has complexity if . Our solution to LCA is derived from RMQ so we reduce LCA problem to RMQ problem.
We establish a lemma that if there is time solution to RMQ, them there exist a time solution for LCA, where O(n) is the preprocessing cost, (2n - 1) is the length of the resultant array and O(1) is time needed to convert the RMQ answer to LCA answer.
Now, we create an array E , of (2n - 1) length doing a Euler Tour of the tree. Compute the Level array where L[i] is the depth of the node E[i]. Let the representative of a node in a Euler tour be the index of the first occurrence of the node in the tour. And compute the Representative Array R of length n, where R[i] is the index of the representative of node i. All this takes O(n) time.
We observe that,
- The nodes in the Euler Tour between the first visits to u and to v are E[R[u],…., R[v]].
- The shallowest node in this sub tour is at index RMQ(R[u], R[v]).
- The node at this position is E[RMQ(R[u], R[v])], which is thus the output of LCA(u, v).
Thus, we can complete our reduction in said time and space complexity of preprocessing the Level array L for RMQ.
From now, we will focus on RMQ and how to optimize it. RMQ’s preprocessing time can be reduced to O(n²) from O(n³) by dynamic programming.
We further reduce this brute-force algorithm to get the Sparse Table (ST) algorithm for RMQ which has a complexity of O( n log n).
In Level array L from the above reduction, adjacent elements differ by +1 & -1. This is because in a Euler tour, one element is always the parent of the other, and so their levels differ by exactly one. We consider this as a special (+1,-1) RMQ problem.
Now we use the ST algorithm on our special RMQ problem. We will use the table lookup technique to remove the O(log n) part from precomputing. We use 2 arrays along with the ST algorithm to reduce the running time to O(n).
This works for our special RMQ problem. In order to prove it works for general RMQ problems, we first reduce RMQ to LCA and then back to our special RMQ problem. We can easily do that in O(n) time and thus our algorithm works for all RMQ problems in O(n) time.