TAOCP 2.3.3 Exercise 10
Algorithm `E` constructs a forest whose nodes are the elements $1,\ldots,n$.
Section 2.3.3: Other Representations of Trees
Exercise 10. [28] In the worst case, Algorithm E may take order n^2 steps to process n equivalences. Show how to modify the algorithm so that the worst case is not this bad.
Verified: yes
Solve time: 1m24s
Setup
Algorithm E constructs a forest whose nodes are the elements $1,\ldots,n$. Each tree represents one equivalence class. The PARENT field of a root contains $0$; every nonroot node points to its parent.
Exercise 8 asks how the resulting structure can be used to determine whether two elements are equivalent. Exercise 10 asks for a modification of Algorithm E whose worst-case running time is asymptotically better than order $n^2$ when processing $n$ equivalences.
The source of the quadratic behavior is that Algorithm E may repeatedly traverse long parent chains in order to find the roots of the trees that contain the elements being merged. The modification must prevent such chains from becoming excessively long.
Solution
Associate with each root $r$ an integer $\mathrm{SIZE}(r)$ equal to the number of nodes in its tree. Initially,
$$ PARENT(j)=0,\qquad \mathrm{SIZE}(j)=1 \qquad (1\le j\le n). $$
When an equivalence $j\equiv k$ is processed, first determine the roots $r_j$ and $r_k$ of the trees containing $j$ and $k$ by following PARENT links upward until a node with PARENT = 0 is reached.
If $r_j=r_k$, no action is required.
If $r_j\ne r_k$, compare $\mathrm{SIZE}(r_j)$ and $\mathrm{SIZE}(r_k)$. Suppose
$$ \mathrm{SIZE}(r_j)\le \mathrm{SIZE}(r_k). $$
Set
$$ PARENT(r_j)\leftarrow r_k, $$
and replace
$$ \mathrm{SIZE}(r_k) \leftarrow \mathrm{SIZE}(r_k)+\mathrm{SIZE}(r_j). $$
If instead $\mathrm{SIZE}(r_k)<\mathrm{SIZE}(r_j)$, interchange the roles of $r_j$ and $r_k$.
Thus the root of the smaller tree is always attached below the root of the larger tree.
To analyze the resulting structure, consider any node $x$. Each time the depth of $x$ increases by $1$, the root of the tree containing $x$ has been attached beneath another root. Since the smaller tree is attached to the larger, the size of the new tree is at least twice the size of the old tree containing $x$.
More precisely, if the depth of $x$ increases, and the old tree containing $x$ has size $s$, then the new tree has size at least
$$ s+s=2s. $$
After the depth of $x$ has increased $t$ times, the tree containing $x$ therefore has size at least
$$ 2^t. $$
Since no tree can contain more than $n$ nodes,
$$ 2^t\le n, $$
hence
$$ t\le \lfloor \log_2 n\rfloor . $$
The depth of every node is therefore at most $\lfloor\log_2 n\rfloor$.
Finding the root of any element requires traversing at most this many parent links. Consequently each equivalence can be processed in order $\log n$ time in the worst case. Processing $n$ equivalences therefore requires at most order
$$ n\log n $$
steps in the worst case.
This improves the worst-case bound from order $n^2$ to order $n\log n$.
Verification
The modification preserves correctness because each operation joins two trees only at their roots. No cycles can be introduced, and every tree continues to represent exactly one equivalence class.
The depth bound is obtained from the doubling argument. Whenever a node moves one level farther from the root, the size of its tree at least doubles. A node cannot belong to a tree larger than $n$ nodes. Therefore no node can acquire depth exceeding $\lfloor\log_2 n\rfloor$.
Since every root search follows a path from a node to its root, every such search examines at most $\lfloor\log_2 n\rfloor$ parent links. The worst-case cost of one merge is therefore order $\log n$, and the worst-case cost of $n$ merges is order $n\log n$.
Notes
A further improvement is possible by compressing parent paths during root searches. Combined with the size rule above, this yields a substantially smaller amortized cost, but the modification requested in this exercise is already sufficient to eliminate the quadratic worst-case behavior.