11.9 Union by Rank
Path compression makes existing trees flatter, but it does not prevent bad trees from being created.
11.9 Union by Rank
Problem
Path compression makes existing trees flatter, but it does not prevent bad trees from being created.
Consider the following sequence:
union(A,B)
union(B,C)
union(C,D)
union(D,E)
union(E,F)
A naïve implementation might repeatedly attach one root beneath another:
A
|
B
|
C
|
D
|
E
|
F
Path compression eventually fixes this structure, but the first few operations may still be expensive.
Can we avoid creating deep trees in the first place?
The answer is union by rank.
Solution
Union by rank modifies the union operation.
Instead of arbitrarily attaching one root beneath another, we always attach the smaller tree beneath the larger tree.
This prevents excessive depth growth.
The term rank is a measure of tree height.
For each root we maintain:
rank[root]
Initially:
rank = 0
for every vertex.
When merging two trees:
- Smaller rank becomes child of larger rank.
- Equal ranks require special handling.
This simple rule dramatically improves performance.
Initialization
Suppose we have:
A B C D
Parent table:
| Vertex | Parent |
|---|---|
| A | A |
| B | B |
| C | C |
| D | D |
Rank table:
| Vertex | Rank |
|---|---|
| A | 0 |
| B | 0 |
| C | 0 |
| D | 0 |
Every vertex starts as a single-node tree.
Merging Unequal Ranks
Suppose:
A
|
B
Rank:
A = 1
and:
C
Rank:
C = 0
Execute:
union(A,C)
Attach smaller rank beneath larger rank:
A
/ \\n B C
Ranks remain:
rank(A) = 1
No height increase occurs.
Merging Equal Ranks
Suppose:
A
and:
B
Both have:
rank = 0
Execute:
union(A,B)
Choose one root arbitrarily:
A
|
B
Now increase root rank:
rank(A) = 1
because tree height has grown.
This is the only situation where rank increases.
Example Sequence
Start:
A B C D
union(A,B)
A
|
B
Ranks:
A = 1
union(C,D)
C
|
D
Ranks:
C = 1
union(A,C)
Equal ranks:
1 and 1
Attach:
A
/ \\n B C
|
D
Increase:
rank(A) = 2
Final height:
2
Without union by rank, height could have become much larger.
Why It Works
Tree height only increases when two trees of equal rank merge.
This event is relatively rare.
To reach:
rank = 1
a tree needs at least:
2
vertices.
To reach:
rank = 2
it needs at least:
4
vertices.
To reach:
rank = 3
it needs at least:
8
vertices.
To reach:
rank = k
it needs at least:
$$ 2^k $$
vertices.
Therefore:
$$ k \le \log_2 V $$
The depth remains logarithmic.
Size-Based Variant
Many implementations use union by size instead of rank.
Maintain:
size[root]
representing the number of vertices.
When merging:
smaller tree
becomes child of:
larger tree
Example:
size(A)=10
size(B)=3
After:
union(A,B)
attach:
B beneath A
The effect is very similar to union by rank.
Both approaches achieve excellent performance.
Pseudocode
Initialization
for each vertex:
parent[v] = v
rank[v] = 0
Union
union(a,b):
rootA = find(a)
rootB = find(b)
if rootA == rootB:
return
if rank[rootA] < rank[rootB]:
parent[rootA] = rootB
else if rank[rootA] > rank[rootB]:
parent[rootB] = rootA
else:
parent[rootB] = rootA
rank[rootA]++
This is the standard textbook implementation.
Example from Kruskal
Consider:
A-B
B-C
C-D
D-E
Without union by rank:
A
|
B
|
C
|
D
|
E
Depth:
4
With union by rank:
A
/ | \\n B C D
|
E
Depth:
2
Future find operations become much faster.
Interaction with Path Compression
Union by rank prevents deep trees.
Path compression flattens existing trees.
Together:
union by rank
+
path compression
create an extraordinarily efficient structure.
Union by rank is proactive.
Path compression is reactive.
The combination is significantly stronger than either optimization alone.
Visual Comparison
Naïve Union
1
|
2
|
3
|
4
|
5
|
6
|
7
Depth:
6
Union by Rank
1
/ | \\n 2 3 4
/ \\n 5 6
\\n 7
Depth:
3
After Path Compression
1
/ / /|\ \\n 2 3 4 5 6 7
Depth:
1
This demonstrates the power of combining both optimizations.
Theoretical Analysis
Union by rank alone:
$$ O(\log V) $$
per operation.
Path compression alone:
Very strong amortized performance.
Combined:
$$ O(\alpha(V)) $$
amortized.
The inverse Ackermann function grows so slowly that:
| Vertices | α(V) |
|---|---|
| 10 | 3 |
| 1,000 | 4 |
| 1,000,000 | 4 |
| (10^{80}) | 4 |
| Astronomically large values | < 5 |
For practical purposes:
Almost constant time
is a reasonable description.
Why Rank Does Not Equal Height
After path compression:
rank
no longer accurately reflects actual height.
Example:
Before compression:
A
|
B
|
C
|
D
Height:
3
After compression:
A
/ | \\n B C D
Height:
1
Rank remains unchanged.
Therefore rank is only a heuristic used for balancing.
It is not necessarily the current tree height.
Common Mistakes
Increasing Rank Incorrectly
Ranks increase only when equal-rank trees merge.
Many incorrect implementations increment rank every union.
Ignoring Roots
Always compare:
find(a)
find(b)
not the original vertices.
Mixing Rank and Size
Use one strategy consistently.
Do not partially combine them.
Forgetting Path Compression
Union by rank alone is good.
Union by rank with path compression is substantially better.
Applications
Every high-performance DSU implementation uses:
- Path compression
- Union by rank or size
Applications include:
- Kruskal's algorithm
- Dynamic connectivity
- Offline graph queries
- Image segmentation
- Clustering
- Network analysis
- Percolation simulation
Complexity Summary
| Operation | Naïve DSU | Rank + Compression |
|---|---|---|
| Find | O(V) | O(α(V)) |
| Union | O(V) | O(α(V)) |
| Space | O(V) | O(V) |
This is one of the strongest asymptotic improvements in practical algorithm design.
Recipe
When implementing production-quality union-find:
- Store a parent array.
- Store a rank or size array.
- Use path compression inside
find(). - Attach smaller-rank trees beneath larger-rank trees.
- Increase rank only when equal-rank trees merge.
- Always operate on roots.
- Expect near-constant amortized performance.
With path compression and union by rank, union-find becomes the foundational connectivity structure used throughout graph algorithms. The next section applies this optimized DSU to a powerful application: answering connectivity queries efficiently in offline graph-processing problems.