11.8 Path Compression
The basic union-find structure can become highly unbalanced.
11.8 Path Compression
Problem
The basic union-find structure can become highly unbalanced.
Consider the sequence:
union(A,B)
union(B,C)
union(C,D)
union(D,E)
union(E,F)
A naïve implementation may produce:
A
|
B
|
C
|
D
|
E
|
F
Now evaluate:
find(F)
The search follows:
F → E → D → C → B → A
Every operation traverses the entire chain.
For (n) vertices:
$$ O(n) $$
time may be required for a single find operation.
Large graphs can contain millions of vertices, making this behavior unacceptable.
We need a way to flatten these trees automatically.
Solution
Path compression shortens paths whenever a find operation is performed.
The idea is simple:
Every vertex visited during a find operation is attached directly to the root.
Instead of merely discovering the root, the algorithm restructures the tree.
Basic Example
Initial structure:
A
|
B
|
C
|
D
|
E
|
F
Execute:
find(F)
Traversal:
F → E → D → C → B → A
Root:
A
Without path compression:
A
|
B
|
C
|
D
|
E
|
F
remains unchanged.
With path compression:
A
/ | \ \\n B C D E
\\n F
or even:
A
/ /|\ \\n B C D E F
depending on implementation.
Every visited node now points directly to the root.
Future finds become dramatically faster.
Recursive Formulation
The standard implementation is surprisingly small:
find(x):
if parent[x] == x:
return x
parent[x] = find(parent[x])
return parent[x]
The critical line is:
parent[x] = find(parent[x])
After discovering the root, the algorithm rewrites the parent pointer.
This performs compression automatically.
Step-by-Step Example
Suppose:
A
|
B
|
C
|
D
Parent table:
| Vertex | Parent |
|---|---|
| A | A |
| B | A |
| C | B |
| D | C |
Execute:
find(D)
Recursive Calls
find(D)
find(C)
find(B)
find(A)
Root:
A
Compression Phase
Return from recursion:
parent[B] = A
parent[C] = A
parent[D] = A
Final structure:
A
/ | \\n B C D
Future searches require only one step.
Why It Works
Path compression never changes component membership.
Every node already belonged to the same component.
The optimization merely changes the internal representation.
Before:
D → C → B → A
After:
D → A
The root remains identical.
Therefore:
find(D)
returns the same result as before.
Correctness is preserved.
Multiple Finds
Suppose:
A
|
B
|
C
|
D
|
E
First Query
find(E)
Cost:
5 nodes
Tree becomes:
A
/ /|\ \\n B C D E
Second Query
find(E)
Cost:
2 nodes
Third Query
find(D)
Cost:
2 nodes
The structure improves itself over time.
Visualization
Before compression:
A
|
B
|
C
|
D
|
E
|
F
|
G
Depth:
6
After one find:
A
/ / / /|\ \\nB C D E F G
Depth:
1
This reduction is the source of the enormous performance gain.
Interaction with Kruskal
Consider processing many edges:
(u,v)
Every edge requires:
find(u)
find(v)
Without path compression:
long chains
long searches
With path compression:
almost flat trees
almost constant-time searches
This optimization is one reason Kruskal scales to graphs containing millions of edges.
Complexity Intuition
Without path compression:
$$ O(V) $$
may be required for a find operation.
With path compression:
the tree rapidly flattens.
Repeated operations become extremely cheap.
The theoretical analysis is subtle, but the practical effect is easy to observe:
Deep trees become shallow trees.
Amortized Analysis
Individual operations may still traverse several vertices.
However, once a path has been compressed, future operations become much cheaper.
The cost is therefore spread across many operations.
This is called amortized analysis.
Rather than measuring one operation, we measure a sequence of operations.
Over long sequences, path compression performs extraordinarily well.
Example Trace
Initial:
1
|
2
|
3
|
4
|
5
|
6
Perform:
find(6)
After compression:
1
/ / /|\ \\n2 3 4 5 6
Perform:
find(6)
find(5)
find(4)
find(3)
Each operation immediately reaches the root.
The expensive work never needs to be repeated.
Iterative Version
Some implementations avoid recursion.
Find Root
root = x
while parent[root] != root:
root = parent[root]
Compress Path
while x != root:
next = parent[x]
parent[x] = root
x = next
The result is identical.
Both recursive and iterative implementations are widely used.
Correctness Argument
Path compression preserves:
Component Membership
Every node continues to point to the same root.
Connectivity Information
Roots remain unchanged.
Set Partition
No component merges or splits occur.
Only internal parent pointers change.
Therefore all connectivity queries remain correct.
Common Mistakes
Compressing to Non-Roots
Wrong:
parent[x] = parent[parent[x]]
without reaching the root.
Correct compression always uses the true root.
Forgetting Assignment
Wrong:
find(parent[x])
Correct:
parent[x] = find(parent[x])
The assignment performs the compression.
Mixing Roots
Only vertices from the same component may be compressed together.
Compressing to a different component breaks correctness.
Assuming One Operation Flattens Everything
Only visited paths are compressed.
Unvisited parts of the tree remain unchanged.
Performance Impact
Path compression alone reduces practical running times dramatically.
For many real-world workloads:
find()
appears almost constant time.
Combined with union by rank, the result becomes one of the fastest known general-purpose data structures.
Applications Beyond MST
Path compression benefits every DSU application:
- Connected components
- Dynamic connectivity
- Clustering
- Image segmentation
- Network analysis
- Percolation models
- Offline graph queries
Anywhere union-find appears, path compression should usually be enabled.
Complexity
Path compression alone provides very strong amortized performance.
The exact theoretical bound becomes most interesting when combined with union by rank.
Together they achieve:
$$ O(\alpha(V)) $$
amortized time per operation, where (\alpha) is the inverse Ackermann function.
This function grows so slowly that:
α(V) < 5
for any graph that could exist in practice.
Recipe
When implementing union-find:
- Follow parent pointers until reaching a root.
- Record every visited vertex.
- Redirect those vertices directly to the root.
- Return the root.
- Allow future operations to benefit from the flattened structure.
Path compression transforms union-find from a simple connectivity structure into an extremely efficient one. The next section introduces union by rank, the second major optimization that prevents deep trees from forming in the first place. Together, these two techniques produce the near-constant-time DSU used in modern graph algorithms.