11.7 Union-Find
Many graph algorithms repeatedly ask the same question: > Do these two vertices already belong to the same connected component?
11.7 Union-Find
Problem
Many graph algorithms repeatedly ask the same question:
Do these two vertices already belong to the same connected component?
Consider Kruskal's algorithm.
Before adding an edge:
(u,v)
we must determine whether the edge creates a cycle.
A naïve solution performs DFS or BFS every time an edge is considered.
For a graph with millions of edges, repeatedly traversing the graph becomes prohibitively expensive.
We need a data structure that supports:
- Determining which component a vertex belongs to.
- Merging two components efficiently.
The union-find data structure provides exactly these operations.
It is one of the most important data structures in graph algorithms.
Solution
Union-find is also known as the Disjoint Set Union (DSU) data structure.
It maintains a collection of disjoint sets and supports two primary operations:
Find
Determine which set contains a given element.
find(x)
Union
Merge two sets.
union(a,b)
Together, these operations allow efficient connectivity tracking.
Initial Structure
Suppose we have:
A B C D E
Initially every vertex forms its own component:
A
B
C
D
E
Represent this using a parent array:
| Vertex | Parent |
|---|---|
| A | A |
| B | B |
| C | C |
| D | D |
| E | E |
Every element is initially its own parent.
Such vertices are called roots.
Tree Representation
Union-find stores each component as a tree.
Example:
A
|
B
|
C
Parent table:
| Vertex | Parent |
|---|---|
| A | A |
| B | A |
| C | B |
The root:
A
identifies the component.
Every vertex in the same tree belongs to the same set.
Find Operation
The find operation follows parent pointers until reaching a root.
Example:
A
|
B
|
C
Evaluate:
find(C)
Traversal:
C → B → A
Result:
A
Therefore:
find(C) = A
and:
find(B) = A
Both vertices belong to the same component.
Detecting Connectivity
Two vertices are connected when:
find(u) == find(v)
Example:
A
|
B
|
C
Then:
find(B) = A
find(C) = A
Since both roots are equal:
B and C are connected
This simple comparison powers cycle detection in Kruskal's algorithm.
Union Operation
Suppose we have:
A
and:
B
Separate components.
Execute:
union(A,B)
One root becomes the parent of the other:
A
|
B
Now:
find(A) = A
find(B) = A
Both vertices belong to the same set.
Example Sequence
Start:
A B C D
union(A,B)
A
|
B
C
D
union(C,D)
A
|
B
C
|
D
union(B,C)
Roots:
find(B)=A
find(C)=C
Merge:
A
|
B
|
C
|
D
Now all vertices belong to a single component.
Application to Kruskal
Consider:
A-B = 1
B-C = 2
A-C = 3
Process A-B
find(A) ≠ find(B)
Accept edge.
Union:
union(A,B)
Process B-C
find(B) ≠ find(C)
Accept edge.
Union:
union(B,C)
Process A-C
Now:
find(A) = A
find(C) = A
Same component.
Adding the edge creates a cycle.
Reject it.
Cycle detection required only two find operations.
No graph traversal was necessary.
Naïve Complexity
Suppose the structure becomes:
A
|
B
|
C
|
D
|
E
Evaluating:
find(E)
requires:
E → D → C → B → A
Five steps.
For (n) vertices:
$$ O(n) $$
per operation.
This is too slow for large graphs.
The next two sections introduce optimizations that transform union-find into one of the fastest known practical data structures.
Basic Implementation
Initialization
for each vertex:
parent[v] = v
Find
find(x):
while parent[x] != x:
x = parent[x]
return x
Union
union(a,b):
rootA = find(a)
rootB = find(b)
if rootA != rootB:
parent[rootB] = rootA
This is the simplest DSU implementation.
Correctness
The data structure maintains two invariants:
Invariant 1
Every component has exactly one root.
Invariant 2
Every vertex eventually reaches that root by following parent pointers.
Therefore:
find(u) == find(v)
if and only if:
u and v belong to the same component
This property makes connectivity queries extremely efficient.
Other Applications
Union-find appears far beyond minimum spanning trees.
Connected Components
Incrementally build graph connectivity.
Dynamic Connectivity
Track component merges.
Image Processing
Connected region detection.
Clustering
Hierarchical clustering algorithms.
Network Analysis
Component maintenance in evolving networks.
Percolation Simulation
Statistical physics models.
Social Networks
Community connectivity tracking.
Limitations
Union-find handles:
union
find
extremely well.
However it cannot efficiently answer:
shortest path
distance
edge weights
or general graph queries.
Its purpose is component management.
Nothing more.
Nothing less.
Common Mistakes
Comparing Vertices Instead of Roots
Wrong:
u == v
Correct:
find(u) == find(v)
Forgetting Find Before Union
Wrong:
parent[b] = a
Correct:
parent[find(b)] = find(a)
Assuming Tree Shape Matters
Only component membership matters.
Different tree structures can represent the same partition.
Ignoring Performance
The naïve implementation can degrade into long chains.
This issue motivates path compression and union by rank.
Complexity
Naïve implementation:
| Operation | Complexity |
|---|---|
| Find | O(V) |
| Union | O(V) |
| Space | O(V) |
These bounds improve dramatically in the next sections.
Recipe
When managing connected components:
- Create one set per vertex.
- Use
find()to determine component membership. - Use
union()to merge components. - Compare roots rather than vertices.
- Use union-find instead of repeated DFS connectivity checks.
Union-find is the engine behind Kruskal's algorithm and many connectivity algorithms. In the next section we introduce path compression, the first major optimization that transforms deep trees into nearly flat structures and dramatically improves performance.