11.7 Union-Find

Many graph algorithms repeatedly ask the same question: > Do these two vertices already belong to the same connected component?

11.7 Union-Find

Problem

Many graph algorithms repeatedly ask the same question:

Do these two vertices already belong to the same connected component?

Consider Kruskal's algorithm.

Before adding an edge:

(u,v)

we must determine whether the edge creates a cycle.

A naïve solution performs DFS or BFS every time an edge is considered.

For a graph with millions of edges, repeatedly traversing the graph becomes prohibitively expensive.

We need a data structure that supports:

  1. Determining which component a vertex belongs to.
  2. Merging two components efficiently.

The union-find data structure provides exactly these operations.

It is one of the most important data structures in graph algorithms.

Solution

Union-find is also known as the Disjoint Set Union (DSU) data structure.

It maintains a collection of disjoint sets and supports two primary operations:

Find

Determine which set contains a given element.

find(x)

Union

Merge two sets.

union(a,b)

Together, these operations allow efficient connectivity tracking.

Initial Structure

Suppose we have:

A B C D E

Initially every vertex forms its own component:

A
B
C
D
E

Represent this using a parent array:

Vertex Parent
A A
B B
C C
D D
E E

Every element is initially its own parent.

Such vertices are called roots.

Tree Representation

Union-find stores each component as a tree.

Example:

A
|
B
|
C

Parent table:

Vertex Parent
A A
B A
C B

The root:

A

identifies the component.

Every vertex in the same tree belongs to the same set.

Find Operation

The find operation follows parent pointers until reaching a root.

Example:

A
|
B
|
C

Evaluate:

find(C)

Traversal:

C → B → A

Result:

A

Therefore:

find(C) = A

and:

find(B) = A

Both vertices belong to the same component.

Detecting Connectivity

Two vertices are connected when:

find(u) == find(v)

Example:

A
|
B
|
C

Then:

find(B) = A
find(C) = A

Since both roots are equal:

B and C are connected

This simple comparison powers cycle detection in Kruskal's algorithm.

Union Operation

Suppose we have:

A

and:

B

Separate components.

Execute:

union(A,B)

One root becomes the parent of the other:

A
|
B

Now:

find(A) = A
find(B) = A

Both vertices belong to the same set.

Example Sequence

Start:

A B C D

union(A,B)

A
|
B

C

D

union(C,D)

A
|
B

C
|
D

union(B,C)

Roots:

find(B)=A
find(C)=C

Merge:

A
|
B
|
C
|
D

Now all vertices belong to a single component.

Application to Kruskal

Consider:

A-B = 1
B-C = 2
A-C = 3

Process A-B

find(A) ≠ find(B)

Accept edge.

Union:

union(A,B)

Process B-C

find(B) ≠ find(C)

Accept edge.

Union:

union(B,C)

Process A-C

Now:

find(A) = A
find(C) = A

Same component.

Adding the edge creates a cycle.

Reject it.

Cycle detection required only two find operations.

No graph traversal was necessary.

Suppose the structure becomes:

A
|
B
|
C
|
D
|
E

Evaluating:

find(E)

requires:

E → D → C → B → A

Five steps.

For (n) vertices:

$$ O(n) $$

per operation.

This is too slow for large graphs.

The next two sections introduce optimizations that transform union-find into one of the fastest known practical data structures.

Basic Implementation

Initialization

for each vertex:
    parent[v] = v

Find

find(x):

    while parent[x] != x:
        x = parent[x]

    return x

Union

union(a,b):

    rootA = find(a)
    rootB = find(b)

    if rootA != rootB:
        parent[rootB] = rootA

This is the simplest DSU implementation.

Correctness

The data structure maintains two invariants:

Invariant 1

Every component has exactly one root.

Invariant 2

Every vertex eventually reaches that root by following parent pointers.

Therefore:

find(u) == find(v)

if and only if:

u and v belong to the same component

This property makes connectivity queries extremely efficient.

Other Applications

Union-find appears far beyond minimum spanning trees.

Connected Components

Incrementally build graph connectivity.

Dynamic Connectivity

Track component merges.

Image Processing

Connected region detection.

Clustering

Hierarchical clustering algorithms.

Network Analysis

Component maintenance in evolving networks.

Percolation Simulation

Statistical physics models.

Social Networks

Community connectivity tracking.

Limitations

Union-find handles:

union
find

extremely well.

However it cannot efficiently answer:

shortest path
distance
edge weights

or general graph queries.

Its purpose is component management.

Nothing more.

Nothing less.

Common Mistakes

Comparing Vertices Instead of Roots

Wrong:

u == v

Correct:

find(u) == find(v)

Forgetting Find Before Union

Wrong:

parent[b] = a

Correct:

parent[find(b)] = find(a)

Assuming Tree Shape Matters

Only component membership matters.

Different tree structures can represent the same partition.

Ignoring Performance

The naïve implementation can degrade into long chains.

This issue motivates path compression and union by rank.

Complexity

Naïve implementation:

Operation Complexity
Find O(V)
Union O(V)
Space O(V)

These bounds improve dramatically in the next sections.

Recipe

When managing connected components:

  1. Create one set per vertex.
  2. Use find() to determine component membership.
  3. Use union() to merge components.
  4. Compare roots rather than vertices.
  5. Use union-find instead of repeated DFS connectivity checks.

Union-find is the engine behind Kruskal's algorithm and many connectivity algorithms. In the next section we introduce path compression, the first major optimization that transforms deep trees into nearly flat structures and dramatically improves performance.