11.8 Path Compression

The basic union-find structure can become highly unbalanced.

11.8 Path Compression

Problem

The basic union-find structure can become highly unbalanced.

Consider the sequence:

union(A,B)
union(B,C)
union(C,D)
union(D,E)
union(E,F)

A naïve implementation may produce:

A
|
B
|
C
|
D
|
E
|
F

Now evaluate:

find(F)

The search follows:

F → E → D → C → B → A

Every operation traverses the entire chain.

For (n) vertices:

$$ O(n) $$

time may be required for a single find operation.

Large graphs can contain millions of vertices, making this behavior unacceptable.

We need a way to flatten these trees automatically.

Solution

Path compression shortens paths whenever a find operation is performed.

The idea is simple:

Every vertex visited during a find operation is attached directly to the root.

Instead of merely discovering the root, the algorithm restructures the tree.

Basic Example

Initial structure:

A
|
B
|
C
|
D
|
E
|
F

Execute:

find(F)

Traversal:

F → E → D → C → B → A

Root:

A

Without path compression:

A
|
B
|
C
|
D
|
E
|
F

remains unchanged.

With path compression:

    A
  / | \ \\n B  C  D E
          \\n           F

or even:

      A
   / /|\ \\n  B C D E F

depending on implementation.

Every visited node now points directly to the root.

Future finds become dramatically faster.

Recursive Formulation

The standard implementation is surprisingly small:

find(x):

    if parent[x] == x:
        return x

    parent[x] = find(parent[x])

    return parent[x]

The critical line is:

parent[x] = find(parent[x])

After discovering the root, the algorithm rewrites the parent pointer.

This performs compression automatically.

Step-by-Step Example

Suppose:

A
|
B
|
C
|
D

Parent table:

Vertex Parent
A A
B A
C B
D C

Execute:

find(D)

Recursive Calls

find(D)
    find(C)
        find(B)
            find(A)

Root:

A

Compression Phase

Return from recursion:

parent[B] = A
parent[C] = A
parent[D] = A

Final structure:

      A
    / | \\n   B  C  D

Future searches require only one step.

Why It Works

Path compression never changes component membership.

Every node already belonged to the same component.

The optimization merely changes the internal representation.

Before:

D → C → B → A

After:

D → A

The root remains identical.

Therefore:

find(D)

returns the same result as before.

Correctness is preserved.

Multiple Finds

Suppose:

A
|
B
|
C
|
D
|
E

First Query

find(E)

Cost:

5 nodes

Tree becomes:

      A
   / /|\ \\n  B C D E

Second Query

find(E)

Cost:

2 nodes

Third Query

find(D)

Cost:

2 nodes

The structure improves itself over time.

Visualization

Before compression:

A
|
B
|
C
|
D
|
E
|
F
|
G

Depth:

6

After one find:

        A
 / / / /|\ \\nB C D E F G

Depth:

1

This reduction is the source of the enormous performance gain.

Interaction with Kruskal

Consider processing many edges:

(u,v)

Every edge requires:

find(u)
find(v)

Without path compression:

long chains
long searches

With path compression:

almost flat trees
almost constant-time searches

This optimization is one reason Kruskal scales to graphs containing millions of edges.

Complexity Intuition

Without path compression:

$$ O(V) $$

may be required for a find operation.

With path compression:

the tree rapidly flattens.

Repeated operations become extremely cheap.

The theoretical analysis is subtle, but the practical effect is easy to observe:

Deep trees become shallow trees.

Amortized Analysis

Individual operations may still traverse several vertices.

However, once a path has been compressed, future operations become much cheaper.

The cost is therefore spread across many operations.

This is called amortized analysis.

Rather than measuring one operation, we measure a sequence of operations.

Over long sequences, path compression performs extraordinarily well.

Example Trace

Initial:

1
|
2
|
3
|
4
|
5
|
6

Perform:

find(6)

After compression:

      1
 / / /|\ \\n2 3 4 5 6

Perform:

find(6)
find(5)
find(4)
find(3)

Each operation immediately reaches the root.

The expensive work never needs to be repeated.

Iterative Version

Some implementations avoid recursion.

Find Root

root = x

while parent[root] != root:
    root = parent[root]

Compress Path

while x != root:

    next = parent[x]
    parent[x] = root
    x = next

The result is identical.

Both recursive and iterative implementations are widely used.

Correctness Argument

Path compression preserves:

Component Membership

Every node continues to point to the same root.

Connectivity Information

Roots remain unchanged.

Set Partition

No component merges or splits occur.

Only internal parent pointers change.

Therefore all connectivity queries remain correct.

Common Mistakes

Compressing to Non-Roots

Wrong:

parent[x] = parent[parent[x]]

without reaching the root.

Correct compression always uses the true root.

Forgetting Assignment

Wrong:

find(parent[x])

Correct:

parent[x] = find(parent[x])

The assignment performs the compression.

Mixing Roots

Only vertices from the same component may be compressed together.

Compressing to a different component breaks correctness.

Assuming One Operation Flattens Everything

Only visited paths are compressed.

Unvisited parts of the tree remain unchanged.

Performance Impact

Path compression alone reduces practical running times dramatically.

For many real-world workloads:

find()

appears almost constant time.

Combined with union by rank, the result becomes one of the fastest known general-purpose data structures.

Applications Beyond MST

Path compression benefits every DSU application:

  • Connected components
  • Dynamic connectivity
  • Clustering
  • Image segmentation
  • Network analysis
  • Percolation models
  • Offline graph queries

Anywhere union-find appears, path compression should usually be enabled.

Complexity

Path compression alone provides very strong amortized performance.

The exact theoretical bound becomes most interesting when combined with union by rank.

Together they achieve:

$$ O(\alpha(V)) $$

amortized time per operation, where (\alpha) is the inverse Ackermann function.

This function grows so slowly that:

α(V) < 5

for any graph that could exist in practice.

Recipe

When implementing union-find:

  1. Follow parent pointers until reaching a root.
  2. Record every visited vertex.
  3. Redirect those vertices directly to the root.
  4. Return the root.
  5. Allow future operations to benefit from the flattened structure.

Path compression transforms union-find from a simple connectivity structure into an extremely efficient one. The next section introduces union by rank, the second major optimization that prevents deep trees from forming in the first place. Together, these two techniques produce the near-constant-time DSU used in modern graph algorithms.