11.4 Kruskal's Algorithm

Given a connected weighted undirected graph, find a spanning tree whose total edge weight is minimum.

11.4 Kruskal's Algorithm

Problem

Given a connected weighted undirected graph, find a spanning tree whose total edge weight is minimum.

A brute-force approach would generate every possible spanning tree and compute its cost. Unfortunately, the number of spanning trees grows exponentially with graph size, making exhaustive search impractical even for moderately sized graphs.

We need an efficient algorithm that constructs a minimum spanning tree directly.

Solution

Kruskal's algorithm is a greedy algorithm that repeatedly selects the smallest available edge that does not create a cycle.

The algorithm operates according to a simple principle:

Always choose the cheapest safe edge.

The cut property guarantees that these choices are safe, allowing a globally optimal solution to emerge from a sequence of local decisions.

Algorithm

  1. Sort all edges by weight.
  2. Start with an empty forest.
  3. Process edges from smallest to largest.
  4. Add an edge if it connects two different components.
  5. Skip the edge if it would create a cycle.
  6. Stop after selecting (n-1) edges.

The resulting graph is a minimum spanning tree.

Example

Consider:

      4
A -------- B
|          |
|          |
2          5
|          |
|          |
C -------- D
      3

Additional edge:

A -------- D
      1

Edge list:

Edge Weight
A-D 1
A-C 2
C-D 3
A-B 4
B-D 5

Step 1

Choose:

A-D = 1

Forest:

A ----- D

Step 2

Choose:

A-C = 2

Forest:

    C
    |
    |
A ----- D

Step 3

Consider:

C-D = 3

Adding it creates:

A → C → D → A

a cycle.

Reject it.

Step 4

Choose:

A-B = 4

Result:

    C
    |
    |
A ----- D
|
|
B

Selected edges:

(A,D)
(A,C)
(A,B)

Total weight:

$$ 1 + 2 + 4 = 7 $$

Since the graph has four vertices, the algorithm stops after selecting:

$$ 4 - 1 = 3 $$

edges.

Why It Works

The key observation is that each accepted edge is the minimum-weight edge crossing some cut.

Suppose the current forest contains:

Component 1
Component 2

The smallest edge connecting them crosses a cut separating those components.

By the cut property:

The minimum crossing edge is safe.

Therefore every accepted edge can appear in some minimum spanning tree.

Repeatedly applying this rule eventually constructs an entire minimum spanning tree.

Forest Interpretation

At the beginning:

A   B   C   D

Every vertex forms its own component.

As edges are added:

(A,D)

becomes:

A-D

Then:

(A,C)

becomes:

A-C-D

The algorithm gradually merges components until only one remains.

This perspective motivates the use of the union-find data structure.

Detecting Cycles Efficiently

A naïve cycle check might run DFS after every edge insertion.

This would be expensive.

Instead, Kruskal uses disjoint-set union (union-find).

For each vertex:

make_set(v)

creates a separate component.

When processing an edge:

(u,v)

check:

find(u)
find(v)

Different Components

find(u) != find(v)

Safe to add.

Merge:

union(u,v)

Same Component

find(u) == find(v)

Adding the edge creates a cycle.

Skip it.

This reduces cycle detection to nearly constant time.

Pseudocode

Kruskal(G):

    sort edges by weight

    for each vertex:
        make_set(vertex)

    mst = []

    for edge(u,v,w):

        if find(u) != find(v):

            mst.append(edge)

            union(u,v)

    return mst

This is the standard implementation used in practice.

Detailed Example

Graph:

A-B = 1
B-C = 2
A-C = 3
C-D = 4
B-D = 5

Sorted edges:

1
2
3
4
5

Process:

Edge A-B

Different components.

Accept.

Edge B-C

Different components.

Accept.

Edge A-C

Same component.

Reject.

Edge C-D

Different components.

Accept.

Tree:

A-B
|
C
|
D

Weight:

$$ 1 + 2 + 4 = 7 $$

Time Complexity

Sorting

Sorting dominates the running time.

For (E) edges:

$$ O(E \log E) $$

Union-Find Operations

With path compression and union by rank:

$$ O(E \alpha(V)) $$

where (\alpha) is the inverse Ackermann function.

In practice:

α(V) < 5

for every realistic graph.

Total

$$ O(E \log E) $$

This is usually written as:

$$ O(E \log E) $$

because sorting dominates.

Space Complexity

Store:

  • Edge list
  • Union-find structure

Memory:

$$ O(V + E) $$

Sparse vs Dense Graphs

Sparse Graph

E ≈ V

Complexity:

$$ O(V \log V) $$

Very efficient.

Dense Graph

E ≈ V^2

Complexity:

$$ O(V^2 \log V) $$

Still practical for many applications.

For very dense graphs, Prim's algorithm may perform better.

Common Mistakes

Forgetting to Sort

Kruskal requires global edge ordering.

Without sorting, correctness disappears.

Incorrect Cycle Detection

Checking only immediate neighbors is insufficient.

Cycles may involve many vertices.

Use union-find.

Continuing After n-1 Edges

Once:

$$ |MST| = V - 1 $$

the algorithm can terminate.

Using Directed Graphs

Kruskal is defined for undirected graphs.

Directed graphs require different optimization techniques.

Real-World Applications

Network Construction

Building communication links at minimum cost.

Road Planning

Connecting cities while minimizing infrastructure expenses.

Electrical Distribution

Designing transmission networks.

Clustering

Many clustering algorithms remove large MST edges to create groups.

Image Segmentation

Graph-based segmentation often begins with MST construction.

Correctness Summary

Kruskal succeeds because:

  1. Edges are processed from cheapest to most expensive.
  2. Every accepted edge is safe by the cut property.
  3. Every rejected edge would create a cycle.
  4. The cycle property guarantees such edges are unnecessary.
  5. After (n-1) accepted edges, the graph is a spanning tree.
  6. Every choice preserves optimality.

Together these facts prove that Kruskal produces a minimum spanning tree.

Recipe

When solving a minimum spanning tree problem:

  1. List all edges.
  2. Sort them by weight.
  3. Initialize union-find.
  4. Process edges from smallest to largest.
  5. Add edges connecting different components.
  6. Reject edges forming cycles.
  7. Stop after selecting (n-1) edges.

Kruskal's algorithm is one of the most important greedy algorithms in computer science. It combines the cut property, cycle property, sorting, and disjoint-set union into a remarkably simple method for constructing minimum spanning trees. The next section examines Prim's algorithm, which approaches the same problem from a completely different perspective.