11.20 Complexity Analysis

Minimum spanning tree algorithms are easy to state, but their performance depends heavily on representation and data structure choices.

11.20 Complexity Analysis

Problem

Minimum spanning tree algorithms are easy to state, but their performance depends heavily on representation and data structure choices.

The same graph problem can produce different costs depending on whether you use:

edge list
adjacency list
adjacency matrix
binary heap
union-find
sorting

A useful complexity analysis must therefore answer more than:

What is the Big O?

It should answer:

Which operation dominates?
Which representation is assumed?
Which graph density is expected?
What memory cost is acceptable?

This section gives a practical framework for analyzing MST algorithms.

Solution

Analyze MST algorithms in terms of:

V = number of vertices
E = number of edges

Then identify the dominant operation.

Algorithm Main operation Typical complexity
Kruskal Sort edges O(E log E)
Prim with binary heap Heap operations O(E log V)
Prim with adjacency matrix Array scans O(V²)
Borůvka Repeated edge scans O(E log V)

These bounds are not interchangeable. Each assumes a different implementation.

Kruskal

Kruskal performs three major steps:

  1. Sort edges by weight.
  2. Initialize union-find.
  3. Scan edges and merge components.

Sorting costs:

O(E log E)

Union-find operations cost:

O(E α(V))

where α is the inverse Ackermann function.

Since:

α(V)

is effectively constant in practice, sorting dominates.

So Kruskal is usually written as:

O(E log E)

Because a simple graph has at most:

E <= V²

we can also write:

O(E log V)

since:

log E <= 2 log V

Both forms are common.

Kruskal Space Cost

Kruskal stores:

edge list
union-find arrays
MST edges

Space:

O(E + V)

If the input is already an edge list, this is natural.

If the input is a dense matrix, converting it to an edge list may create avoidable memory pressure.

Prim with Binary Heap

Prim with an adjacency list and binary heap performs:

  1. One graph traversal through adjacency lists.
  2. Heap insertions or decrease-key operations.
  3. Visited checks.

With a standard binary heap:

O(E log V)

If using duplicate heap entries instead of decrease-key, the asymptotic bound remains:

O(E log E)

which is equivalent to:

O(E log V)

for simple graphs.

In practice, duplicate entries often produce simpler and faster code despite more heap pushes.

Prim Heap Space Cost

Store:

adjacency list
heap
visited array
best array
parent array

Space:

O(V + E)

The heap can contain more than V entries if duplicates are allowed.

In the worst case:

O(E)

heap space.

Prim with Adjacency Matrix

For dense graphs, matrix-based Prim avoids heap overhead.

Each iteration:

  1. Scans all vertices to find the unvisited vertex with smallest best.
  2. Scans all vertices again to update candidate edges.

There are:

V

iterations.

Total:

O(V²)

Space:

O(V²)

for the matrix, plus:

O(V)

for arrays.

This version is often the right choice when the graph is dense and already represented as a cost matrix.

Borůvka

Borůvka runs in phases.

Each phase scans all edges to find the cheapest outgoing edge for every component.

Per phase:

O(E)

The number of components decreases by at least a factor of two each phase.

Number of phases:

O(log V)

Total:

O(E log V)

Space:

O(E + V)

for edge storage, union-find, and cheapest-edge arrays.

Comparing Sparse Graphs

For sparse graphs:

E = O(V)

Complexities become:

Algorithm Sparse complexity
Kruskal O(V log V)
Prim with heap O(V log V)
Borůvka O(V log V)
Prim with matrix O(V²)

For sparse graphs, matrix Prim is usually wasteful.

Kruskal is often the simplest implementation.

Prim with heap is also excellent when adjacency lists already exist.

Comparing Dense Graphs

For dense graphs:

E = O(V²)

Complexities become:

Algorithm Dense complexity
Kruskal O(V² log V)
Prim with heap O(V² log V)
Borůvka O(V² log V)
Prim with matrix O(V²)

For dense graphs, matrix Prim is usually the preferred textbook choice.

Sorting Versus Heap Cost

Kruskal pays its main cost before it starts selecting MST edges:

sort all edges

Prim pays gradually:

heap push
heap pop
heap update

If the graph is sparse and edge weights are already sorted, Kruskal becomes very attractive.

If the graph is adjacency-list based and no edge list exists, Prim may avoid an extra conversion step.

Integer Weights

When edge weights are small integers, sorting can be improved.

Instead of comparison sorting:

O(E log E)

you may use:

counting sort
radix sort
bucket sort

This can reduce Kruskal's practical running time.

For example, if weights lie in:

0..65535

a counting-sort-like approach may be faster than comparison sorting.

Union-Find Cost

Optimized union-find with path compression and union by rank has amortized cost:

O(α(V))

per operation.

For MST algorithms, union-find rarely dominates runtime.

Still, poor DSU implementation can matter.

Naive union-find may degrade to:

O(V)

per operation, making Kruskal much slower.

Use optimized DSU by default.

Memory as a Complexity Constraint

Asymptotic time is not the only issue.

A graph with:

V = 1,000,000
E = 3,000,000

is sparse.

An adjacency list or edge list is plausible.

An adjacency matrix requires:

10¹²

entries, which is usually impossible.

Choosing O(V²) memory by accident can make an otherwise good algorithm unusable.

Input Representation Matters

Do not analyze the algorithm apart from the input.

Input form Natural algorithm
Edge list Kruskal
Adjacency list Prim with heap
Cost matrix Prim with array scans
Distributed edge partitions Borůvka-style hybrid
Already sorted edges Kruskal

The cost of conversion can be significant.

For example, matrix to edge list costs:

O(V²)

space and time before the MST algorithm even starts.

Disconnected Graphs

For disconnected graphs, MST algorithms return forests.

Kruskal:

accepts edges until no more useful edges remain

Prim:

must restart from each unvisited vertex

Borůvka:

stops when no component has outgoing edges

Complexity bounds remain similar, but the output contains fewer than:

V - 1

edges.

Practical Benchmarking

When implementations are close asymptotically, benchmark.

Use data that matches expected workloads:

random sparse graphs
road-like graphs
complete distance graphs
scale-free graphs
grid graphs

Different graph families stress different operations.

Road networks have low degree.

Social graphs can have skewed degree distribution.

Complete distance graphs are dense.

One benchmark rarely tells the full story.

Common Mistakes

Comparing Algorithms Without Representations

O(E log V) for Prim assumes adjacency lists and a heap. It is not the cost of every Prim implementation.

Ignoring Sorting

For Kruskal, sorting is usually the dominant cost.

Using Matrix Prim on Sparse Graphs

The O(V²) time and space are unnecessary.

Forgetting Heap Duplicates

Duplicate-entry Prim may use O(E) heap space.

Treating α(V) as Mysterious

For practical input sizes, inverse Ackermann is a tiny constant.

Recipe

When analyzing an MST implementation:

  1. Identify V and E.
  2. Classify the graph as sparse, dense, or unknown.
  3. Identify the input representation.
  4. List the dominant operation: sorting, heap operations, scans, or repeated edge passes.
  5. Include memory complexity.
  6. State whether the graph is connected.
  7. Account for edge weight type if sorting can be specialized.
  8. Benchmark if two choices are close.

Complexity analysis for MST algorithms is mainly about matching the graph to the implementation. Kruskal, Prim, and Borůvka are all efficient, but they are efficient in different regimes. The right answer depends on density, representation, memory limits, and the cost of the operations hidden behind the Big O notation.