11.20 Complexity Analysis
Minimum spanning tree algorithms are easy to state, but their performance depends heavily on representation and data structure choices.
11.20 Complexity Analysis
Problem
Minimum spanning tree algorithms are easy to state, but their performance depends heavily on representation and data structure choices.
The same graph problem can produce different costs depending on whether you use:
edge list
adjacency list
adjacency matrix
binary heap
union-find
sorting
A useful complexity analysis must therefore answer more than:
What is the Big O?
It should answer:
Which operation dominates?
Which representation is assumed?
Which graph density is expected?
What memory cost is acceptable?
This section gives a practical framework for analyzing MST algorithms.
Solution
Analyze MST algorithms in terms of:
V = number of vertices
E = number of edges
Then identify the dominant operation.
| Algorithm | Main operation | Typical complexity |
|---|---|---|
| Kruskal | Sort edges | O(E log E) |
| Prim with binary heap | Heap operations | O(E log V) |
| Prim with adjacency matrix | Array scans | O(V²) |
| Borůvka | Repeated edge scans | O(E log V) |
These bounds are not interchangeable. Each assumes a different implementation.
Kruskal
Kruskal performs three major steps:
- Sort edges by weight.
- Initialize union-find.
- Scan edges and merge components.
Sorting costs:
O(E log E)
Union-find operations cost:
O(E α(V))
where α is the inverse Ackermann function.
Since:
α(V)
is effectively constant in practice, sorting dominates.
So Kruskal is usually written as:
O(E log E)
Because a simple graph has at most:
E <= V²
we can also write:
O(E log V)
since:
log E <= 2 log V
Both forms are common.
Kruskal Space Cost
Kruskal stores:
edge list
union-find arrays
MST edges
Space:
O(E + V)
If the input is already an edge list, this is natural.
If the input is a dense matrix, converting it to an edge list may create avoidable memory pressure.
Prim with Binary Heap
Prim with an adjacency list and binary heap performs:
- One graph traversal through adjacency lists.
- Heap insertions or decrease-key operations.
- Visited checks.
With a standard binary heap:
O(E log V)
If using duplicate heap entries instead of decrease-key, the asymptotic bound remains:
O(E log E)
which is equivalent to:
O(E log V)
for simple graphs.
In practice, duplicate entries often produce simpler and faster code despite more heap pushes.
Prim Heap Space Cost
Store:
adjacency list
heap
visited array
best array
parent array
Space:
O(V + E)
The heap can contain more than V entries if duplicates are allowed.
In the worst case:
O(E)
heap space.
Prim with Adjacency Matrix
For dense graphs, matrix-based Prim avoids heap overhead.
Each iteration:
- Scans all vertices to find the unvisited vertex with smallest
best. - Scans all vertices again to update candidate edges.
There are:
V
iterations.
Total:
O(V²)
Space:
O(V²)
for the matrix, plus:
O(V)
for arrays.
This version is often the right choice when the graph is dense and already represented as a cost matrix.
Borůvka
Borůvka runs in phases.
Each phase scans all edges to find the cheapest outgoing edge for every component.
Per phase:
O(E)
The number of components decreases by at least a factor of two each phase.
Number of phases:
O(log V)
Total:
O(E log V)
Space:
O(E + V)
for edge storage, union-find, and cheapest-edge arrays.
Comparing Sparse Graphs
For sparse graphs:
E = O(V)
Complexities become:
| Algorithm | Sparse complexity |
|---|---|
| Kruskal | O(V log V) |
| Prim with heap | O(V log V) |
| Borůvka | O(V log V) |
| Prim with matrix | O(V²) |
For sparse graphs, matrix Prim is usually wasteful.
Kruskal is often the simplest implementation.
Prim with heap is also excellent when adjacency lists already exist.
Comparing Dense Graphs
For dense graphs:
E = O(V²)
Complexities become:
| Algorithm | Dense complexity |
|---|---|
| Kruskal | O(V² log V) |
| Prim with heap | O(V² log V) |
| Borůvka | O(V² log V) |
| Prim with matrix | O(V²) |
For dense graphs, matrix Prim is usually the preferred textbook choice.
Sorting Versus Heap Cost
Kruskal pays its main cost before it starts selecting MST edges:
sort all edges
Prim pays gradually:
heap push
heap pop
heap update
If the graph is sparse and edge weights are already sorted, Kruskal becomes very attractive.
If the graph is adjacency-list based and no edge list exists, Prim may avoid an extra conversion step.
Integer Weights
When edge weights are small integers, sorting can be improved.
Instead of comparison sorting:
O(E log E)
you may use:
counting sort
radix sort
bucket sort
This can reduce Kruskal's practical running time.
For example, if weights lie in:
0..65535
a counting-sort-like approach may be faster than comparison sorting.
Union-Find Cost
Optimized union-find with path compression and union by rank has amortized cost:
O(α(V))
per operation.
For MST algorithms, union-find rarely dominates runtime.
Still, poor DSU implementation can matter.
Naive union-find may degrade to:
O(V)
per operation, making Kruskal much slower.
Use optimized DSU by default.
Memory as a Complexity Constraint
Asymptotic time is not the only issue.
A graph with:
V = 1,000,000
E = 3,000,000
is sparse.
An adjacency list or edge list is plausible.
An adjacency matrix requires:
10¹²
entries, which is usually impossible.
Choosing O(V²) memory by accident can make an otherwise good algorithm unusable.
Input Representation Matters
Do not analyze the algorithm apart from the input.
| Input form | Natural algorithm |
|---|---|
| Edge list | Kruskal |
| Adjacency list | Prim with heap |
| Cost matrix | Prim with array scans |
| Distributed edge partitions | Borůvka-style hybrid |
| Already sorted edges | Kruskal |
The cost of conversion can be significant.
For example, matrix to edge list costs:
O(V²)
space and time before the MST algorithm even starts.
Disconnected Graphs
For disconnected graphs, MST algorithms return forests.
Kruskal:
accepts edges until no more useful edges remain
Prim:
must restart from each unvisited vertex
Borůvka:
stops when no component has outgoing edges
Complexity bounds remain similar, but the output contains fewer than:
V - 1
edges.
Practical Benchmarking
When implementations are close asymptotically, benchmark.
Use data that matches expected workloads:
random sparse graphs
road-like graphs
complete distance graphs
scale-free graphs
grid graphs
Different graph families stress different operations.
Road networks have low degree.
Social graphs can have skewed degree distribution.
Complete distance graphs are dense.
One benchmark rarely tells the full story.
Common Mistakes
Comparing Algorithms Without Representations
O(E log V) for Prim assumes adjacency lists and a heap. It is not the cost of every Prim implementation.
Ignoring Sorting
For Kruskal, sorting is usually the dominant cost.
Using Matrix Prim on Sparse Graphs
The O(V²) time and space are unnecessary.
Forgetting Heap Duplicates
Duplicate-entry Prim may use O(E) heap space.
Treating α(V) as Mysterious
For practical input sizes, inverse Ackermann is a tiny constant.
Recipe
When analyzing an MST implementation:
- Identify
VandE. - Classify the graph as sparse, dense, or unknown.
- Identify the input representation.
- List the dominant operation: sorting, heap operations, scans, or repeated edge passes.
- Include memory complexity.
- State whether the graph is connected.
- Account for edge weight type if sorting can be specialized.
- Benchmark if two choices are close.
Complexity analysis for MST algorithms is mainly about matching the graph to the implementation. Kruskal, Prim, and Borůvka are all efficient, but they are efficient in different regimes. The right answer depends on density, representation, memory limits, and the cost of the operations hidden behind the Big O notation.