11.24 Common Bugs
Minimum spanning tree code often fails for reasons that are easy to miss in clean textbook examples.
11.24 Common Bugs
Problem
Minimum spanning tree code often fails for reasons that are easy to miss in clean textbook examples.
The graph is disconnected. The edge list has parallel edges. The total cost overflows. The implementation forgets to store reverse edges. The priority queue contains stale entries. The union-find structure compares vertices instead of roots.
These bugs are common because MST algorithms combine several moving pieces:
graph representation
edge ordering
component tracking
tree reconstruction
cost accumulation
This section lists the most frequent failure modes and how to prevent them.
Bug: Assuming the Graph Is Connected
Many inputs are not connected.
Example:
A -- B
C -- D
There is no spanning tree for the whole graph.
Kruskal will return:
A-B
C-D
which is a forest, not a tree.
If the API expects an MST, check:
len(mstEdges) == V - 1
If this condition fails, report:
graph disconnected
or return a minimum spanning forest intentionally.
Bug: Forgetting Reverse Edges
For an undirected adjacency list, every edge must be stored twice.
Wrong:
adj[u].append((v, w))
Correct:
adj[u].append((v, w))
adj[v].append((u, w))
This bug is especially common in Prim implementations. The algorithm may appear to work on some graphs but fail when the missing direction is needed to reach a vertex.
Bug: Treating Directed Graphs as MST Inputs
Minimum spanning trees are defined for undirected graphs.
If the input is directed, you are solving a different problem. The directed analogue is usually a minimum arborescence problem, not an MST problem.
Do not silently ignore edge direction unless the problem explicitly says the graph should be treated as undirected.
Bug: Comparing Vertices Instead of Components
In Kruskal, this is wrong:
if u != v:
accept edge
The correct check is:
if find(u) != find(v):
accept edge
Two different vertices may already belong to the same component.
Accepting the edge would create a cycle.
Bug: Broken Union
A common incorrect union operation is:
parent[v] = u
This may attach a non-root to another non-root and corrupt the structure.
Use roots:
rootU = find(u)
rootV = find(v)
if rootU != rootV:
parent[rootV] = rootU
With union by size or rank, attach the smaller tree under the larger one.
Bug: Naive Union-Find on Large Inputs
Naive union-find can form chains:
1 -> 2 -> 3 -> 4 -> 5
Then find() becomes slow.
Use:
path compression
union by size or rank
by default.
This is not premature optimization. It is the standard implementation.
Bug: Sorting Edges Incorrectly
Kruskal requires nondecreasing edge weight order.
Wrong sort key:
sort by source vertex
Correct sort key:
sort by weight
For deterministic output, use:
(weight, edgeId)
but never replace weight ordering with endpoint ordering.
Bug: Assuming Edge Weights Are Positive
MST algorithms work with negative weights.
Example:
A-B = -10
B-C = 2
A-C = 5
The MST uses:
A-B
B-C
If the implementation rejects negative edges, it is wrong unless the problem domain explicitly forbids them.
Bug: Accepting Self-Loops
A self-loop:
A-A
does not connect two components.
In Kruskal:
find(A) == find(A)
so it should be rejected automatically.
In Prim or graph construction code, ignore self-loops unless the surrounding API needs to preserve them for other algorithms.
Bug: Mishandling Parallel Edges
Parallel edges connect the same endpoints.
Example:
A-B = 10
A-B = 1
The cheaper edge may belong to the MST.
Do not deduplicate edges by endpoint unless you keep the minimum weight.
If edge identity matters, assign unique IDs.
Bug: Integer Overflow in Total Cost
Edge selection can be correct while the total cost is wrong.
Example:
3 edges
each weight = 1,500,000,000
Total:
4,500,000,000
This exceeds signed 32-bit integer range.
Use a wider type for accumulated cost.
A common rule:
edge weight type may be int32
total cost type should often be int64
Bug: Incorrect Early Termination
Kruskal can stop when:
acceptedEdges == V - 1
for a connected graph.
But do not stop earlier based on:
processedEdges == V - 1
Processed edges include rejected edges.
Use accepted edge count.
Bug: Missing Visited Check in Prim
Priority queues often contain stale entries.
Example:
push(10, A)
push(3, A)
When the larger old entry is popped later, it must be ignored.
Correct pattern:
cost, v = heap.pop()
if visited[v]:
continue
visited[v] = true
Without this check, Prim may add a vertex twice and corrupt the MST.
Bug: Confusing Prim with Dijkstra
Prim chooses the cheapest edge connecting a new vertex to the tree.
Dijkstra chooses the smallest accumulated distance from the source.
Wrong for Prim:
newCost = dist[u] + weight(u, v)
Correct for Prim:
newCost = weight(u, v)
This is one of the most common implementation mistakes because the two algorithms look similar.
Bug: Parent Array Not Updated
In Prim, when a better edge is found:
best[v] = w
parent[v] = u
Both assignments matter.
If best is updated but parent is not, the final cost may be correct while the reconstructed edge list is wrong.
Bug: Treating Missing Matrix Edges as Zero
Dense graph inputs often use sentinels:
-1
infinity
null
for missing edges.
Do not treat these as zero.
A zero-weight edge is a real edge with cost zero.
A missing edge is not an edge.
Bug: Including Diagonal Matrix Entries
In an adjacency matrix:
weight[v][v]
is usually zero.
Those are self-loops.
Ignore them for MST construction.
Otherwise the algorithm may repeatedly select meaningless zero-cost edges.
Bug: Not Handling Single-Vertex Graphs
For:
V = 1
E = 0
the MST has:
0 edges
0 total cost
Do not return disconnected or error unless the problem explicitly disallows empty trees.
Bug: Failing on Empty Graphs
For:
V = 0
the correct behavior is API-dependent.
Some libraries return an empty forest.
Some reject the input.
Choose one behavior and document it.
Do not allow undefined behavior.
Bug: Ignoring Equal-Weight MSTs
Multiple MSTs may exist.
If tests require one exact edge set, they may fail even when the algorithm is correct.
For equal-weight graphs, test:
valid spanning tree
correct total cost
rather than exact edge identity.
Bug: Broken Borůvka Phases
In Borůvka, each component should select its cheapest outgoing edge based on the component structure at the start of the phase.
A common mistake is to merge components while still computing cheapest edges for the same phase.
Safer pattern:
scan all edges and record cheapest outgoing edge per component
then perform unions
Keep these two steps separate.
Bug: Rollback DSU with Path Compression
Rollback union-find needs to undo changes.
Path compression changes many parent pointers during find().
If those changes are not recorded, rollback becomes incorrect.
For rollback DSU, usually use:
union by size
no path compression
This gives predictable undo behavior.
Bug: Not Verifying Output
Even if the algorithm seems correct, validate the result.
For connected graphs:
edges = V - 1
no cycles
all vertices connected
For disconnected graphs:
forest has V - componentCount edges
no cycles
components match original graph connectivity
Validation catches many defects before they reach production.
Debugging Checklist
When MST output is wrong, inspect:
input edges
sorted edge order
accepted edges
rejected edges
union-find roots
visited array
parent array
total cost type
connectivity result
Do not debug only from the final cost.
The final cost is usually too little information.
Common Bug Table
| Symptom | Likely Cause |
|---|---|
| Too many edges returned | Cycle detection broken |
| Too few edges returned | Graph disconnected or traversal bug |
| Cost too small | Self-loop, missing-edge, or overflow bug |
| Cost too large | Sort order or update bug |
| Prim misses vertices | Missing reverse edges or disconnected graph |
| Kruskal accepts cycle | Bad union-find |
| Different valid MST edges | Equal weights, not necessarily a bug |
| Crash on small graph | Single-vertex or empty-graph handling |
Recipe
Use this checklist when hardening MST code:
- Verify graph direction assumptions.
- Store undirected edges correctly.
- Sort by weight for Kruskal.
- Use optimized union-find.
- Ignore self-loops.
- Preserve parallel edges or safely reduce them.
- Use a wide type for total cost.
- Check connectivity explicitly.
- Handle equal weights in tests.
- Validate the output structure.
Most MST bugs are preventable. The safest implementations make assumptions visible: connected or disconnected, directed or undirected, tree or forest, exact edge set or just cost. Once those contracts are explicit, the algorithms are straightforward to test and maintain.