9.12 Topological Sort
You need to arrange tasks in an order that respects dependencies.
9.12 Topological Sort
Problem
You need to arrange tasks in an order that respects dependencies.
Many real-world systems contain precedence constraints:
- A compiler must parse before type checking.
- Type checking must complete before code generation.
- A package manager must install dependencies before dependent packages.
- A build system must compile libraries before applications.
- A workflow engine must run upstream jobs before downstream jobs.
These problems share a common structure.
Some actions must occur before others.
You need a way to produce a valid ordering.
Solution
Model the problem as a Directed Acyclic Graph (DAG) and compute a topological ordering.
A topological ordering is a sequence of vertices such that every directed edge:
u → v
appears with:
u before v
in the ordering.
Consider:
Parse → TypeCheck → Compile → Test
A valid topological order is:
Parse
TypeCheck
Compile
Test
Every dependency appears before the task that requires it.
If a graph contains a cycle, no valid topological ordering exists.
Discussion
Topological sorting is one of the most important applications of directed graphs.
Unlike BFS or DFS, which answer reachability questions, topological sorting answers scheduling questions.
The algorithm does not find the shortest path.
It does not find connected components.
It finds an execution order.
Whenever you encounter language such as:
before
after
depends on
requires
prerequisite
precedence
a DAG and topological sorting should immediately come to mind.
Understanding the DAG Requirement
Topological sorting works only on Directed Acyclic Graphs.
Consider:
A → B
B → C
A valid ordering exists:
A B C
Now consider:
A → B
B → C
C → A
This graph contains a cycle.
The requirements become:
A before B
B before C
C before A
which is impossible.
No ordering can satisfy all constraints simultaneously.
For this reason, every topological-sorting algorithm either:
- assumes the graph is acyclic, or
- detects cycles and reports failure.
DFS-Based Topological Sort
One elegant solution uses DFS finish times.
The key observation is:
A vertex should appear only after all of its descendants have been processed.
Consider:
A → B → D
\\n → C
DFS explores:
A
B
D
then finishes:
D
before:
B
Then:
C
and finally:
A
The finish order becomes:
D B C A
Reversing it produces:
A C B D
which is a valid topological ordering.
Recipe: DFS Topological Sort
def topological_sort(graph):
visited = set()
order = []
def dfs(vertex):
visited.add(vertex)
for neighbor in graph[vertex]:
if neighbor not in visited:
dfs(neighbor)
order.append(vertex)
for vertex in graph:
if vertex not in visited:
dfs(vertex)
order.reverse()
return order
Example:
graph = {
"Parse": ["TypeCheck"],
"TypeCheck": ["Compile"],
"Compile": ["Test"],
"Test": [],
}
Output:
[
"Parse",
"TypeCheck",
"Compile",
"Test",
]
Every dependency constraint is respected.
Why Reverse Finish Order Works
Suppose:
A → B
DFS cannot finish A until B has finished.
Therefore:
finish(B)
before
finish(A)
The finish list contains:
B A
Reversing gives:
A B
which satisfies the dependency.
This argument generalizes to every edge in a DAG.
That is why reverse finish order yields a valid topological sort.
Kahn's Algorithm
Another popular approach uses in-degrees.
The key idea is simple:
A vertex with in-degree zero has no prerequisites.
It can execute immediately.
Consider:
A → C
B → C
Initially:
| Vertex | In-Degree |
|---|---|
| A | 0 |
| B | 0 |
| C | 2 |
Vertices:
A
B
can run immediately.
After removing them:
C
becomes eligible.
This process continues until all vertices are processed.
Recipe: Kahn's Algorithm
from collections import deque
def topological_sort(graph):
indegree = {
vertex: 0
for vertex in graph
}
for source in graph:
for target in graph[source]:
indegree[target] += 1
queue = deque()
for vertex in graph:
if indegree[vertex] == 0:
queue.append(vertex)
order = []
while queue:
vertex = queue.popleft()
order.append(vertex)
for neighbor in graph[vertex]:
indegree[neighbor] -= 1
if indegree[neighbor] == 0:
queue.append(neighbor)
return order
Example:
[
"Parse",
"TypeCheck",
"Compile",
"Test",
]
This algorithm is widely used in schedulers and build systems.
Detecting Cycles
Kahn's algorithm naturally detects cycles.
Suppose:
A → B
B → C
C → A
Every vertex has in-degree:
1
No vertex enters the queue.
The algorithm immediately stalls.
More generally:
if len(order) != len(graph):
raise ValueError(
"Cycle detected"
)
If some vertices remain unprocessed, a cycle must exist.
This makes Kahn's algorithm particularly attractive for dependency validation.
Multiple Valid Orders
A graph may have many valid topological orderings.
Consider:
A → D
B → D
C → D
Valid order:
A B C D
Also valid:
C A B D
Also valid:
B C A D
The only requirement is:
A before D
B before D
C before D
Algorithms may produce different outputs depending on:
- adjacency-list ordering
- queue ordering
- vertex insertion order
This behavior is expected.
Recipe: Dependency Validation
Suppose a package manager stores:
graph = {
"app": ["web", "db"],
"web": ["core"],
"db": ["core"],
"core": [],
}
A valid installation order:
core
web
db
app
can be obtained through topological sorting.
A simple validation step:
order = topological_sort(graph)
if len(order) != len(graph):
raise ValueError(
"Circular dependency"
)
prevents impossible installations.
Build systems, workflow engines, CI pipelines, and orchestration tools commonly perform exactly this check.
Topological Levels
Sometimes an ordering is insufficient.
You may want execution stages.
Consider:
A → D
B → D
C → E
Possible stages:
Level 0:
A B C
Level 1:
D E
These levels reveal tasks that can execute in parallel.
A modified Kahn algorithm can compute them naturally.
This idea forms the basis of many distributed scheduling systems.
Applications
Topological sorting appears in numerous domains.
Build Systems
Source
↓
Parse
↓
Compile
↓
Link
Package Managers
Library A
↓
Library B
↓
Application
Data Pipelines
Extract
↓
Transform
↓
Load
University Courses
Calculus I
↓
Calculus II
↓
Differential Equations
Workflow Engines
Validate
↓
Process
↓
Publish
The same algorithm solves all of them.
Complexity Analysis
Let:
V= number of verticesE= number of edges
DFS-based topological sorting:
| Operation | Complexity |
|---|---|
| DFS traversal | O(V + E) |
| Reverse output | O(V) |
| Total | O(V + E) |
Kahn's algorithm:
| Operation | Complexity |
|---|---|
| In-degree computation | O(V + E) |
| Queue processing | O(V + E) |
| Total | O(V + E) |
Both algorithms are linear in graph size.
Common Pitfalls
Do not attempt topological sorting on undirected graphs.
The concept applies only to directed dependency structures.
Do not ignore cycles.
A cycle means no valid ordering exists.
Do not assume the output is unique.
Multiple valid solutions are common.
Do not forget sink vertices during graph construction.
Do not confuse DFS visitation order with topological order.
Only reverse finish order has the required property.
Do not assume every DAG has a single starting vertex.
Many DAGs contain multiple independent sources.
Recipe: Verify an Ordering
A useful validation technique checks every edge.
def verify_order(graph, order):
position = {
vertex: i
for i, vertex
in enumerate(order)
}
for source in graph:
for target in graph[source]:
if position[source] > position[target]:
return False
return True
This is often useful in testing and debugging.
Takeaway
Topological sorting transforms a dependency graph into a valid execution order. It applies to Directed Acyclic Graphs and forms the foundation of build systems, package managers, workflow engines, scheduling systems, and many data-processing pipelines. Whether implemented through DFS finish times or Kahn's in-degree algorithm, topological sorting provides a simple and efficient mechanism for respecting precedence constraints while exposing opportunities for parallel execution.