9.12 Topological Sort

You need to arrange tasks in an order that respects dependencies.

9.12 Topological Sort

Problem

You need to arrange tasks in an order that respects dependencies.

Many real-world systems contain precedence constraints:

  • A compiler must parse before type checking.
  • Type checking must complete before code generation.
  • A package manager must install dependencies before dependent packages.
  • A build system must compile libraries before applications.
  • A workflow engine must run upstream jobs before downstream jobs.

These problems share a common structure.

Some actions must occur before others.

You need a way to produce a valid ordering.

Solution

Model the problem as a Directed Acyclic Graph (DAG) and compute a topological ordering.

A topological ordering is a sequence of vertices such that every directed edge:

u → v

appears with:

u before v

in the ordering.

Consider:

Parse → TypeCheck → Compile → Test

A valid topological order is:

Parse
TypeCheck
Compile
Test

Every dependency appears before the task that requires it.

If a graph contains a cycle, no valid topological ordering exists.

Discussion

Topological sorting is one of the most important applications of directed graphs.

Unlike BFS or DFS, which answer reachability questions, topological sorting answers scheduling questions.

The algorithm does not find the shortest path.

It does not find connected components.

It finds an execution order.

Whenever you encounter language such as:

before
after
depends on
requires
prerequisite
precedence

a DAG and topological sorting should immediately come to mind.

Understanding the DAG Requirement

Topological sorting works only on Directed Acyclic Graphs.

Consider:

A → B
B → C

A valid ordering exists:

A B C

Now consider:

A → B
B → C
C → A

This graph contains a cycle.

The requirements become:

A before B
B before C
C before A

which is impossible.

No ordering can satisfy all constraints simultaneously.

For this reason, every topological-sorting algorithm either:

  • assumes the graph is acyclic, or
  • detects cycles and reports failure.

DFS-Based Topological Sort

One elegant solution uses DFS finish times.

The key observation is:

A vertex should appear only after all of its descendants have been processed.

Consider:

A → B → D
 \\n  → C

DFS explores:

A
B
D

then finishes:

D

before:

B

Then:

C

and finally:

A

The finish order becomes:

D B C A

Reversing it produces:

A C B D

which is a valid topological ordering.

Recipe: DFS Topological Sort

def topological_sort(graph):
    visited = set()
    order = []

    def dfs(vertex):
        visited.add(vertex)

        for neighbor in graph[vertex]:
            if neighbor not in visited:
                dfs(neighbor)

        order.append(vertex)

    for vertex in graph:
        if vertex not in visited:
            dfs(vertex)

    order.reverse()

    return order

Example:

graph = {
    "Parse": ["TypeCheck"],
    "TypeCheck": ["Compile"],
    "Compile": ["Test"],
    "Test": [],
}

Output:

[
    "Parse",
    "TypeCheck",
    "Compile",
    "Test",
]

Every dependency constraint is respected.

Why Reverse Finish Order Works

Suppose:

A → B

DFS cannot finish A until B has finished.

Therefore:

finish(B)
before
finish(A)

The finish list contains:

B A

Reversing gives:

A B

which satisfies the dependency.

This argument generalizes to every edge in a DAG.

That is why reverse finish order yields a valid topological sort.

Kahn's Algorithm

Another popular approach uses in-degrees.

The key idea is simple:

A vertex with in-degree zero has no prerequisites.

It can execute immediately.

Consider:

A → C
B → C

Initially:

Vertex In-Degree
A 0
B 0
C 2

Vertices:

A
B

can run immediately.

After removing them:

C

becomes eligible.

This process continues until all vertices are processed.

Recipe: Kahn's Algorithm

from collections import deque

def topological_sort(graph):
    indegree = {
        vertex: 0
        for vertex in graph
    }

    for source in graph:
        for target in graph[source]:
            indegree[target] += 1

    queue = deque()

    for vertex in graph:
        if indegree[vertex] == 0:
            queue.append(vertex)

    order = []

    while queue:
        vertex = queue.popleft()

        order.append(vertex)

        for neighbor in graph[vertex]:
            indegree[neighbor] -= 1

            if indegree[neighbor] == 0:
                queue.append(neighbor)

    return order

Example:

[
    "Parse",
    "TypeCheck",
    "Compile",
    "Test",
]

This algorithm is widely used in schedulers and build systems.

Detecting Cycles

Kahn's algorithm naturally detects cycles.

Suppose:

A → B
B → C
C → A

Every vertex has in-degree:

1

No vertex enters the queue.

The algorithm immediately stalls.

More generally:

if len(order) != len(graph):
    raise ValueError(
        "Cycle detected"
    )

If some vertices remain unprocessed, a cycle must exist.

This makes Kahn's algorithm particularly attractive for dependency validation.

Multiple Valid Orders

A graph may have many valid topological orderings.

Consider:

A → D
B → D
C → D

Valid order:

A B C D

Also valid:

C A B D

Also valid:

B C A D

The only requirement is:

A before D
B before D
C before D

Algorithms may produce different outputs depending on:

  • adjacency-list ordering
  • queue ordering
  • vertex insertion order

This behavior is expected.

Recipe: Dependency Validation

Suppose a package manager stores:

graph = {
    "app": ["web", "db"],
    "web": ["core"],
    "db": ["core"],
    "core": [],
}

A valid installation order:

core
web
db
app

can be obtained through topological sorting.

A simple validation step:

order = topological_sort(graph)

if len(order) != len(graph):
    raise ValueError(
        "Circular dependency"
    )

prevents impossible installations.

Build systems, workflow engines, CI pipelines, and orchestration tools commonly perform exactly this check.

Topological Levels

Sometimes an ordering is insufficient.

You may want execution stages.

Consider:

A → D
B → D
C → E

Possible stages:

Level 0:
A B C

Level 1:
D E

These levels reveal tasks that can execute in parallel.

A modified Kahn algorithm can compute them naturally.

This idea forms the basis of many distributed scheduling systems.

Applications

Topological sorting appears in numerous domains.

Build Systems

Source
↓
Parse
↓
Compile
↓
Link

Package Managers

Library A
↓
Library B
↓
Application

Data Pipelines

Extract
↓
Transform
↓
Load

University Courses

Calculus I
↓
Calculus II
↓
Differential Equations

Workflow Engines

Validate
↓
Process
↓
Publish

The same algorithm solves all of them.

Complexity Analysis

Let:

  • V = number of vertices
  • E = number of edges

DFS-based topological sorting:

Operation Complexity
DFS traversal O(V + E)
Reverse output O(V)
Total O(V + E)

Kahn's algorithm:

Operation Complexity
In-degree computation O(V + E)
Queue processing O(V + E)
Total O(V + E)

Both algorithms are linear in graph size.

Common Pitfalls

Do not attempt topological sorting on undirected graphs.

The concept applies only to directed dependency structures.

Do not ignore cycles.

A cycle means no valid ordering exists.

Do not assume the output is unique.

Multiple valid solutions are common.

Do not forget sink vertices during graph construction.

Do not confuse DFS visitation order with topological order.

Only reverse finish order has the required property.

Do not assume every DAG has a single starting vertex.

Many DAGs contain multiple independent sources.

Recipe: Verify an Ordering

A useful validation technique checks every edge.

def verify_order(graph, order):
    position = {
        vertex: i
        for i, vertex
        in enumerate(order)
    }

    for source in graph:
        for target in graph[source]:
            if position[source] > position[target]:
                return False

    return True

This is often useful in testing and debugging.

Takeaway

Topological sorting transforms a dependency graph into a valid execution order. It applies to Directed Acyclic Graphs and forms the foundation of build systems, package managers, workflow engines, scheduling systems, and many data-processing pipelines. Whether implemented through DFS finish times or Kahn's in-degree algorithm, topological sorting provides a simple and efficient mechanism for respecting precedence constraints while exposing opportunities for parallel execution.