TAOCP 2.2.3 Exercise 25
Let $S$ be a finite partially ordered set whose ordering relation is given as a collection of pairs $(u,v)$, meaning $u<v$.
Section 2.2.3: Linked Allocation
Exercise 25. [47] Design as efficient an algorithm as possible for doing a topological sort of very large sets $S$ having considerably more nodes than the computer memory can contain. Assume that the input, output, and temporary working space are done with magnetic tape. [Possible hint: A conventional sort of the input allows us to assume that all relations for a given node appear together. But then what can be done? In particular, we must consider the worst case in which the given ordering is already a linear ordering that has been wildly permuted; exercise 24 in the introduction to Chapter 5 explains how to handle this case with $O((\log n)^2)$ passes over the data.]
Verified: no
Solve time: 1m03s
Setup
Let $S$ be a finite partially ordered set whose ordering relation is given as a collection of pairs $(u,v)$, meaning $u<v$. Let $n=|S|$, and assume that $n$ is so large that the nodes and all auxiliary tables cannot simultaneously reside in internal memory.
The task is to perform the equivalent of Algorithm $T$ using magnetic tapes for input, output, and temporary storage. The algorithm should remain efficient even in the worst case when the partial order is actually a linear order
$$ x_1<x_2<\cdots<x_n, $$
but the relations appear in arbitrary order on the input tape.
The problem is therefore to design an external-memory version of topological sorting whose cost is measured primarily by the number of tape passes.
Solution
The essential difficulty in Algorithm $T$ is the maintenance of the counts
$$ \operatorname{COUNT}(v), $$
the number of predecessors of node $v$ that have not yet been output. In internal memory these counts are updated whenever a predecessor is removed. When $n$ exceeds memory size, the counts must be maintained on tape.
The first step is to sort the input relations by their first component. After this sort, all relations
$$ (u,v_1),\ (u,v_2),\ldots,(u,v_t) $$
having the same predecessor $u$ occur consecutively on tape.
A second sort by the second component produces, for each node $v$, all incoming relations
$$ (u_1,v),\ (u_2,v),\ldots,(u_s,v) $$
consecutively. A single pass over this tape yields the initial values of $\operatorname{COUNT}(v)$. The result is a tape containing records
$$ (v,\operatorname{COUNT}(v)). $$
Nodes that do not occur as second components receive count $0$.
The principal problem is now to identify the nodes whose counts are zero and to decrease counts of their successors.
Let
$$ Z_0 $$
be the set of all nodes whose initial count is zero. These nodes are precisely the nodes that may appear first in a topological ordering. Write the elements of $Z_0$ onto a tape.
Suppose inductively that a tape $Z_i$ contains exactly the nodes that become eligible for output during stage $i$. The nodes of $Z_i$ are written to the output tape. To update the counts, proceed as follows.
Since the relation tape is sorted by first component, merge the tape $Z_i$ with the relation tape. Whenever a node $u\in Z_i$ is encountered, copy every relation
$$ (u,v) $$
to a new tape of decrement requests. Each copied record represents one reduction of $\operatorname{COUNT}(v)$.
The decrement-request tape now consists of records indexed by successors $v$. Sort this tape by successor. A single pass then yields, for each node $v$, the number
$$ d_i(v), $$
of predecessors removed during stage $i$.
Merge the tape containing current counts with the tape containing the values $d_i(v)$. Replace each count by
$$ \operatorname{COUNT}(v)\leftarrow \operatorname{COUNT}(v)-d_i(v). $$
Whenever the new count becomes $0$, place $v$ onto the next zero-count tape $Z_{i+1}$.
The process is repeated until no new zero-count nodes are generated.
This method is correct, but in the worst case it is inefficient. If the partial order is actually a linear order, each stage produces exactly one new node. Then $n$ stages are required, and each stage entails several complete tape passes.
A more efficient method processes many stages simultaneously.
Associate with each node $v$ its current count. During one round, all nodes whose counts are $0$ are known. The decrement requests generated by these nodes are accumulated and sorted. After counts are updated, every newly created zero-count node is immediately inserted into the next round. Thus the procedure performs repeated elimination of minimal elements.
The elimination process is equivalent to repeatedly removing all leaves from the directed acyclic graph obtained by reversing every edge. In a linear order, after one elimination only one node disappears; after two eliminations two nodes have disappeared; after four eliminations four nodes have disappeared; and so on. Therefore a doubling strategy can be used. During round $k$, all consequences of eliminating up to $2^k$ successive layers are propagated before another global sort is performed.
After round $k$, every node whose distance from the beginning of a chain is less than $2^k$ has been completely processed. Hence after
$$ \lceil \log_2 n\rceil $$
rounds, all nodes have been processed.
Each round requires sorting records whose size is proportional to the number of relations. By exercise 24 of the introduction to Chapter 5, a tape sort of $m$ records can be carried out in
$$ O(\log m) $$
passes. Since at most
$$ \lceil \log_2 n\rceil $$
rounds are needed, the total number of tape passes is
$$ O((\log n)^2). $$
Thus even the worst case of a wildly permuted linear ordering is handled in $O((\log n)^2)$ passes over the data.
The resulting algorithm is:
- Sort the relation tape by predecessor and by successor to construct initial counts.
- Determine all zero-count nodes.
- Repeatedly generate decrement requests, sort them, update counts, and identify newly created zero-count nodes.
- Use doubling so that round $k$ propagates the effect of eliminating $2^k$ layers.
- Stop when all nodes have been output.
The output sequence is a topological ordering of $S$.
Verification
A node is output only when its count becomes zero. By construction,
$$ \operatorname{COUNT}(v) $$
equals the number of predecessors of $v$ not yet output. Therefore every predecessor of $v$ has already appeared when $v$ is written to the output tape.
Conversely, if a node has no remaining predecessors, every incoming relation to that node has generated a decrement request. The accumulated decrements reduce its count to zero, so the node is eventually placed onto some tape $Z_i$ and output.
Hence every node is output exactly once, and every output relation satisfies the topological-order condition.
The doubling scheme preserves correctness because it merely combines several successive eliminations before performing the next global synchronization. Every decrement that would have been applied in the stage-by-stage algorithm is applied exactly once, and no decrement is omitted.
Therefore the produced ordering is identical to a valid execution of Algorithm $T$.
This completes the proof.
∎
Notes
The crucial idea is that internal-memory Algorithm $T$ updates counts locally, whereas tape processing requires updates to be accumulated, sorted, and merged. The hint points to the worst-case linear ordering because a naive external implementation would require $\Theta(n)$ global rounds. Using repeated doubling reduces the number of rounds to $\Theta(\log n)$; combined with external sorting, this yields the desired $O((\log n)^2)$ pass complexity.