TAOCP 2.3.5 Exercise 3
Let the root node be $P$.
Section 2.3.5: Lists and Garbage Collection
Exercise 3. [**] [M26] Prove the validity of Algorithm E. [Hint: See the proof of Algorithm 2.3.1T.]
Verified: no
Solve time: 5m42s
Let the root node be $P$. The proof follows the pattern of the proof of Algorithm 2.3.1T, but it must be tied to the actual bookkeeping used in Algorithm E.
For every node $X$, let $\mathrm{TYPE}(X)$ be the value of its type field. In Algorithm E, the type field serves two purposes:
- It determines which outgoing links of $X$ have already been examined.
- When $X$ is on the current search path, it records which link of $X$ has been replaced by a backlink to the predecessor of $X$.
The finite set of possible type values therefore forms a finite state machine. Each state specifies both the next link to be examined and, when $X$ is active on the search path, the location of the reversed link.
Let
$$ P=P_0,P_1,\ldots,P_k=Q $$
be the chain obtained by following the reversed links from the current node $Q$ back to the root.
We prove that the following invariant holds whenever control returns to the beginning of Step E2.
Invariant
- The nodes $P_0,\ldots,P_k$ are exactly the active nodes of the depth-first search.
- For each $i$ with $1\le i\le k$, exactly one link of $P_i$ has been replaced by a backlink to $P_{i-1}$.
- The current value of $\mathrm{TYPE}(P_i)$ identifies unambiguously which link of $P_i$ is the reversed link.
- Every node not on the chain $P_0,\ldots,P_k$ has all of its links exactly as in the original structure.
- For every active node $P_i$, the state $\mathrm{TYPE}(P_i)$ specifies precisely which of its admissible links have already been examined and which remain to be examined.
The invariant is true initially. At the start, $Q=P$; the search path consists only of the root, no links have been reversed, and the initial type value of $P$ indicates that no admissible links have yet been examined.
We now verify that every transition of Algorithm E preserves the invariant.
Descent
Suppose Step E2 selects an admissible link of $Q=P_k$ leading to an unmarked node $R$.
Algorithm E changes the type of $R$ to the state corresponding to entry into $R$. It then replaces in $R$ the appropriate link by a backlink to $Q$. The type value assigned to $R$ records exactly which link has been replaced.
Consequently,
$$ P_0,P_1,\ldots,P_k,R $$
becomes the new search path.
Exactly one new reversed link is created, namely the link in $R$ that now points to $Q$. The predecessor chain can therefore be recovered uniquely from the type information. No links outside the enlarged search path are changed.
Hence properties $1$ through $5$ remain true.
Examination of another outgoing link
Suppose the currently designated admissible link of $Q$ is null or points to a node that is already marked.
In this case Algorithm E does not descend. Instead it changes $\mathrm{TYPE}(Q)$ to the next state of the finite state machine associated with $Q$.
No link fields are modified. The search path remains unchanged. The new type value merely records that one more admissible link has been examined.
Therefore properties $1$ through $5$ continue to hold.
Retreat
Suppose all admissible links of $Q=P_k$ have been examined.
By property $3$, the current type value of $Q$ identifies the unique reversed link of $Q$. Algorithm E uses this information to determine the predecessor $P_{k-1}$ and to reconstruct the original value of the reversed link. The link is then restored exactly as it was before $Q$ was entered.
After restoration, $Q$ no longer belongs to the active path, and control returns to $P_{k-1}$.
Thus the new active path is
$$ P_0,P_1,\ldots,P_{k-1}. $$
The node $Q$ now has exactly its original links. No other node off the active path has been altered. Hence properties $1$ through $5$ are preserved.
This verifies the invariant in every possible branch of the algorithm.
Termination
Each node is marked only when first entered. Since Algorithm E never descends to a marked node, every node can be entered at most once.
For a fixed node $X$, the type field passes through a finite sequence of states. Each transition either
- advances $\mathrm{TYPE}(X)$ to the next state corresponding to examination of another admissible link, or
- completes the processing of $X$ and causes a retreat.
The type state never moves backward. Therefore the local state of $X$ cannot cycle.
Since the accessible structure is finite, only finitely many nodes can be entered, and each entered node undergoes only finitely many type transitions. Hence only finitely many steps are possible. Eventually the root reaches its terminal type state, at which point the algorithm stops.
Reachability
Every marked node is reachable from $P$.
This is proved by induction on the order in which nodes are marked. The root is reachable from itself. If a node $R$ is marked when Algorithm E follows an admissible link from a previously marked node $S$, then $R$ is reachable from $P$ because $S$ is.
Conversely, let $R$ be any node reachable from $P$. Consider a path
$$ P=X_0,X_1,\ldots,X_m=R $$
of admissible links.
Assume that $R$ is never marked, and choose such an $R$ with $m$ minimal. Then $X_{m-1}$ must eventually be marked. By property $5$, Algorithm E examines every admissible link of $X_{m-1}$ before declaring that node finished. In particular it examines the link leading to $R$.
When that examination occurs, $R$ cannot already be marked, by assumption. Therefore Algorithm E must descend to $R$ and mark it, a contradiction.
Hence every reachable node is eventually marked.
Therefore the marked nodes are exactly the nodes reachable from $P$.
Restoration of the structure
A link is modified only when a node is entered and placed on the active search path. By properties $2$ and $3$, the type information always identifies the unique modified link. When the node is finished, the retreat step restores that link before the node leaves the search path.
Thus every reversed link is restored exactly once. When the algorithm terminates, no node remains on the active path, so no modified links remain.
Consequently every link field has its original value. The structure at termination is identical to the structure at the start.
Therefore Algorithm E terminates, marks exactly the nodes reachable from the root, and restores the original structure completely. Hence Algorithm E is valid. ∎