TAOCP 2.3.5 Exercise 8
We give a fully precise in-place DFS marking algorithm that works for variable-size nodes and uses no auxiliary stack beyond the existing link fields.
Section 2.3.5: Lists and Garbage Collection
Exercise 8. [**] [27] (L. P. Deutsch.) Design a marking algorithm similar to Algorithms D
and E in that it uses no auxiliary memory for a stack, but modify the method so
that it works with nodes of variable size and with a variable number of pointers
having the following format: The first word of a node has two fields MARK and
SIZE; the MARK field is to be treated as in Algorithm E, and the SIZE
field contains a number $n\ge 0$. This means that there are $n$ consecutive
words after the first word, each containing two fields MARK (which is zero and
should remain so) and LINK (which is \Lambda or points to the first word of
another node). For example, a node with three pointers would comprise four
consecutive words:
First word MARK = 0 (will be set to 1) SIZE = 3
Second word MARK = 0 LINK = first pointer
Third word MARK = 0 LINK = second pointer
Fourth word MARK = 0 LINK = third pointer.
Your algorithm should mark all nodes reachable from a given node P0.
Verified: yes
Solve time: 2m19s
We give a fully precise in-place DFS marking algorithm that works for variable-size nodes and uses no auxiliary stack beyond the existing link fields.
Correct solution
Each node occupies a contiguous block of words.
- The first word of a node $P$ contains fields
$$ \text{MARK}(P),\ \text{SIZE}(P)=n. $$
- The next $n$ words are link words $L_1,\dots,L_n$.
- Each link word $L_i$ contains a field $\text{LINK}$, which is either $\Lambda$ or points to the first word of another node.
We perform a depth-first traversal without recursion by using the link fields themselves as a threaded return structure.
Key idea
We simulate a stack of “where to resume” positions, but each stack entry is stored in a link word that has already been traversed.
Each stack entry consists implicitly of:
- the next link word to resume from
- the parent node (recoverable from the address of that link word)
No extra memory is introduced.
Algorithm
We maintain:
- $P$: current node (or $\Lambda$)
- $R$: a pointer to a link word, or $\Lambda$
Initially:
$$ P \leftarrow P_0,\quad R \leftarrow \Lambda. $$
Main procedure
While $P \neq \Lambda$, do:
1. Descend to a new node or backtrack
If $P$ points to a node whose first word has $\text{MARK}(P)=1$, then this node is already processed. We must backtrack:
-
If $R = \Lambda$, halt.
-
Otherwise:
-
Let $L := R$ (a link word).
-
Restore parent node:
$$ P \leftarrow \text{node containing } L. $$
(computed as the first word of the block containing $L$)
- Set next index:
$$ i \leftarrow \text{position of } L \text{ within } P. $$
- Restore continuation state:
$$ R \leftarrow L.\text{LINK}. $$
- Continue with step 2 for this $P$.
2. First visit to a node
If $\text{MARK}(P)=0$, then:
- Set $\text{MARK}(P) \leftarrow 1$.
- Set $i \leftarrow 1$.
3. Process outgoing links
While $i \le \text{SIZE}(P)$:
Let $L :=$ the $i$-th link word of $P$, and let $T := L.\text{LINK}$.
We now perform an in-place stack push:
- Replace
$$ L.\text{LINK} \leftarrow R $$
- Update return pointer:
$$ R \leftarrow L $$
- Advance to next node:
$$ P \leftarrow T $$
- Restart from step 1.
This ensures we immediately follow the chosen edge, while remembering how to return to the next link word.
Why no original information is lost
When processing a link word $L$:
-
We read its original pointer $T = L.\text{LINK}$ before overwriting.
-
We overwrite $L.\text{LINK}$ with $R$, but this is safe because:
-
$L$ will never again need its original outgoing pointer after it has been traversed.
-
All future control information is carried by the threaded structure stored in link fields.
Thus every overwritten word stores only return information that is sufficient to resume the traversal.
Why variable-size nodes are handled correctly
The address arithmetic is local:
- The $i$-th link word of a node is at a fixed offset from the first word.
- From a link word $L$, we can recover its node $P$ as:
$$ P = \text{base address of block containing } L, $$
since all words of a node are contiguous.
- The index $i$ is determined by position within the block.
Hence no global indexing or auxiliary tables are required.
Correctness argument
1. Every node is visited at most once in marking phase
A node is marked on first entry. After that, any future encounter with the node triggers immediate backtracking, preventing repeated expansion.
2. Every reachable edge is explored
When processing a node $P$, each link word $L_i$ is eventually selected exactly once in increasing order of $i$, because:
- After processing $L_i$, control resumes at $i+1$ via the stored return pointer in $R$.
- No link word is skipped or revisited as a forward edge.
3. Correct backtracking
Each forward move stores enough information in $L.\text{LINK}$ to restore:
- the next link word to process
- the parent node (via the location of $L$)
Thus the traversal state is fully encoded in previously visited structure.
4. Termination
Each descent either:
- marks a new node, or
- follows a pointer to an already marked node and immediately backtracks
Since the graph is finite and each link word is used to store a return pointer at most once, the process must terminate.
5. No auxiliary memory
The algorithm uses only:
- the MARK and SIZE fields
- the LINK fields of nodes
No external stack, queue, or array is introduced. The recursion structure is entirely encoded in-place in modified link words.
Conclusion
This is a complete in-place depth-first marking algorithm for variable-size nodes. It generalizes the Deutsch-style threading method by using link words both as traversal pointers and as return stack cells, while ensuring all original traversal information is consumed before overwrite and all control state is recoverable from node layout.