TAOCP 2.5 Exercise 33
We are asked to design a **garbage collection and compaction algorithm** for a memory pool of nodes with the following properties: - Memory locations $1,2,\dots,AVAIL-1$ contain nodes `NODE(P)` of var...
Section 2.5: Dynamic Storage Allocation
Exercise 33. [**] [28] (Garbage collection and compacting.) Assume that memory
locations $1,2,\ldots,AVAIL-1$ are being used as a storage pool for nodes
of varying sizes, having the following form: The first word of NODE(P)
contains the fields
SIZE(P) = number of words in NODE(P);
T(P) = number of link fields in NODE(P); T(P) < SIZE(P);
LINK(P) = special link field for use only during garbage collection.
The node immediately following NODE(P) in memory is
NODE(P + SIZE(P)). Assume that the only fields in NODE(P) that are
used as links to other nodes are LINK(P + 1), LINK(P + 2), ..., LINK(P + T(P)), and each of these link fields is either $\Lambda$ or the
address of the first word of another node. Finally, assume that there is
one further link variable in the program, called USE, and it points to
one of the nodes.
Design an algorithm that (i) determines all nodes accessible directly or
indirectly from the variable USE, (ii) moves these nodes into memory
locations 1 through K - 1, for some K, changing all links so that
structural relationships are preserved, and (iii) sets AVAIL <- K.
For example, consider the following contents of memory, where INFO(L)
denotes the contents of location L, excluding LINK(L):
1: SIZE = 2, T = 1 6: SIZE = 2, T = 0 AVAIL = 11,
2: LINK = 6, INFO = A 7: CONTENTS = D USE = 3.
3: SIZE = 3, T = 1 8: SIZE = 3, T = 2
4: LINK = 8, INFO = B 9: LINK = 8, INFO = E
5: CONTENTS = C 10: LINK = 3, INFO = F
Your algorithm should transform this into
1: SIZE = 3, T = 1 4: SIZE = 3, T = 2 AVAIL = 7,
2: LINK = 4, INFO = B 5: LINK = 4, INFO = E USE = 1.
3: CONTENTS = C 6: LINK = 1, INFO = F
Exercises
Verified: no
Solve time: 5m10s
Solution
We are asked to design a garbage collection and compaction algorithm for a memory pool of nodes with the following properties:
- Memory locations $1,2,\dots,AVAIL-1$ contain nodes
NODE(P)of varying sizes. - Each node
NODE(P)has a first word containing:
$$ \text{SIZE}(P) = \text{number of words in NODE(P)}, \quad T(P) = \text{number of link fields}, \quad LINK(P) = \text{special field for garbage collection}. $$
- Link fields used for pointers are exactly $LINK(P+1),\dots,LINK(P+T(P))$.
- Each link field is either $\Lambda$ or the address of another node's first word.
- There is one root variable
USEpointing to a node.
We must:
- Identify all nodes reachable from
USE. - Move reachable nodes into memory locations $1,\dots,K-1$, preserving structure.
- Update links so that structural relationships are preserved.
- Set $AVAIL \gets K$.
The only auxiliary storage allowed is the field LINK(P) in each node; we aim for an in-place mark-compact algorithm.
Phase 1: Mark reachable nodes
We traverse all nodes reachable from USE, marking them without destroying link information. We use the special field LINK(P) as a temporary marker:
- Initially, all
LINK(P)are unused by the collector. - Define a marked node as one with
LINK(P) = MARKED(or any special distinguished value; we will use a flag distinct from valid addresses).
Algorithm (Mark phase)
-
If
USE = \Lambda, no node is reachable; skip to Phase 2. -
Otherwise, set
STACKto empty (implemented viaLINK(P)fields to avoid external storage if desired). -
Push
USEonto the stack. -
While the stack is not empty:
-
Pop a node
P. -
If
LINK(P)is not marked:
-
Set
LINK(P) \gets MARKED. -
For each $i = 1,\dots,T(P)$:
-
Let $Q = LINK(P+i)$.
-
If $Q \neq \Lambda$, push $Q$ onto the stack.
Correctness
- Each reachable node is eventually marked.
- Unreachable nodes retain their
LINK(P)unchanged. - No link information in $LINK(P+1),\dots,LINK(P+T(P))$ is destroyed.
Phase 2: Assign forwarding addresses
We now determine new compacted locations for marked nodes. We store the forwarding address in the special field LINK(P) of each marked node.
Algorithm (Address assignment)
- Set
K \gets 1. - Scan memory from location 1 to
AVAIL-1. - For each node
P:
- If
LINK(P) = MARKED:
- Let
s = SIZE(P). - Set
LINK(P) \gets K(forwarding address). - Increment
K \gets K + s.
Justification
- Only reachable nodes are assigned a new address.
- The old link information in link fields
LINK(P+1),...,LINK(P+T(P))is intact. - Forwarding addresses are stored in a dedicated field and used for pointer redirection in the next phase.
Phase 3: Update links and root pointer
Redirect all internal links to new addresses
-
For each node
PwithLINK(P)a forwarding address: -
For $i = 1,\dots,T(P)$:
-
Let
Q = LINK(P+i). -
If
Q \neq \Lambda, replaceLINK(P+i) \gets LINK(Q).
Redirect root variable
$$ USE \gets LINK(USE) $$
Correctness
- All reachable nodes now have
LINK(P)containing their new location. - Internal links point to forwarding addresses;
USEis updated accordingly. - Unreachable nodes are ignored.
Phase 4: Copy nodes to new locations
We now copy marked nodes to their compacted locations. To avoid overwriting unprocessed nodes, copy in increasing order of original memory addresses, which is safe because new locations are always higher than or equal to old ones for nodes scanned in memory order.
Algorithm (Copy phase)
- For
Pfrom 1 toAVAIL-1:
-
If
LINK(P)contains a forwarding addressnew: -
Let
s = SIZE(P). -
For
j = 0,\dots,s-1:
$$ \text{Memory}[new+j] \gets \text{Memory}[P+j] $$
- Restore the first word:
$$ LINK(\text{Memory}[new]) \gets \text{Memory}[new].LINK \text{ (already used as forwarding address)} $$
Justification
- Copying in increasing order ensures that source words are never overwritten before they are used.
- Forwarding addresses stored in
LINK(P)survive until all links are updated. - After copying, all reachable nodes occupy consecutive memory locations starting at 1.
Phase 5: Update AVAIL
Set
$$ AVAIL \gets K $$
This is the first unused memory location after the compacted nodes.
Example Execution
Initial memory:
| Loc | SIZE | T | LINK | INFO |
|---|---|---|---|---|
| 1 | 2 | 1 | 6 | A |
| 2 | ||||
| 3 | 3 | 1 | 8 | B |
| 4 | ||||
| 5 | C | |||
| 6 | 2 | 0 | D | |
| 7 | ||||
| 8 | 3 | 2 | 8 | E |
| 9 | ||||
| 10 | 3 | F | ||
| AVAIL | 11 | |||
| USE | 3 |
Step 1: Mark reachable nodes
- Start at
USE=3. - Node 3: mark
- Node 3 links to 8 → mark node 8
- Node 8 links to 8 (self-loop) and 3 (already marked)
- Nodes 1, 6 are not reachable.
- Reachable set: {3,8}
Step 2: Assign forwarding addresses
K=1- Node 3: size 3,
LINK(3)=1,K=4 - Node 8: size 3,
LINK(8)=4,K=7
Step 3: Redirect links
- Node 3:
LINK(4) = LINK(8) = 4 - Node 8:
LINK(9) = LINK(8) = 4,LINK(10) = LINK(3) = 1 USE = LINK(3) = 1
Step 4: Copy nodes
-
Node 3 → locations 1-3
-
Node 8 → locations 4-6
-
Restore sizes in first words:
-
SIZE(1) = 3, T(1) = 1 -
SIZE(4) = 3, T(4) = 2
Step 5: Update AVAIL
$$ AVAIL = 7 $$
Final memory layout
| Loc | SIZE | T | LINK | INFO |
|---|---|---|---|---|
| 1 | 3 | 1 | 4 | B |
| 2 | C | |||
| 3 | ||||
| 4 | 3 | 2 | 4 | E |
| 5 | ||||
| 6 | 1 | F | ||
| AVAIL | 7 | |||
| USE | 1 |
This exactly matches the exercise's example.
Justification and Correctness
- Reachable nodes are exactly those marked. Mark phase visits every node reachable from
USEthrough link fields. Unreachable nodes remain unmarked and are discarded. - Forwarding addresses safely preserve new locations. The special field
LINK(P)is used temporarily; old link fields remain untouched until link redirection. - Links are updated correctly. Each internal link
LINK(P+i)is replaced by the forwarding address of its target node. The root pointerUSEis updated similarly. - Compaction is safe in-place. Scanning in increasing memory order guarantees that nodes are copied to new locations without overwriting unprocessed source nodes.
- Memory is contiguous. Reachable nodes occupy memory
1..AVAIL-1after compaction. - AVAIL is updated. The first free location after the compacted nodes is assigned to
AVAIL. - Example execution matches the exercise. Nodes 3 and 8 are compacted,