TAOCP 2.5 Exercise 33

We are asked to design a **garbage collection and compaction algorithm** for a memory pool of nodes with the following properties: - Memory locations $1,2,\dots,AVAIL-1$ contain nodes `NODE(P)` of var...

Section 2.5: Dynamic Storage Allocation

Exercise 33. [**] [28] (Garbage collection and compacting.) Assume that memory locations $1,2,\ldots,AVAIL-1$ are being used as a storage pool for nodes of varying sizes, having the following form: The first word of NODE(P) contains the fields

SIZE(P) = number of words in NODE(P);
   T(P) = number of link fields in NODE(P); T(P) < SIZE(P);
LINK(P) = special link field for use only during garbage collection.

The node immediately following NODE(P) in memory is NODE(P + SIZE(P)). Assume that the only fields in NODE(P) that are used as links to other nodes are LINK(P + 1), LINK(P + 2), ..., LINK(P + T(P)), and each of these link fields is either $\Lambda$ or the address of the first word of another node. Finally, assume that there is one further link variable in the program, called USE, and it points to one of the nodes.

Design an algorithm that (i) determines all nodes accessible directly or indirectly from the variable USE, (ii) moves these nodes into memory locations 1 through K - 1, for some K, changing all links so that structural relationships are preserved, and (iii) sets AVAIL <- K.

For example, consider the following contents of memory, where INFO(L) denotes the contents of location L, excluding LINK(L):

1:  SIZE = 2, T = 1       6:  SIZE = 2, T = 0       AVAIL = 11,
2:  LINK = 6, INFO = A    7:  CONTENTS = D          USE = 3.
3:  SIZE = 3, T = 1       8:  SIZE = 3, T = 2
4:  LINK = 8, INFO = B    9:  LINK = 8, INFO = E
5:  CONTENTS = C         10:  LINK = 3, INFO = F

Your algorithm should transform this into

1:  SIZE = 3, T = 1       4:  SIZE = 3, T = 2       AVAIL = 7,
2:  LINK = 4, INFO = B    5:  LINK = 4, INFO = E    USE = 1.
3:  CONTENTS = C          6:  LINK = 1, INFO = F

Exercises

Verified: no
Solve time: 5m10s


Solution

We are asked to design a garbage collection and compaction algorithm for a memory pool of nodes with the following properties:

  • Memory locations $1,2,\dots,AVAIL-1$ contain nodes NODE(P) of varying sizes.
  • Each node NODE(P) has a first word containing:

$$ \text{SIZE}(P) = \text{number of words in NODE(P)}, \quad T(P) = \text{number of link fields}, \quad LINK(P) = \text{special field for garbage collection}. $$

  • Link fields used for pointers are exactly $LINK(P+1),\dots,LINK(P+T(P))$.
  • Each link field is either $\Lambda$ or the address of another node's first word.
  • There is one root variable USE pointing to a node.

We must:

  1. Identify all nodes reachable from USE.
  2. Move reachable nodes into memory locations $1,\dots,K-1$, preserving structure.
  3. Update links so that structural relationships are preserved.
  4. Set $AVAIL \gets K$.

The only auxiliary storage allowed is the field LINK(P) in each node; we aim for an in-place mark-compact algorithm.

Phase 1: Mark reachable nodes

We traverse all nodes reachable from USE, marking them without destroying link information. We use the special field LINK(P) as a temporary marker:

  • Initially, all LINK(P) are unused by the collector.
  • Define a marked node as one with LINK(P) = MARKED (or any special distinguished value; we will use a flag distinct from valid addresses).

Algorithm (Mark phase)

  1. If USE = \Lambda, no node is reachable; skip to Phase 2.

  2. Otherwise, set STACK to empty (implemented via LINK(P) fields to avoid external storage if desired).

  3. Push USE onto the stack.

  4. While the stack is not empty:

  5. Pop a node P.

  6. If LINK(P) is not marked:

  • Set LINK(P) \gets MARKED.

  • For each $i = 1,\dots,T(P)$:

  • Let $Q = LINK(P+i)$.

  • If $Q \neq \Lambda$, push $Q$ onto the stack.

Correctness

  • Each reachable node is eventually marked.
  • Unreachable nodes retain their LINK(P) unchanged.
  • No link information in $LINK(P+1),\dots,LINK(P+T(P))$ is destroyed.

Phase 2: Assign forwarding addresses

We now determine new compacted locations for marked nodes. We store the forwarding address in the special field LINK(P) of each marked node.

Algorithm (Address assignment)

  1. Set K \gets 1.
  2. Scan memory from location 1 to AVAIL-1.
  3. For each node P:
  • If LINK(P) = MARKED:
  1. Let s = SIZE(P).
  2. Set LINK(P) \gets K (forwarding address).
  3. Increment K \gets K + s.

Justification

  • Only reachable nodes are assigned a new address.
  • The old link information in link fields LINK(P+1),...,LINK(P+T(P)) is intact.
  • Forwarding addresses are stored in a dedicated field and used for pointer redirection in the next phase.

Redirect all internal links to new addresses

  • For each node P with LINK(P) a forwarding address:

  • For $i = 1,\dots,T(P)$:

  • Let Q = LINK(P+i).

  • If Q \neq \Lambda, replace LINK(P+i) \gets LINK(Q).

Redirect root variable

$$ USE \gets LINK(USE) $$

Correctness

  • All reachable nodes now have LINK(P) containing their new location.
  • Internal links point to forwarding addresses; USE is updated accordingly.
  • Unreachable nodes are ignored.

Phase 4: Copy nodes to new locations

We now copy marked nodes to their compacted locations. To avoid overwriting unprocessed nodes, copy in increasing order of original memory addresses, which is safe because new locations are always higher than or equal to old ones for nodes scanned in memory order.

Algorithm (Copy phase)

  1. For P from 1 to AVAIL-1:
  • If LINK(P) contains a forwarding address new:

  • Let s = SIZE(P).

  • For j = 0,\dots,s-1:

$$ \text{Memory}[new+j] \gets \text{Memory}[P+j] $$

  • Restore the first word:

$$ LINK(\text{Memory}[new]) \gets \text{Memory}[new].LINK \text{ (already used as forwarding address)} $$

Justification

  • Copying in increasing order ensures that source words are never overwritten before they are used.
  • Forwarding addresses stored in LINK(P) survive until all links are updated.
  • After copying, all reachable nodes occupy consecutive memory locations starting at 1.

Phase 5: Update AVAIL

Set

$$ AVAIL \gets K $$

This is the first unused memory location after the compacted nodes.

Example Execution

Initial memory:

Loc SIZE T LINK INFO
1 2 1 6 A
2
3 3 1 8 B
4
5 C
6 2 0 D
7
8 3 2 8 E
9
10 3 F
AVAIL 11
USE 3

Step 1: Mark reachable nodes

  • Start at USE=3.
  • Node 3: mark
  • Node 3 links to 8 → mark node 8
  • Node 8 links to 8 (self-loop) and 3 (already marked)
  • Nodes 1, 6 are not reachable.
  • Reachable set: {3,8}

Step 2: Assign forwarding addresses

  • K=1
  • Node 3: size 3, LINK(3)=1, K=4
  • Node 8: size 3, LINK(8)=4, K=7

Step 3: Redirect links

  • Node 3: LINK(4) = LINK(8) = 4
  • Node 8: LINK(9) = LINK(8) = 4, LINK(10) = LINK(3) = 1
  • USE = LINK(3) = 1

Step 4: Copy nodes

  • Node 3 → locations 1-3

  • Node 8 → locations 4-6

  • Restore sizes in first words:

  • SIZE(1) = 3, T(1) = 1

  • SIZE(4) = 3, T(4) = 2

Step 5: Update AVAIL

$$ AVAIL = 7 $$

Final memory layout

Loc SIZE T LINK INFO
1 3 1 4 B
2 C
3
4 3 2 4 E
5
6 1 F
AVAIL 7
USE 1

This exactly matches the exercise's example.

Justification and Correctness

  1. Reachable nodes are exactly those marked. Mark phase visits every node reachable from USE through link fields. Unreachable nodes remain unmarked and are discarded.
  2. Forwarding addresses safely preserve new locations. The special field LINK(P) is used temporarily; old link fields remain untouched until link redirection.
  3. Links are updated correctly. Each internal link LINK(P+i) is replaced by the forwarding address of its target node. The root pointer USE is updated similarly.
  4. Compaction is safe in-place. Scanning in increasing memory order guarantees that nodes are copied to new locations without overwriting unprocessed source nodes.
  5. Memory is contiguous. Reachable nodes occupy memory 1..AVAIL-1 after compaction.
  6. AVAIL is updated. The first free location after the compacted nodes is assigned to AVAIL.
  7. Example execution matches the exercise. Nodes 3 and 8 are compacted,