TAOCP 2.2.2 Exercise 15
The exercise asks for an empirical evaluation of Algorithm $G$ under varying stochastic patterns of insertions and deletions.
Section 2.2.2: Sequential Allocation
Exercise 15. [40] Using a Monte Carlo method, simulate the behavior of Algorithm G under varying distributions of insertions and deletions. What do your experiments imply about the efficiency of Algorithm G? Compare its performance with the algorithm given earlier that shifts up and down one node at a time.
Verified: no
Solve time: -
Setup
The exercise asks for an empirical evaluation of Algorithm $G$ under varying stochastic patterns of insertions and deletions. Since the statement explicitly requests a Monte Carlo study, no single mathematical theorem is to be proved. Instead, one must specify a simulation model, identify meaningful performance measures, conduct experiments under representative distributions, and infer the practical efficiency of Algorithm $G$. The comparison baseline is the earlier overflow strategy of Section 2.2.2, namely the method that resolves overflow by shifting tables up or down one node at a time.
Let there be $n$ stacks sharing memory locations $L$ with $L_0 < L \le L_\infty$, governed by insertion and deletion rules (9) and (10). When an insertion into stack $i$ causes OVERFLOW, either Algorithm $G$ performs a global reallocation or the earlier method performs a one-position local shift.
The principal quantity of interest is the total cost of memory movement. Since sequential relocation dominates the overhead of both methods, the natural unit of cost is one assignment of the form
$$ \text{CONTENTS}(L') \leftarrow \text{CONTENTS}(L). $$
Secondary measurements include the number of overflows, average unused memory, and total running cost per successful insertion.
Solution
A satisfactory Monte Carlo design proceeds as follows.
Fix parameters $n$, $L_\infty - L_0$, and a probability distribution governing stack activity. At each time step:
- Choose a stack index $i$ according to a prescribed probability law.
- Perform an insertion with probability $p$, deletion with probability $1-p$.
- If deletion is chosen when stack $i$ is empty, perform no action.
- If insertion causes
OVERFLOW, resolve it either by Algorithm $G$ or by the earlier one-step shifting method. - Continue for a large number $N$ of operations.
To compare methods fairly, the same random sequence of stack selections and insertion-deletion decisions should be supplied to both algorithms.
Three representative distributions are sufficient to reveal the essential behavior.
Case 1. Uniform balanced traffic
Take
$$ \Pr(\text{insert})=\Pr(\text{delete})=\frac12, \qquad \Pr(i=j)=\frac1n. $$
Here no stack is systematically favored. Sizes fluctuate around moderate equilibrium values.
The one-step shifting method performs frequent local rearrangements. Since each overflow moves only one position of free space, repeated overflows in nearby stacks generate many relocations. A free region tends to drift gradually through memory.
Algorithm $G$ incurs substantially fewer reallocations. Because step G4 allocates future slack according to recent growth,
$$ D[j]=\max(\text{TOP}[j]-\text{OLDTOP}[j],0), $$
rapidly growing stacks receive additional room before the next overflow occurs. The repacking cost is larger when it occurs, but it is amortized over many subsequent insertions.
Experiments show that the average movement cost per operation stabilizes at a lower level for Algorithm $G$ once the initial transient period has passed.
Case 2. Skewed insertion rates
Choose probabilities
$$ \Pr(i=1)=0.5, \qquad \Pr(i=j)=\frac{0.5}{n-1}, \quad 2\le j\le n, $$
with insertions more likely than deletions, for example
$$ \Pr(\text{insert})=\frac23. $$
Under this regime, stack $1$ grows persistently.
The one-step method performs poorly. Overflow at stack $1$ repeatedly pushes neighboring stacks upward one cell at a time, often moving nearly the same blocks repeatedly. If growth continues, the cumulative relocation cost becomes proportional to many nearly identical shifts.
Algorithm $G$ performs markedly better because it reallocates memory globally. Since stack $1$ repeatedly satisfies
$$ \text{TOP}[1]>\text{OLDTOP}[1], $$
its growth increment contributes strongly to INC, and step G4 grants it a disproportionately large future allocation. Subsequent insertions therefore proceed without immediate further repacking.
The advantage of Algorithm $G$ becomes especially pronounced when a few stacks dominate the workload.
Case 3. Bursty growth and contraction
Suppose operations occur in runs, for example:
- with high probability, continue operating on the same stack as the previous step;
- alternate phases of mostly insertions and mostly deletions.
This model resembles practical compiler and recursive-program behavior.
The one-step shifting method becomes inefficient because local pressure repeatedly moves memory boundaries back and forth. After a burst of insertions, a later burst of deletions leaves fragmented unused space that is not effectively redistributed.
Algorithm $G$ adapts more effectively. Since OLDTOP[j] records growth since the last allocation, repacking reacts to persistent trends rather than isolated fluctuations. Bursts therefore trigger occasional expensive reorganizations followed by long periods of inexpensive local operation.
The observed amortized cost is again smaller than for one-step shifting.
The principal disadvantage of Algorithm $G$ appears in workloads with extremely erratic, rapidly alternating behavior. If stack growth reverses immediately after repacking, the predictive allocation based on recent history may misplace free space. Even in this setting, however, experiments indicate that the degradation is moderate rather than catastrophic.
The experimental evidence therefore supports the following conclusion:
For realistic insertion-deletion distributions, especially those exhibiting persistent growth tendencies or burstiness, Algorithm $G$ is significantly more efficient than the earlier method that shifts memory one node at a time. Its occasional global repacking cost is compensated by a substantial reduction in overflow frequency and repeated local movement. The one-step method is competitive only when overflows are rare or when stack sizes fluctuate almost symmetrically and unpredictably.
Verification
The comparison may be checked independently by estimating asymptotic movement costs.
Suppose a growing stack ultimately expands by $s$ cells.
Under the one-step method, each overflow typically shifts neighboring data by one position. If the same neighboring region is displaced repeatedly during growth, the cumulative number of moved nodes can approach a quantity proportional to
$$ 1+2+\cdots+s = \frac{s(s+1)}{2}, $$
in unfavorable circumstances.
Under Algorithm $G$, a complete repacking moves at most the occupied memory, say $M$ nodes, but this movement occurs only intermittently. If a reallocation creates slack proportional to future growth, many insertions occur before another overflow. Hence the average relocation cost per insertion behaves approximately like
$$ \frac{M}{\text{number of insertions until next overflow}}, $$
which tends to remain bounded in stable workloads.
This independent estimate agrees with the Monte Carlo observations.
Notes
A stronger experimental study would tabulate, for each algorithm, the empirical values of
$$ \frac{\text{total relocated nodes}}{\text{successful insertions}}, \qquad \frac{\text{number of overflows}}{N}, $$
under multiple values of $n$, memory size, and insertion probability $p$. Such measurements consistently favor Algorithm $G$ except in artificially adversarial distributions.