TAOCP 2.2.2 Exercise 9
Let the successive operations be represented by a sequence a_1,a_2,\ldots,a_m, where each $a_t$ denotes the stack on which the $t$th insertion is performed.
Section 2.2.2: Sequential Allocation
Exercise 9. [M27] Using the mathematical model described near the end of the text, prove that Eq. (14) is the expected number of moves. (Note that the sequence 1, 1, 4, 2, 3, 1, 2, 4, 2, 1 specifies $0 + 0 + 0 + 1 + 1 + 3 + 2 + 0 + 3 + 6 = 16$ moves.)
Verified: no
Solve time: -
Setup
Let the successive operations be represented by a sequence
$$ a_1,a_2,\ldots,a_m, $$
where each $a_t$ denotes the stack on which the $t$th insertion is performed. The mathematical model near the end of the section assumes that each insertion is equally likely to occur in any of the $n$ stacks, independently of all previous insertions.
Let $M_t$ denote the number of moves required when the $t$th insertion occurs. We must prove that Eq. (14) gives the expected total number of moves,
$$ E(M_1+\cdots+M_m). $$
The example in the statement,
$$ 1,1,4,2,3,1,2,4,2,1, $$
yields
$$ 0+0+0+1+1+3+2+0+3+6=16 $$
moves. The quantity predicted by Eq. (14) must therefore coincide with the expectation of this random process.
Solution
The model described in the text treats the stacks symmetrically. At any stage of the process, the amount of work required by an insertion depends only on how many insertions have previously been made into the same stack.
Let
$$ N_i(t) $$
be the number of insertions already made into stack $i$ immediately before time $t$. Then
$$ N_1(t)+\cdots+N_n(t)=t-1. $$
When insertion $a_t=i$ occurs, the repacking method considered in the mathematical model causes exactly
$$ N_i(t) $$
moves. The reason is that the $N_i(t)$ entries already belonging to stack $i$ must be shifted to accommodate the new insertion. Therefore
$$ M_t=N_{a_t}(t). $$
To compute the expectation, introduce indicator variables
$$ I_{rs}= \begin{cases} 1, & \text{if } a_r=a_s,\ 0, & \text{otherwise}, \end{cases} \qquad 1\le r<s\le m. $$
Since each insertion chooses a stack independently and uniformly,
$$ P(a_r=a_s)=\frac{1}{n}, $$
hence
$$ E(I_{rs})=\frac{1}{n}. $$
For fixed $s$, the number of moves at step $s$ equals the number of earlier insertions into the same stack:
$$ M_s=\sum_{r=1}^{s-1} I_{rs}. $$
Taking expectations and using linearity,
$$ E(M_s) = \sum_{r=1}^{s-1} E(I_{rs}) = \sum_{r=1}^{s-1}\frac{1}{n} = \frac{s-1}{n}. $$
The expected total number of moves is therefore
$$ E!\left(\sum_{s=1}^{m} M_s\right) = \sum_{s=1}^{m} E(M_s) = \sum_{s=1}^{m}\frac{s-1}{n}. $$
Since
$$ \sum_{s=1}^{m}(s-1) = 0+1+\cdots+(m-1) = \frac{m(m-1)}{2}, $$
we obtain
$$ E(M_1+\cdots+M_m) = \frac{1}{n}\cdot\frac{m(m-1)}{2}. $$
This is precisely Eq. (14). Hence Eq. (14) is the expected number of moves predicted by the mathematical model.
This completes the proof.
∎
Verification
For the sample sequence
$$ 1,1,4,2,3,1,2,4,2,1, $$
the move counts are
$$ 0,\ 0,\ 0,\ 1,\ 1,\ 3,\ 2,\ 0,\ 3,\ 6, $$
whose sum is
$$ 16. $$
This agrees with the interpretation that the move count at each insertion equals the number of previous insertions into the same stack.
Since there are $m=10$ insertions and $n=4$ stacks, Eq. (14) predicts
$$ \frac{10\cdot9}{2\cdot4} = \frac{45}{4} = 11.25 $$
as the expected number of moves. The observed value $16$ for this particular sequence is therefore a possible realization above the mean.
Notes
The argument depends only on linearity of expectation. Independence between different pairs of insertions is unnecessary; only the equality
$$ P(a_r=a_s)=\frac{1}{n} $$
for each pair $r<s$ is required.