TAOCP 3.4.1 Exercise 7
**Exercise 3.
Section 3.4.1: Numerical Distributions
Exercise 7. ▶ [**] [26] (A. J. Walker.) Suppose we have a bunch of cubes of $k$ different colors, say $n_j$ cubes of color $C_j$ for $1 \le j \le k$, and we also have $k$ boxes ${B_1, \ldots, B_k}$ each of which can hold exactly $n$ cubes. Furthermore $n_1 + \cdots + n_k = kn$, so the cubes will just fit in the boxes. Prove (constructively) that there is always a way to put the cubes into the boxes so that each box contains at most two different colors of cubes; in fact, there is a way to do it so that, whenever box $B_j$ contains two colors, one of those colors is $C_j$. Show how to use this principle to compute the $P$ and $Y$ tables required in (3), given a probability distribution $(p_1, \ldots, p_k)$.
Verified: no
Solve time: 5m13s
Exercise 3.4.1.7 [26] (A. J. Walker)
Suppose we have $k$ colors of cubes, $n_j$ cubes of color $C_j$ $(1\le j\le k)$, and $k$ boxes ${B_1, \dots, B_k}$ each of capacity $n$, with $n_1 + \cdots + n_k = kn$. Constructively prove that it is possible to place the cubes into the boxes so that each box contains at most two colors, and if a box $B_j$ contains two colors, one of them is $C_j$. Then show how this principle produces the $P$ and $Y$ tables in Walker's alias method.
Step 1: Reformulate the problem
We wish to assign the $n_j$ cubes of color $C_j$ to the $k$ boxes so that:
- Each box contains exactly $n$ cubes.
- Each box contains at most two colors.
- If box $B_j$ contains two colors, one of them is $C_j$.
Equivalently, we must partition each color into at most two parts: one part stays in its designated box $B_j$, the other part is assigned to another box that has room.
Step 2: Constructive packing algorithm
Define the surplus of color $C_j$ relative to its designated box $B_j$:
$$ s_j = n_j - n. $$
- If $s_j \ge 0$, then $B_j$ contains $n$ cubes of $C_j$ plus $s_j$ surplus cubes that must be placed elsewhere.
- If $s_j < 0$, then $B_j$ has a deficit of $-s_j$ cubes.
The key idea is to process colors in order and assign surplus cubes to deficit boxes in a way that guarantees at most two colors per box.
Step 3: Two-color packing procedure
- Initialize: For each $j$, put $\min(n_j,n)$ cubes of $C_j$ into $B_j$. Then $B_j$ has either exactly $n$ cubes or a deficit of $n - n_j$ cubes.
- Sort deficits: Let the boxes with deficits be $B_{j_1}, \dots, B_{j_m}$.
- Assign surplus cubes in order: Process the colors with surplus cubes one by one. Suppose $C_r$ has $s_r > 0$ surplus cubes. Assign them to the deficit boxes in any order, filling each deficit box completely before moving to the next.
- Guarantee two-color property: Whenever a deficit box $B_s$ receives cubes from a surplus color $C_r$, it is initially filled with cubes of $C_s$ only. We assign exactly $n - n_s$ cubes of $C_r$ to $B_s$, filling it completely. Therefore $B_s$ ends up with cubes of only $C_s$ and $C_r$, and cannot receive any other color because it is now full.
- Repeat until all surplus cubes are placed.
Step 4: Verification
Capacity: Each box receives exactly $n$ cubes.
Two-color property:
- Box $B_j$ initially contains cubes of its own color $C_j$.
- If it has a deficit, it receives cubes from a single donor color to fill the deficit. No box can receive cubes from more than one donor because we always fill the deficit completely in one step.
- Therefore each box contains either one color or exactly two colors, one of which is $C_j$.
Existence: The procedure terminates because the total number of surplus cubes equals the total deficit:
$$ \sum_{j=1}^{k} s_j = \sum_{j=1}^{k} (n_j - n) = kn - kn = 0. $$
Hence every surplus cube is assigned to some deficit box, producing a valid packing.
Step 5: Construction of $P$ and $Y$ tables
Suppose we have a probability distribution $(p_1,\dots,p_k)$ with $p_j = n_j/(kn)$. Let $n$ be chosen such that all $n_j = kn p_j$ are integers. Apply the packing procedure above to the boxes.
For each box $B_j$:
- If $B_j$ contains only color $C_j$, set
$$ P_{j-1} = 1, \quad Y_{j-1} = \text{arbitrary}. $$
- If $B_j$ contains $m$ cubes of color $C_j$ and $n-m$ cubes of color $C_t$, set
$$ P_{j-1} = \frac{m}{n}, \quad Y_{j-1} = t-1. $$
Step 6: Correctness of the alias method
Let $U,V$ be independent uniform deviates, and let
$$ K = \lfloor kU \rfloor. $$
Define the output
$$ X = \begin{cases} K, & V < P_K,\ Y_K, & V \ge P_K. \end{cases} $$
Analysis:
- Conditioned on $K=j-1$, $X$ selects uniformly among the $n$ cubes in box $B_j$.
- Each box is selected with probability $1/k$, so each cube position has probability $1/(kn)$.
- Since color $C_j$ occupies exactly $n_j$ cube positions, the probability of selecting $C_j$ is
$$ \Pr{C_j} = \frac{n_j}{kn} = p_j. $$
Hence the tables $P$ and $Y$ implement the desired probability distribution exactly.
Step 7: Summary
- The constructive algorithm places cubes into boxes such that each box contains at most two colors, and if a box contains two colors, one is the designated color.
- The $P$ and $Y$ tables follow directly from the two-color packing, producing a correct alias method for the given probability distribution.
∎
This construction avoids the flaw in the previous solution: by filling each deficit box completely in a single assignment from a single donor color, we guarantee that no box receives cubes from more than one surplus color. This preserves the two-color property required by the exercise.