TAOCP 3.3.1 Exercise 14

Let $Y_s=np_s+\sqrt n\,Z_s,\qquad s=1,\ldots,k,$ where the variables $Z_s$ are bounded as $n\to\infty$.

Section 3.3.1: General Test Procedures for Studying Random Data

Exercise 14. ▶ [**] [HM36] Let $Z_k$ be given by Eq. (18). Show directly by using Stirling's approximation that the multinomial probability

$$n! p_1^{k_1} \cdots p_k^{k_k} / Y_1! \cdots Y_k! = e^{-V/2} / \sqrt{(2\pi)^{k-1} p_1 \cdots p_k} + O(n^{-k/2}),$$

if $Z_1, Z_2, \ldots, Z_k$ are bounded as $n \to \infty$. (This idea leads to a proof of the chi-square test that is much closer to "first principles," and requires less handwaving, than the derivation in the text.)

Verified: yes
Solve time: 3m11s


Let

$Y_s=np_s+\sqrt n,Z_s,\qquad s=1,\ldots,k,$

where the variables $Z_s$ are bounded as $n\to\infty$. Since $\sum Y_s=n$ and $\sum p_s=1$, we have $\sum Z_s=0$. Apply Stirling's approximation

$m!=\sqrt{2\pi m},\Bigl(\frac me\Bigr)^m\Bigl(1+O(m^{-1})\Bigr)$

to $n!$ and each $Y_s!$. Then

=\frac{\sqrt{2\pi n}}{\prod_s\sqrt{2\pi Y_s}} \frac{n^n\prod_s p_s^{Y_s}}{\prod_s Y_s^{Y_s}} \Bigl(1+O(n^{-1})\Bigr).$$ Since $Y_s=np_s+\sqrt n,Z_s=np_s(1+\delta_s)$ with $\delta_s=Z_s/(\sqrt n,p_s)=O(n^{-1/2})$, $$\log\frac{n^n\prod_s p_s^{Y_s}}{\prod_s Y_s^{Y_s}} =-\sum_s Y_s\log(1+\delta_s).

Using $\log(1+\delta)=\delta-\delta^2/2+O(\delta^3)$ and $\sum_s Z_s=0$,

$$ =-\frac12\sum_s\frac{Z_s^2}{p_s}+O(n^{-1/2}). $$

By Eq. (18), and equivalently by substituting $Y_s=np_s+\sqrt n,Z_s$ into (6),

$$ =\sum_s\frac{Z_s^2}{p_s}, $$

hence

$$ =e^{-V/2}\Bigl(1+O(n^{-1/2})\Bigr). $$

Also,

$$ =(2\pi)^{-(k-1)/2} \Bigl(n^{1-k}\prod_s p_s^{-1}\Bigr)^{1/2} \Bigl(1+O(n^{-1/2})\Bigr), $$

because $Y_s=np_s+O(\sqrt n)$. Multiplying the last two estimates gives

$$ =\frac{e^{-V/2}}{\sqrt{(2\pi n)^{,k-1}p_1\cdots p_k}} \Bigl(1+O(n^{-1/2})\Bigr). $$

Since the principal term is of order $n^{-(k-1)/2}$, the error term is of order $n^{-k/2}$. Therefore

$$ =\frac{e^{-V/2}}{\sqrt{(2\pi n)^{,k-1}p_1\cdots p_k}} +O(n^{-k/2}), $$

which is the required asymptotic formula. This completes the proof.