TAOCP 3.3.2 Exercise 5
Let p=\beta-\alpha .
Section 3.3.2: Empirical Tests
Exercise 5. [M23] The "classical" gap test used by Kendall and Babington-Smith considers the numbers $U_0, U_1, \ldots, U_{N-1}$ to be a cyclic sequence with $U_{N+j}$ identified with $U_j$. Here $N$ is a fixed number of $U$'s that are to be subjected to the test. If $n$ of the numbers $U_0, \ldots, U_{N-1}$ fall into the range $\alpha \le U_j < \beta$, there are $n$ gaps in the cyclic sequence. Let $Z_r$ be the number of gaps of length $r$, for $0 \le r < t$, and let $Z_t$ be the number of gaps of length $\ge t$; show that the quantity $V = \sum_{0 \le r \le t}(Z_r - np_r)^2/np_r$ should have the chi-square distribution with $t$ degrees of freedom, in the limit as $N$ goes to infinity, where $p_r$ is given in Eq. (4).
Verified: no
Solve time: 5m22s
Solution
Let
$$ p=\beta-\alpha . $$
Each $U_j$ is uniformly distributed on $[0,1)$, hence
$$ \Pr(\alpha\le U_j<\beta)=p, \qquad \Pr(U_j\notin[\alpha,\beta))=1-p. $$
By Eq. (4), the gap-length probabilities are
$$ p_r=p(1-p)^r,\qquad 0\le r<t, $$
and
$$ p_t=(1-p)^t. $$
The category $r=t$ represents all gaps of length at least $t$.
Let $n$ be the number of values among $U_0,\ldots,U_{N-1}$ that fall in $[\alpha,\beta)$. Since
$$ n=\sum_{j=0}^{N-1} I_j, $$
where $I_j$ is the indicator of the event $\alpha\le U_j<\beta$, the law of large numbers gives
$$ \frac nN \to p $$
with probability $1$ as $N\to\infty$. In particular,
$$ n\to\infty $$
with probability $1$.
Now condition on the event that $n$ successes occur. Because the $U_j$ are independent, the successive gap lengths between occurrences of values in $[\alpha,\beta)$ have the geometric distribution
$$ \Pr(\text{gap length}=r)=p(1-p)^r, \qquad r\ge0. $$
The cyclic convention merely removes the initial and terminal edge effects. Given $n$, there are exactly $n$ gaps, and for large $N$ the dependence introduced by the condition that the cyclic gaps sum to $N-n$ contributes only a negligible $O(N^{-1})$ effect. Thus the vector
$$ (Z_0,\ldots,Z_t) $$
has the same asymptotic distribution as the counts obtained by classifying $n$ independent gap lengths into the $t+1$ categories
$$ 0,1,\ldots,t-1,\ge t. $$
Hence, conditional on $n$,
$$ (Z_0,\ldots,Z_t) $$
is asymptotically multinomial with parameters
$$ n;\ p_0,p_1,\ldots,p_t. $$
Therefore
$$ E(Z_r\mid n)=np_r, $$
and
$$ \operatorname{Cov}(Z_r,Z_s\mid n) = \begin{cases} np_r(1-p_r), & r=s,\[1ex] -np_rp_s, & r\ne s. \end{cases} $$
The classical multinomial chi-square theorem now applies. For a multinomial vector with $k=t+1$ categories,
$$ V = \sum_{r=0}^{t} \frac{(Z_r-np_r)^2}{np_r} $$
converges in distribution, as $n\to\infty$, to a chi-square law with
$$ k-1=(t+1)-1=t $$
degrees of freedom:
$$ V \xrightarrow{d} \chi_t^2 . $$
Since $n/N\to p>0$, we have $n\to\infty$ whenever $N\to\infty$. Consequently the same limit holds when the original parameter $N$ tends to infinity:
$$ V \xrightarrow{d} \chi_t^2 , \qquad N\to\infty . $$
Thus the statistic
$$ V=\sum_{0\le r\le t}\frac{(Z_r-np_r)^2}{np_r} $$
has the chi-square distribution with $t$ degrees of freedom in the limit as $N\to\infty$. ∎