TAOCP 3.3.2 Exercise 5

Let p=\beta-\alpha .

Section 3.3.2: Empirical Tests

Exercise 5. [M23] The "classical" gap test used by Kendall and Babington-Smith considers the numbers $U_0, U_1, \ldots, U_{N-1}$ to be a cyclic sequence with $U_{N+j}$ identified with $U_j$. Here $N$ is a fixed number of $U$'s that are to be subjected to the test. If $n$ of the numbers $U_0, \ldots, U_{N-1}$ fall into the range $\alpha \le U_j < \beta$, there are $n$ gaps in the cyclic sequence. Let $Z_r$ be the number of gaps of length $r$, for $0 \le r < t$, and let $Z_t$ be the number of gaps of length $\ge t$; show that the quantity $V = \sum_{0 \le r \le t}(Z_r - np_r)^2/np_r$ should have the chi-square distribution with $t$ degrees of freedom, in the limit as $N$ goes to infinity, where $p_r$ is given in Eq. (4).

Verified: no
Solve time: 5m22s


Solution

Let

$$ p=\beta-\alpha . $$

Each $U_j$ is uniformly distributed on $[0,1)$, hence

$$ \Pr(\alpha\le U_j<\beta)=p, \qquad \Pr(U_j\notin[\alpha,\beta))=1-p. $$

By Eq. (4), the gap-length probabilities are

$$ p_r=p(1-p)^r,\qquad 0\le r<t, $$

and

$$ p_t=(1-p)^t. $$

The category $r=t$ represents all gaps of length at least $t$.

Let $n$ be the number of values among $U_0,\ldots,U_{N-1}$ that fall in $[\alpha,\beta)$. Since

$$ n=\sum_{j=0}^{N-1} I_j, $$

where $I_j$ is the indicator of the event $\alpha\le U_j<\beta$, the law of large numbers gives

$$ \frac nN \to p $$

with probability $1$ as $N\to\infty$. In particular,

$$ n\to\infty $$

with probability $1$.

Now condition on the event that $n$ successes occur. Because the $U_j$ are independent, the successive gap lengths between occurrences of values in $[\alpha,\beta)$ have the geometric distribution

$$ \Pr(\text{gap length}=r)=p(1-p)^r, \qquad r\ge0. $$

The cyclic convention merely removes the initial and terminal edge effects. Given $n$, there are exactly $n$ gaps, and for large $N$ the dependence introduced by the condition that the cyclic gaps sum to $N-n$ contributes only a negligible $O(N^{-1})$ effect. Thus the vector

$$ (Z_0,\ldots,Z_t) $$

has the same asymptotic distribution as the counts obtained by classifying $n$ independent gap lengths into the $t+1$ categories

$$ 0,1,\ldots,t-1,\ge t. $$

Hence, conditional on $n$,

$$ (Z_0,\ldots,Z_t) $$

is asymptotically multinomial with parameters

$$ n;\ p_0,p_1,\ldots,p_t. $$

Therefore

$$ E(Z_r\mid n)=np_r, $$

and

$$ \operatorname{Cov}(Z_r,Z_s\mid n) = \begin{cases} np_r(1-p_r), & r=s,\[1ex] -np_rp_s, & r\ne s. \end{cases} $$

The classical multinomial chi-square theorem now applies. For a multinomial vector with $k=t+1$ categories,

$$ V = \sum_{r=0}^{t} \frac{(Z_r-np_r)^2}{np_r} $$

converges in distribution, as $n\to\infty$, to a chi-square law with

$$ k-1=(t+1)-1=t $$

degrees of freedom:

$$ V \xrightarrow{d} \chi_t^2 . $$

Since $n/N\to p>0$, we have $n\to\infty$ whenever $N\to\infty$. Consequently the same limit holds when the original parameter $N$ tends to infinity:

$$ V \xrightarrow{d} \chi_t^2 , \qquad N\to\infty . $$

Thus the statistic

$$ V=\sum_{0\le r\le t}\frac{(Z_r-np_r)^2}{np_r} $$

has the chi-square distribution with $t$ degrees of freedom in the limit as $N\to\infty$. ∎