TAOCP 3.3.2 Exercise 4

Section 3.3.2: Empirical Tests

Exercise 4. [M12] Prove that the probabilities in (4) are correct for the gap test.

Verified: yes
Solve time: 4m39s

Solution

Let

$$ p=\beta-\alpha . $$

For a random sequence, each value $U_j$ is independently distributed on $[0,1)$, hence

$$ \Pr(\alpha \le U_j < \beta)=p, \qquad \Pr(U_j<\alpha \text{ or } U_j\ge\beta)=1-p. $$

A gap of length $r$ is defined to be a subsequence

$$ U_j,U_{j+1},\ldots,U_{j+r} $$

such that

$$ U_j,\ldots,U_{j+r-1}\notin[\alpha,\beta), \qquad U_{j+r}\in[\alpha,\beta). $$

Therefore a gap of length $r$ occurs precisely when the first $r$ observations fail to lie in the interval $[\alpha,\beta)$ and the next observation lies in that interval.

Since the observations are independent, the probability of this event is the product of the corresponding probabilities:

$$ p_r = (1-p)^r,p, \qquad r\ge 0. $$

For the categories used by Algorithm G, the lengths $0,1,\ldots,t-1$ are recorded separately. Hence, for $0\le r<t$,

$$ p_r=p(1-p)^r. $$

The final category combines all gaps of length at least $t$. Its probability is

$$ p_t = \sum_{r=t}^{\infty} p(1-p)^r = p(1-p)^t\sum_{k=0}^{\infty}(1-p)^k. $$

Since

$$ \sum_{k=0}^{\infty}(1-p)^k = \frac1p, $$

we obtain

$$ p_t=(1-p)^t. $$

Thus the probabilities used in the gap test are

$$ p_r=p(1-p)^r, \qquad 0\le r\le t-1, $$

and

$$ p_t=(1-p)^t, $$

which are exactly the probabilities given in equation (4).

This completes the proof.

∎