TAOCP 3.3.2 Exercise 3

Section 3.3.2: Empirical Tests

Exercise 3. ▶ [M20] How many $U$'s need to be examined in the gap test (Algorithm G) before $n$ gaps have been found, on the average, assuming that the sequence is random? What is the standard deviation of this quantity?

Verified: yes
Solve time: 1m55s

Solution

Let $N$ be the total number of values $U_j$ examined by Algorithm G when $n$ gaps have been recorded.

A gap of length $r$ consists of $r$ consecutive failures, followed by one success, where a success means that $\alpha \le U_j < \beta$. Since

$p=\beta-\alpha,$

each observation is a success with probability $p$, independently of all others.

Algorithm G records one gap whenever a success occurs. Therefore the algorithm terminates as soon as the $n$th success has been observed. Hence $N$ is the number of Bernoulli trials required to obtain $n$ successes.

Equivalently, if the recorded gap lengths are $R_1,R_2,\ldots,R_n$, then

$N=\sum_{i=1}^{n}(R_i+1).$

For a single gap,

$$$$

by (4). Thus $R_i$ has the geometric distribution counting failures before the first success. Its mean is

$$ =\sum_{r\ge0} r,p(1-p)^r =\frac{1-p}{p}, $$

and its variance is

$$ =\frac{1-p}{p^2}. $$

Since the gaps are independent,

$$ =\sum_{i=1}^{n}E(R_i+1) =n!\left(\frac{1-p}{p}+1\right) =\frac{n}{p}. $$

Therefore the average number of $U$'s examined is

$$$$

For the variance,

$$ =\sum_{i=1}^{n}\operatorname{Var}(R_i+1) =n,\frac{1-p}{p^2}, $$

because adding $1$ does not change the variance. Hence the standard deviation is

$$ =\frac{\sqrt{n(1-p)}}{p}. $$

Thus the required quantities are

$$ \boxed{\sigma_N=\frac{\sqrt{n(1-p)}}{p}}, $$

where $p=\beta-\alpha$. ∎