TAOCP 3.3.2 Exercise 3
Let $N$ be the total number of values $U_j$ examined by Algorithm G when $n$ gaps have been recorded.
Section 3.3.2: Empirical Tests
Exercise 3. ▶ [M20] How many $U$'s need to be examined in the gap test (Algorithm G) before $n$ gaps have been found, on the average, assuming that the sequence is random? What is the standard deviation of this quantity?
Verified: yes
Solve time: 1m55s
Solution
Let $N$ be the total number of values $U_j$ examined by Algorithm G when $n$ gaps have been recorded.
A gap of length $r$ consists of $r$ consecutive failures, followed by one success, where a success means that $\alpha \le U_j < \beta$. Since
$p=\beta-\alpha,$
each observation is a success with probability $p$, independently of all others.
Algorithm G records one gap whenever a success occurs. Therefore the algorithm terminates as soon as the $n$th success has been observed. Hence $N$ is the number of Bernoulli trials required to obtain $n$ successes.
Equivalently, if the recorded gap lengths are $R_1,R_2,\ldots,R_n$, then
$N=\sum_{i=1}^{n}(R_i+1).$
For a single gap,
$$$$
by (4). Thus $R_i$ has the geometric distribution counting failures before the first success. Its mean is
$$ =\sum_{r\ge0} r,p(1-p)^r =\frac{1-p}{p}, $$
and its variance is
$$ =\frac{1-p}{p^2}. $$
Since the gaps are independent,
$$ =\sum_{i=1}^{n}E(R_i+1) =n!\left(\frac{1-p}{p}+1\right) =\frac{n}{p}. $$
Therefore the average number of $U$'s examined is
$$$$
For the variance,
$$ =\sum_{i=1}^{n}\operatorname{Var}(R_i+1) =n,\frac{1-p}{p^2}, $$
because adding $1$ does not change the variance. Hence the standard deviation is
$$ =\frac{\sqrt{n(1-p)}}{p}. $$
Thus the required quantities are
$$ \boxed{\sigma_N=\frac{\sqrt{n(1-p)}}{p}}, $$
where $p=\beta-\alpha$. ∎