TAOCP 3.3.2 Exercise 1

Section 3.3.2: Empirical Tests

Exercise 1. [10] Why should the serial test described in part B be applied to $(Y_0, Y_1)$, $(Y_2, Y_3)$, $\ldots$, $(Y_{2n-2}, Y_{2n-1})$ instead of to $(Y_0, Y_1)$, $(Y_1, Y_2)$, $\ldots$, $(Y_{n-1}, Y_n)$?

Verified: yes
Solve time: 4m05s

If the serial test is applied to the overlapping pairs $(Y_0, Y_1)$, $(Y_1, Y_2)$, $\ldots$, $(Y_{n-1}, Y_n)$, then every number $Y_j$ with $0 < j < n$ appears in two successive pairs. Consecutive observations therefore share a component and are not independent, so the usual chi-square approximation for the distribution of the test statistic is no longer justified. The dependence between overlapping pairs changes the distribution of the statistic, so the observed value can look too large or too small when compared against the chi-square table.
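
To make the dependence concrete, here is a short worked calculation. It assumes the $Y_j$ are independent and uniform on $\{0, 1, \ldots, d-1\}$; the indicator $I_j$ and the category label $(q, r)$ are notation introduced only for this illustration. Let $I_j = 1$ if $(Y_j, Y_{j+1}) = (q, r)$ and $I_j = 0$ otherwise. Both $I_j$ and $I_{j+1}$ can equal 1 only if $Y_{j+1} = r = q$, so

$$
\Pr(I_j = 1) = \frac{1}{d^2}, \qquad
\Pr(I_j = I_{j+1} = 1) = \frac{\delta_{qr}}{d^3}, \qquad
\operatorname{Cov}(I_j, I_{j+1}) = \frac{\delta_{qr}}{d^3} - \frac{1}{d^4}.
$$

This covariance is $-1/d^4$ when $q \ne r$ and $(d-1)/d^4$ when $q = r$, so it is nonzero for every $d \ge 2$. For nonoverlapping pairs the corresponding indicators depend on disjoint sets of $Y_j$, and the covariance is zero.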

By contrast, no number $Y_j$ appears in more than one of the nonoverlapping pairs $(Y_0, Y_1)$, $(Y_2, Y_3)$, $\ldots$, $(Y_{2n-2}, Y_{2n-1})$, so the pairs are independent whenever the $Y_j$ themselves are independent, which is the hypothesis being tested. Independence of the observations is a fundamental assumption of the chi-square test, so with this arrangement the test statistic approximately follows the chi-square distribution for $k = d^2$ categories, that is, with $d^2 - 1$ degrees of freedom.
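
The difference can also be observed empirically. The following is a minimal simulation sketch, not part of the original solution: it assumes Python's `random` module is an adequate stand-in for a truly random source, and the function names `serial_chi_square` and `simulate`, along with the parameter choices `d=2`, `n=200`, `trials=2000`, and the seed, are hypothetical choices made for illustration. It computes the chi-square statistic over $d^2$ categories for both pairings and reports the sample mean and variance of each.

```python
import random
from statistics import mean, variance


def serial_chi_square(pairs, d):
    """Chi-square statistic for pairs over d*d equally likely categories."""
    n = len(pairs)
    counts = [0] * (d * d)
    for q, r in pairs:
        counts[q * d + r] += 1
    expected = n / (d * d)
    return sum((c - expected) ** 2 for c in counts) / expected


def simulate(d=2, n=200, trials=2000, seed=12345):
    """Return ((mean, var) for nonoverlapping pairs, (mean, var) for overlapping pairs)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    v_non, v_over = [], []
    for _ in range(trials):
        y = [rng.randrange(d) for _ in range(2 * n)]
        # n nonoverlapping pairs: (Y0,Y1), (Y2,Y3), ..., (Y_{2n-2}, Y_{2n-1})
        non = [(y[2 * j], y[2 * j + 1]) for j in range(n)]
        # n overlapping pairs: (Y0,Y1), (Y1,Y2), ..., (Y_{n-1}, Y_n)
        over = [(y[j], y[j + 1]) for j in range(n)]
        v_non.append(serial_chi_square(non, d))
        v_over.append(serial_chi_square(over, d))
    return (mean(v_non), variance(v_non)), (mean(v_over), variance(v_over))


if __name__ == "__main__":
    (m_non, s_non), (m_over, s_over) = simulate()
    print(f"nonoverlapping: mean={m_non:.2f} variance={s_non:.2f}")
    print(f"overlapping:    mean={m_over:.2f} variance={s_over:.2f}")
```

A chi-square variable with $d^2 - 1$ degrees of freedom has mean $d^2 - 1$ and variance $2(d^2 - 1)$. Under the assumptions above, the nonoverlapping statistic should come out close to both values, while the overlapping statistic should show a noticeably different variance, which is the symptom of the dependence described in the solution.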

∎