TAOCP 3.3.1 Exercise 11

Section 3.3.1: General Test Procedures for Studying Random Data

Exercise 11. [**] [10] Solve exercise 10 substituting the KS test for the chi-square test.

Verified: yes
Solve time: 10m50s


Let

$$ F_n(x)=\frac{1}{n}\,\#\{j : X_j\le x\} $$

be the empirical distribution function of the original sample

$$ X_1,X_2,\dots,X_n, $$

and let

$$ D_n=\sup_x |F_n(x)-F(x)| $$

be the corresponding Kolmogorov-Smirnov statistic, where $F(x)$ is the hypothesized distribution function.

Now suppose the test is repeated on the same $n$ observations and the two runs are pooled into a single test. The combined sample of size $2n$ is

$$ (Y_1,\dots,Y_{2n}) = (X_1,\dots,X_n,X_1,\dots,X_n). $$

The empirical distribution function based on these $2n$ observations is

$$ F_{2n}(x) = \frac{1}{2n}\,\#\{j : Y_j\le x\}. $$

Since every observation $X_j$ appears twice in the combined sample,

$$ \#\{j : Y_j\le x\} = 2\,\#\{j : X_j\le x\}. $$

Hence

$$ F_{2n}(x) = \frac{2\,\#\{j : X_j\le x\}}{2n} = \frac{1}{n}\,\#\{j : X_j\le x\} = F_n(x). $$
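The identity $F_{2n}=F_n$ can be checked numerically. The following is a minimal sketch, not part of the original solution: the helper `ecdf`, the sample values, and the evaluation points are all hypothetical, and exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from bisect import bisect_right
from fractions import Fraction

def ecdf(sample):
    """Return the function x -> (1/n) * #{j : X_j <= x}."""
    xs = sorted(sample)
    n = len(xs)
    # bisect_right counts how many sorted observations are <= x.
    return lambda x: Fraction(bisect_right(xs, x), n)

xs = [3, 1, 4, 1, 5]      # hypothetical sample; ties are allowed
f_n = ecdf(xs)
f_2n = ecdf(xs + xs)      # every observation appears twice

points = [0, 1, 2, 3, 4, 5, 6]   # arbitrary evaluation points
same = all(f_n(x) == f_2n(x) for x in points)
print(same)
```

Both counts in the numerator and the sample size in the denominator double, so the ratio returned by `ecdf` is the same at every point.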

Therefore the empirical distribution function is exactly the same after duplicating the data. It follows that the KS statistic is also unchanged:

$$ D_{2n} = \sup_x |F_{2n}(x)-F(x)| = \sup_x |F_n(x)-F(x)| = D_n. $$
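As an illustration of this equality, here is a small sketch that computes the statistic for a sample and for its duplicated version. Everything in it is an assumption made for the example: the function name `ks_statistic`, the choice of a uniform hypothesized distribution on $[0,1]$, and the sample values. For a continuous $F$ the supremum is attained at a jump of the empirical distribution function, so it suffices to compare $F$ with the step heights just before and just after each sorted observation.

```python
from fractions import Fraction

def ks_statistic(sample, cdf):
    """Return sup_x |F_n(x) - F(x)| for a continuous hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps at x: compare F(x) with the step value just after
        # the jump (i/n) and just before it ((i-1)/n).
        d = max(d, Fraction(i, n) - fx, fx - Fraction(i - 1, n))
    return d

def uniform_cdf(x):
    # Assumed hypothesized distribution: uniform on [0, 1].
    return min(max(x, 0), 1)

# Hypothetical sample, stored as exact fractions.
xs = [Fraction(1, 10), Fraction(3, 10), Fraction(7, 20), Fraction(9, 10)]

d_n = ks_statistic(xs, uniform_cdf)
d_2n = ks_statistic(xs + xs, uniform_cdf)
print(d_n, d_2n)
```

With tied observations some of the compared step heights lie strictly inside a jump, but those intermediate values never exceed the ones at the ends of the jump, so the maximum is unaffected.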

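Thus, when the same observations are repeated, the statistic $D_{2n}$ for the combined sample of size $2n$ is identical to the original $D_n$. The statistics actually tabulated in Section 3.3.1 carry a normalizing factor $\sqrt{n}$, namely $K_n^+=\sqrt{n}\,\sup_x\bigl(F_n(x)-F(x)\bigr)$ and $K_n^-=\sqrt{n}\,\sup_x\bigl(F(x)-F_n(x)\bigr)$. Since $F_{2n}=F_n$, both suprema are unchanged while the factor becomes $\sqrt{2n}$, so $K_{2n}^{\pm}=\sqrt{2}\,K_n^{\pm}$: the normalized KS statistics are multiplied by $\sqrt{2}$. ∎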