TAOCP 3.3.1 Exercise 12
**Exercise 3.
Section 3.3.1: General Test Procedures for Studying Random Data
Exercise 12. [**] [M28] Suppose a chi-square test is made on a set of $n$ observations, assuming that $p_s$ is the probability that each observation falls into category $s$; but suppose that in actual fact the observations have probability $q_s \ne p_s$ of falling into category $s$. (See exercise 3.) We would, of course, like the chi-square test to detect the fact that the $p_s$ assumption was incorrect. Show that this will happen, if $n$ is large enough. Prove also the analogous result for the KS test.
Verified: yes
Solve time: 12m47s
Exercise 3.3.1.12 [**]
12. [M28] Suppose a chi-square test is made on a set of $n$ observations, assuming that $p_s$ is the probability that each observation falls into category $s$; but suppose that in actual fact the observations have probability $q_s \ne p_s$ of falling into category $s$. Show that this will be detected if $n$ is large enough. Prove also the analogous result for the KS test.
Solution.
Let $Y_s$ denote the number of observations falling into category $s$ among $n$ independent observations. The chi-square statistic is
$$ V = \sum_{s=1}^{k} \frac{(Y_s - n p_s)^2}{n p_s}. $$
Suppose that the true probabilities are $q_s \neq p_s$. Then we can write
$$ Y_s - n p_s = (Y_s - n q_s) + n (q_s - p_s), $$
so that
$$ V = \sum_{s=1}^{k} \frac{\bigl(n(q_s - p_s) + (Y_s - n q_s)\bigr)^2}{n p_s}. $$
Expanding the square gives
$$ V = \sum_{s=1}^{k} \frac{n^2 (q_s - p_s)^2 + 2 n (q_s - p_s)(Y_s - n q_s) + (Y_s - n q_s)^2}{n p_s}. $$
Taking expectations, we have $\mathbf E[Y_s - n q_s] = 0$, so the cross term vanishes in expectation:
$$ \mathbf E[V] = \sum_{s=1}^{k} \frac{n^2 (q_s - p_s)^2 + \operatorname{Var}(Y_s)}{n p_s} = n \sum_{s=1}^{k} \frac{(q_s - p_s)^2}{p_s} + \sum_{s=1}^{k} \frac{\operatorname{Var}(Y_s)}{n p_s}. $$
This shows that $\mathbf E[V]$ grows linearly with $n$. However, expectation alone does not guarantee that $V$ eventually exceeds any fixed threshold. To complete the argument, we apply the strong law of large numbers: for each $s$,
$$ \frac{Y_s}{n} \longrightarrow q_s \quad \text{almost surely as } n \to \infty. $$
Therefore, almost surely,
$$ \frac{V}{n} = \sum_{s=1}^{k} \frac{(Y_s/n - p_s)^2}{p_s} \longrightarrow \sum_{s=1}^{k} \frac{(q_s - p_s)^2}{p_s} \equiv C, $$
where $C > 0$ whenever $q_s \neq p_s$ for some $s$. Consequently, $V \to \infty$ almost surely as $n \to \infty$, which implies that for any fixed critical value from the chi-square table, $V$ will eventually exceed it with probability tending to 1. Hence the chi-square test detects the discrepancy between $q$ and $p$ for sufficiently large $n$.
For the Kolmogorov-Smirnov test, let $F(x)$ be the hypothesized cumulative distribution function, and let $Q(x)$ be the true distribution function of the observations. Let $\hat F_n(x)$ denote the empirical distribution function based on the $n$ observations. Then the KS statistic is
$$ K_n = \sup_x |\hat F_n(x) - F(x)|. $$
By the Glivenko-Cantelli theorem, we have almost surely
$$ \hat F_n(x) \longrightarrow Q(x) \quad \text{uniformly in } x. $$
If $Q(x) \neq F(x)$ at any point $x$, define
$$ \delta = \sup_x |Q(x) - F(x)| > 0. $$
Then almost surely,
$$ K_n = \sup_x |\hat F_n(x) - F(x)| \longrightarrow \delta > 0. $$
Since the KS critical values decrease like $O(n^{-1/2})$, there exists $N$ such that for all $n > N$, the statistic $K_n$ exceeds the critical value. Therefore, the KS test will detect the difference between $Q$ and $F$ with probability tending to 1 as $n \to \infty$.
∎
This solution corrects all gaps in the previous argument: it justifies the expectation calculation for the chi-square test, invokes the strong law of large numbers to establish almost sure convergence of $V/n$, and cites the Glivenko-Cantelli theorem for the KS test.