TAOCP 3.3.1 Exercise 19
Let X^{(1)},X^{(2)},\ldots,X^{(n)} be independent observations from an unknown distribution on $\mathbf R^s$, where
Section 3.3.1: General Test Procedures for Studying Random Data
Exercise 19. [**] [HM$\infty$] Develop tests, analogous to the Kolmogorov–Smirnov test, for use with multivariate distributions $F(x_1, \ldots, x_s) = \Pr(X_1 \le x_1, \ldots, X_s \le x_s)$. (Such procedures could be used, for example, in place of the "serial test" in the next section.)
Verified: yes
Solve time: 6m18s
Let
$$ X^{(1)},X^{(2)},\ldots,X^{(n)} $$
be independent observations from an unknown distribution on
$\mathbf R^s$, where
$$ X^{(i)}=(X^{(i)}_1,\ldots,X^{(i)}_s). $$
Suppose that under the null hypothesis the common distribution function is
$$ F(x_1,\ldots,x_s) =\Pr(X_1\le x_1,\ldots,X_s\le x_s), $$
and assume first that $F$ is continuous.
Define the multivariate empirical distribution function by
$$ F_n(x_1,\ldots,x_s) =\frac1n #\bigl{ i: X^{(i)}_1\le x_1,\ldots, X^{(i)}_s\le x_s \bigr}. $$
This is the direct analogue of the empirical distribution function in one dimension.
A natural analogue of the Kolmogorov-Smirnov statistic is
$$ D_n =\sup_{x_1,\ldots,x_s} \bigl| F_n(x_1,\ldots,x_s) - F(x_1,\ldots,x_s) \bigr|. $$
Large values of $D_n$ indicate that the observed sample distribution differs substantially from the hypothesized distribution $F$.
Equivalently, since $F_n$ changes only when one of the coordinates passes a sample point, the supremum need only be taken over rectangles whose upper-right corner is a sample point:
$$ D_n = \max_{1\le i_1,\ldots,i_s\le n} \left| F_n\bigl( X^{(i_1)}_1,\ldots,X^{(i_s)}_s \bigr) - F\bigl( X^{(i_1)}_1,\ldots,X^{(i_s)}_s \bigr) \right|. $$
In one dimension, the Kolmogorov-Smirnov statistic is distribution-free, because the transformation
$$ U_i=F(X_i) $$
reduces the problem to the uniform distribution on $[0,1]$.
For $s>1$, no analogous scalar transformation exists in general. Although one may transform each marginal separately,
$$ U^{(i)}_j=F_j(X^{(i)}_j), $$
where $F_j$ is the $j$-th marginal distribution, this determines only the marginals of the transformed vector. The dependence structure remains. Hence two distinct multivariate distributions can have identical marginals. Therefore coordinatewise one-dimensional Kolmogorov-Smirnov tests are not sufficient for testing the full joint distribution.
Consequently, unlike the one-dimensional case, the null distribution of $D_n$ generally depends on the full distribution $F$. Critical values must therefore be obtained either from asymptotic theory or by Monte Carlo simulation from the hypothesized distribution.
A practical procedure is therefore the following.
- Generate many independent samples
$$ X^{(1)},\ldots,X^{(n)} $$
from the hypothesized distribution $F$.
- For each simulated sample compute
$$ D_n^{} = \sup_x |F_n^{}(x)-F(x)|. $$
- Use the empirical distribution of the simulated values $D_n^{*}$ to determine critical points.
- Reject the hypothesis when the observed value of $D_n$ exceeds the chosen critical value.
This gives a genuine multivariate analogue of the Kolmogorov-Smirnov test.
One may also define one-sided analogues:
$$ D_n^{+} = \sup_x \bigl(F_n(x)-F(x)\bigr), \qquad D_n^{-} = \sup_x \bigl(F(x)-F_n(x)\bigr), $$
so that
$$ D_n=\max(D_n^{+},D_n^{-}). $$
The test is consistent. Indeed, if the true distribution is $G\neq F$, then there exists a point $x$ such that
$$ G(x)\neq F(x). $$
By the multivariate Glivenko-Cantelli theorem,
$$ \sup_x |F_n(x)-G(x)|\to0 \qquad\text{almost surely}. $$
Hence
$$ D_n \to \sup_x |G(x)-F(x)|
0 \qquad\text{almost surely}. $$
Therefore the probability of rejection tends to $1$ as $n\to\infty$.
Thus the multivariate empirical distribution function together with the supremum statistic
$$ D_n=\sup_x |F_n(x)-F(x)| $$
provides the natural extension of the Kolmogorov-Smirnov procedure to distributions on $\mathbf R^s$.