TAOCP 3.3.1 Exercise 19

Let X^{(1)},X^{(2)},\ldots,X^{(n)} be independent observations from an unknown distribution on $\mathbf R^s$, where

Section 3.3.1: General Test Procedures for Studying Random Data

Exercise 19. [**] [HM$\infty$] Develop tests, analogous to the Kolmogorov–Smirnov test, for use with multivariate distributions $F(x_1, \ldots, x_s) = \Pr(X_1 \le x_1, \ldots, X_s \le x_s)$. (Such procedures could be used, for example, in place of the "serial test" in the next section.)

Verified: yes
Solve time: 6m18s


Let

$$ X^{(1)},X^{(2)},\ldots,X^{(n)} $$

be independent observations from an unknown distribution on

$\mathbf R^s$, where

$$ X^{(i)}=(X^{(i)}_1,\ldots,X^{(i)}_s). $$

Suppose that under the null hypothesis the common distribution function is

$$ F(x_1,\ldots,x_s) =\Pr(X_1\le x_1,\ldots,X_s\le x_s), $$

and assume first that $F$ is continuous.

Define the multivariate empirical distribution function by

$$ F_n(x_1,\ldots,x_s) =\frac1n #\bigl{ i: X^{(i)}_1\le x_1,\ldots, X^{(i)}_s\le x_s \bigr}. $$

This is the direct analogue of the empirical distribution function in one dimension.

A natural analogue of the Kolmogorov-Smirnov statistic is

$$ D_n =\sup_{x_1,\ldots,x_s} \bigl| F_n(x_1,\ldots,x_s) - F(x_1,\ldots,x_s) \bigr|. $$

Large values of $D_n$ indicate that the observed sample distribution differs substantially from the hypothesized distribution $F$.

Equivalently, since $F_n$ changes only when one of the coordinates passes a sample point, the supremum need only be taken over rectangles whose upper-right corner is a sample point:

$$ D_n = \max_{1\le i_1,\ldots,i_s\le n} \left| F_n\bigl( X^{(i_1)}_1,\ldots,X^{(i_s)}_s \bigr) - F\bigl( X^{(i_1)}_1,\ldots,X^{(i_s)}_s \bigr) \right|. $$

In one dimension, the Kolmogorov-Smirnov statistic is distribution-free, because the transformation

$$ U_i=F(X_i) $$

reduces the problem to the uniform distribution on $[0,1]$.

For $s>1$, no analogous scalar transformation exists in general. Although one may transform each marginal separately,

$$ U^{(i)}_j=F_j(X^{(i)}_j), $$

where $F_j$ is the $j$-th marginal distribution, this determines only the marginals of the transformed vector. The dependence structure remains. Hence two distinct multivariate distributions can have identical marginals. Therefore coordinatewise one-dimensional Kolmogorov-Smirnov tests are not sufficient for testing the full joint distribution.

Consequently, unlike the one-dimensional case, the null distribution of $D_n$ generally depends on the full distribution $F$. Critical values must therefore be obtained either from asymptotic theory or by Monte Carlo simulation from the hypothesized distribution.

A practical procedure is therefore the following.

  1. Generate many independent samples

$$ X^{(1)},\ldots,X^{(n)} $$

from the hypothesized distribution $F$.

  1. For each simulated sample compute

$$ D_n^{} = \sup_x |F_n^{}(x)-F(x)|. $$

  1. Use the empirical distribution of the simulated values $D_n^{*}$ to determine critical points.
  2. Reject the hypothesis when the observed value of $D_n$ exceeds the chosen critical value.

This gives a genuine multivariate analogue of the Kolmogorov-Smirnov test.

One may also define one-sided analogues:

$$ D_n^{+} = \sup_x \bigl(F_n(x)-F(x)\bigr), \qquad D_n^{-} = \sup_x \bigl(F(x)-F_n(x)\bigr), $$

so that

$$ D_n=\max(D_n^{+},D_n^{-}). $$

The test is consistent. Indeed, if the true distribution is $G\neq F$, then there exists a point $x$ such that

$$ G(x)\neq F(x). $$

By the multivariate Glivenko-Cantelli theorem,

$$ \sup_x |F_n(x)-G(x)|\to0 \qquad\text{almost surely}. $$

Hence

$$ D_n \to \sup_x |G(x)-F(x)|

0 \qquad\text{almost surely}. $$

Therefore the probability of rejection tends to $1$ as $n\to\infty$.

Thus the multivariate empirical distribution function together with the supremum statistic

$$ D_n=\sup_x |F_n(x)-F(x)| $$

provides the natural extension of the Kolmogorov-Smirnov procedure to distributions on $\mathbf R^s$.