TAOCP 3.3.1 Exercise 7

Let $F_n(x)$ be the empirical distribution function based on $n$ independent observations $X_1, \dots, X_n$ from a **continuous** distribution $F(x)$.

Section 3.3.1: General Test Procedures for Studying Random Data

Exercise 7. [M16] Prove that $K_n^+$ and $K_n^-$ can never be negative. What is the largest possible value $K_n^-$ can have?

Verified: no
Solve time: 6m48s


Solution

Let $F_n(x)$ be the empirical distribution function based on $n$ independent observations $X_1, \dots, X_n$ from a continuous distribution $F(x)$. Recall the definitions of the Kolmogorov statistics:

$$ K_n^+ = \sup_x \bigl(F_n(x) - F(x)\bigr), \qquad K_n^- = \sup_x \bigl(F(x) - F_n(x)\bigr). $$

Step 1: Nonnegativity of $K_n^+$ and $K_n^-$

By definition, $F_n(x)$ and $F(x)$ are nondecreasing functions with values in $[0,1]$. Both satisfy

$$ \lim_{x \to -\infty} F_n(x) = 0, \qquad \lim_{x \to -\infty} F(x) = 0. $$

Consider $K_n^+ = \sup_x (F_n(x) - F(x))$. For any fixed $x$,

$$ F_n(x) - F(x) \in [-1,1]. $$

In particular, as $x \to -\infty$, we have

$$ F_n(x) - F(x) = 0 - 0 = 0. $$

Since the supremum of a set of real numbers is always at least as large as any element of the set, we conclude

$$ K_n^+ = \sup_x \bigl(F_n(x) - F(x)\bigr) \ge 0. $$

Similarly, for $K_n^- = \sup_x (F(x) - F_n(x))$, we have

$$ F(x) - F_n(x) = 0 - 0 = 0 \quad \text{as } x \to -\infty, $$

hence

$$ K_n^- = \sup_x \bigl(F(x) - F_n(x)\bigr) \ge 0. $$

Therefore, both $K_n^+$ and $K_n^-$ are never negative.

Step 2: Largest possible value of $K_n^-$

By definition,

$$ K_n^- = \sup_x \bigl(F(x) - F_n(x)\bigr). $$

Let $X_{(1)} = \min{X_1, \dots, X_n}$ denote the smallest observation. For all $x < X_{(1)}$,

$$ F_n(x) = 0, $$

so

$$ F(x) - F_n(x) = F(x) - 0 = F(x). $$

Since $F$ is continuous, we have

$$ \sup_{x < X_{(1)}} F(x) = F(X_{(1)}). $$

Thus, the Kolmogorov statistic can be expressed as

$$ K_n^- = \sup_x \bigl(F(x) - F_n(x)\bigr) \ge F(X_{(1)}). $$

Because $F$ is continuous and can take any value in $(0,1)$, we can choose all $n$ observations far into the right tail of the distribution. Then $X_{(1)}$ is arbitrarily large in terms of $F$-quantiles, and

$$ F(X_{(1)}) \approx 1. $$

Hence, for any $\epsilon > 0$, there exists a sample such that

$$ K_n^- > 1 - \epsilon. $$

However, since $F$ is continuous, no finite sample can achieve $F(X_{(1)}) = 1$, so the supremum of possible values of $K_n^-$ is

$$ \sup K_n^- = 1, $$

but the maximum is not attained.

Step 3: Conclusion

  1. Nonnegativity

$$ K_n^+ \ge 0, \qquad K_n^- \ge 0. $$

  1. Largest possible value of $K_n^-$

$$ \sup K_n^- = 1, $$

which can be approached arbitrarily closely but not exactly attained for a finite sample.

Answer (boxed)

$$ \begin{aligned} &K_n^+ \ge 0, \quad K_n^- \ge 0,\ &\sup K_n^- = 1. \end{aligned} $$

This corrects the earlier error: the previously claimed maximum $\frac{n-1}{n}$ is incorrect. The supremum of $K_n^-$ is 1, not attained, and the argument uses the continuity of $F$ rigorously.