TAOCP 3.3.1 Exercise 9
Section 3.3.1: General Test Procedures for Studying Random Data
Exercise 9. ▶ [**] [28] The experiment described in the text consisted of plotting 20 values of $K_{10}^+$, computed from the maximum-of-5 test applied to different parts of a random sequence. We could have computed also the corresponding 20 values of $K_{10}^-$; since $K_{10}^-$ has the same distribution as $K_{10}^+$, we could lump together the 40 values thus obtained (that is, 20 of the $K_{10}^+$'s and 20 of the $K_{10}^-$'s), and a KS test could be applied so that we would get new values $K_{40}^+$, $K_{40}^-$. Discuss the merits of this idea.
Verified: yes
Solve time: 4m25s
Solution (corrected)
We are asked to discuss the merits of pooling the 20 values of $K_{10}^{+}$ with the 20 values of $K_{10}^{-}$ and then applying a Kolmogorov-Smirnov test to the resulting 40 observations. Let us examine this carefully.
Step 1: Definition and context
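In the experiment of Section 3.3.1, the maximum-of-5 test turns each run of 5 consecutive uniform $[0,1]$ deviates into one observation $X = \max(U_1, \ldots, U_5)$, whose distribution function under the null hypothesis is $F(x) = x^5$ for $0 \le x \le 1$. The observations are divided into 20 disjoint blocks of 10 each. For one block, sorted so that $X_1 \le X_2 \le \cdots \le X_{10}$, let
$$ K_{10}^{+} = \sqrt{10}\,\max_{1 \le j \le 10} \left(\frac{j}{10} - F(X_j)\right), \quad K_{10}^{-} = \sqrt{10}\,\max_{1 \le j \le 10} \left(F(X_j) - \frac{j-1}{10}\right), $$
as in Section 3.3.1. The 20 values of $K_{10}^{+}$ arise from the 20 disjoint blocks, and the 20 corresponding $K_{10}^{-}$ values come from the same blocks.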
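By symmetry (replacing each $F(X_j)$ by $1 - F(X_j)$ interchanges the two statistics and leaves the null distribution of the sample unchanged), $K_{10}^{-}$ has the same marginal distribution as $K_{10}^{+}$. However, $K_{10}^{+}$ and $K_{10}^{-}$ computed from the same block are generally dependent, because they are functions of the same sample.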
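The two statistics can be computed for one block with a few lines of Python. This is a minimal sketch, assuming the Section 3.3.1 conventions: observations sorted in ascending order, $K_n^{+} = \sqrt{n}\max_j (j/n - F(X_j))$ and $K_n^{-} = \sqrt{n}\max_j (F(X_j) - (j-1)/n)$, with $F(x) = x^5$ for the maximum-of-5 test. The function names `ks_statistics` and `max_of_5_group` and the seed are illustrative choices, not part of the original text.

```python
import math
import random

def ks_statistics(sample, cdf):
    """Return (K_n^+, K_n^-) for `sample` against the continuous CDF `cdf`."""
    n = len(sample)
    # cdf is nondecreasing, so sorting F(x) equals applying F to the sorted sample.
    f = sorted(cdf(x) for x in sample)
    k_plus = math.sqrt(n) * max((j + 1) / n - f[j] for j in range(n))
    k_minus = math.sqrt(n) * max(f[j] - j / n for j in range(n))
    return k_plus, k_minus

def max_of_5_group(rng, group_size=10, t=5):
    """One block: `group_size` observations, each the max of `t` uniform deviates."""
    return [max(rng.random() for _ in range(t)) for _ in range(group_size)]

rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
group = max_of_5_group(rng)
k_plus, k_minus = ks_statistics(group, lambda x: x ** 5)
print(f"K10+ = {k_plus:.4f}, K10- = {k_minus:.4f}")
```

Both values are computed from the same 10 observations, which is the source of the dependence discussed in this solution.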
Step 2: Dependence analysis
Consider a single block. If $K_{10}^{+}$ is large, the empirical distribution function of the block deviates strongly from $F$ in one direction, which happens when the observations are shifted to one side of where the hypothesized distribution would put them. $K_{10}^{-}$ measures the deviation in the opposite direction, and a sample shifted one way leaves little room for a large deviation the other way, so one statistic tends to be low when the other is high. The extreme case of a single observation makes this explicit: $K_1^{+} + K_1^{-} = 1$, so each statistic determines the other. Consequently, $K_{10}^{+}$ and $K_{10}^{-}$ computed from the same block are negatively correlated, not independent.
Since the 20 pairs $(K_{10}^{+}, K_{10}^{-})$ are drawn from 20 disjoint blocks, the pairs are independent across blocks. Therefore, we have 20 independent bivariate observations, but within each pair, the components are dependent.
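The within-pair dependence can be checked empirically. The sketch below estimates the sample correlation between $K_n^{+}$ and $K_n^{-}$ computed from the same block. It assumes the null hypothesis holds, so that the values $F(X_j)$ are uniform on $[0,1)$ and can be drawn directly instead of being formed from maxima of 5 deviates. The helper names (`ks_pair`, `pearson`, `simulate_correlation`), the seed, and the number of trials are illustrative assumptions.

```python
import math
import random

def ks_pair(u):
    """(K_n^+, K_n^-) for values u_j = F(X_j), uniform on [0, 1) under H0."""
    n = len(u)
    s = sorted(u)
    k_plus = math.sqrt(n) * max((j + 1) / n - s[j] for j in range(n))
    k_minus = math.sqrt(n) * max(s[j] - j / n for j in range(n))
    return k_plus, k_minus

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_correlation(n, trials, seed=1):
    """Correlation of K_n^+ and K_n^- over `trials` simulated blocks of size n."""
    rng = random.Random(seed)
    pairs = [ks_pair([rng.random() for _ in range(n)]) for _ in range(trials)]
    plus, minus = zip(*pairs)
    return pearson(plus, minus)

print("n = 1 :", simulate_correlation(1, 20_000))
print("n = 10:", simulate_correlation(10, 20_000))
```

For $n = 1$ the two statistics sum to 1, so the correlation is $-1$ up to rounding. The $n = 10$ line is the case relevant to this exercise.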
Step 3: Implications for the KS test
The Kolmogorov-Smirnov test assumes that the sample consists of independent observations from the hypothesized distribution. If we pool the 20 $K_{10}^{+}$ values with the 20 $K_{10}^{-}$ values into a single sample of size 40 and treat them as independent, we violate this assumption because each $K_{10}^{+}$ is dependent on the corresponding $K_{10}^{-}$.
Consequences:
- The KS statistics computed from the pooled sample do not follow the null distribution that the standard KS statistics have for 40 independent observations.
- Any $K_{40}^{+}$ or $K_{40}^{-}$ computed from the pooled sample cannot be compared directly to the standard KS critical values without adjustment.
- Claims that the pooled sample increases precision or sensitivity are not automatically justified: the 40 values are 20 independent pairs, not 40 independent observations, so the nominal sample size of 40 overstates what the standard theory can be applied to.
Thus, treating the pooled sample as 40 i.i.d. draws from the $K_{10}^{+}$ distribution is invalid.
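One way to see the effect is to simulate the whole experiment many times under the null hypothesis and compare $K_{40}^{+}$ computed from the pooled values (20 $K_{10}^{+}$ and 20 $K_{10}^{-}$ from the same 20 blocks) with $K_{40}^{+}$ computed from 40 values of $K_{10}^{+}$ that each come from their own block. The sketch below is an illustration under stated assumptions, not a procedure from the text: it draws $F(X_j)$ directly as uniform deviates, and it evaluates the exact null distribution of $K_{10}^{+}$ with the Birnbaum-Tingey formula for the one-sided statistic. All function names, the seed, and the replication count are hypothetical choices.

```python
import math
import random
import statistics

def ks_pair(u):
    """(K_n^+, K_n^-) for values u_j = F(X_j), uniform on [0, 1) under H0."""
    n = len(u)
    s = sorted(u)
    k_plus = math.sqrt(n) * max((j + 1) / n - s[j] for j in range(n))
    k_minus = math.sqrt(n) * max(s[j] - j / n for j in range(n))
    return k_plus, k_minus

def k_plus_cdf(t, n):
    """P(K_n^+ <= t) for n independent observations (Birnbaum-Tingey formula)."""
    eps = t / math.sqrt(n)  # deviation on the unscaled 0..1 axis
    if eps <= 0.0:
        return 0.0
    if eps >= 1.0:
        return 1.0
    tail = eps * sum(
        math.comb(n, k)
        * (eps + k / n) ** (k - 1)
        * max(0.0, 1.0 - eps - k / n) ** (n - k)
        for k in range(int(n * (1.0 - eps)) + 1)
    )
    return min(1.0, max(0.0, 1.0 - tail))

def pooled_k40_plus(rng):
    """K_40^+ from 20 K_10^+ and 20 K_10^- values taken from the same 20 blocks."""
    values = []
    for _ in range(20):
        values.extend(ks_pair([rng.random() for _ in range(10)]))
    return ks_pair([k_plus_cdf(v, 10) for v in values])[0]

def independent_k40_plus(rng):
    """K_40^+ from 40 K_10^+ values, each computed from its own block."""
    values = [ks_pair([rng.random() for _ in range(10)])[0] for _ in range(40)]
    return ks_pair([k_plus_cdf(v, 10) for v in values])[0]

rng = random.Random(7)  # arbitrary seed
reps = 500              # replications of the whole experiment
pooled = [pooled_k40_plus(rng) for _ in range(reps)]
indep = [independent_k40_plus(rng) for _ in range(reps)]
for name, xs in (("pooled", pooled), ("independent", indep)):
    print(f"{name:12s} mean={statistics.mean(xs):.3f} sd={statistics.stdev(xs):.3f}")
```

If pooling were harmless, the two summary lines would agree up to sampling noise. A systematic difference indicates that the standard $K_{40}$ percentage points do not apply to the pooled statistic.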
Step 4: Possible merits of the idea
Despite the dependence, there are some considerations:
- All 40 observations come from the same marginal distribution. The EDF of the pooled sample may still provide a rough visual comparison to the theoretical distribution.
- If one were to model the dependence structure explicitly, or adjust the null distribution to account for the pairing, it could be possible to extract additional information from the 20 pairs.
However, without such adjustments, the standard KS test cannot be applied directly, so the method is not strictly valid.
Step 5: Conclusion
Pooling the 20 values of $K_{10}^{+}$ and the 20 values of $K_{10}^{-}$ without accounting for the dependence between paired observations is not justified for a KS test. The main merit of the idea is that it may provide a visual or exploratory check of the marginal distribution. The main drawback is that it violates the independence assumption, rendering the standard KS critical values invalid. Any apparent increase in power or precision is illusory unless the dependence is properly addressed. A sound alternative is to treat the two statistics as separate tests: the 20 values of $K_{10}^{+}$ form an independent sample, as do the 20 values of $K_{10}^{-}$, so a KS test may legitimately be applied to each set on its own.
Answer (summary)
- $K_{10}^{+}$ and $K_{10}^{-}$ from the same block are dependent; dependence is generally negative.
- Pooling yields 40 observations with non-negligible pairwise dependence.
- Standard KS test assumes independence; hence $K_{40}^{+}$ and $K_{40}^{-}$ cannot be compared to standard critical values.
- Merit: marginal distribution is the same, so pooled EDF provides a visual or heuristic check.
- Drawback: violates KS assumptions; apparent gain in sample size is misleading.
The proposal is therefore not valid for rigorous KS testing, although it may have limited heuristic value.