TAOCP 3.4.1 Exercise 29

We want $X_1 \le X_2 \le \cdots \le X_n$ such that each $X_i$ lies in $[0,1]$ and the joint distribution is uniform over the simplex $0 \le X_1 \le X_2 \le \cdots \le X_n \le 1.$ Equivalently, we want...

Section 3.4.1: Numerical Distributions

Exercise 29. [M20] (J. L. Bentley and J. B. Saxe.) Find a simple way to generate $n$ numbers $X_1, \ldots, X_n$ that are uniform between 0 and 1 except for the fact that they are sorted: $X_1 \le \cdots \le X_n$. Your algorithm should take only $O(n)$ steps.

Verified: yes
Solve time: 1m31s


Solution

We want $X_1 \le X_2 \le \cdots \le X_n$ such that each $X_i$ lies in $[0,1]$ and the joint distribution is uniform over the simplex

$0 \le X_1 \le X_2 \le \cdots \le X_n \le 1.$

Equivalently, we want $(X_1, \dots, X_n)$ to be uniformly distributed in the $n$-dimensional region defined by

$0 \le X_1 \le X_2 \le \cdots \le X_n \le 1, \qquad \text{with volume } \frac{1}{n!}.$

Introduce auxiliary variables $Y_1, \dots, Y_n$ defined by

$Y_1 = X_1, \quad Y_2 = X_2 - X_1, \quad Y_3 = X_3 - X_2, \ \dots, \ Y_n = X_n - X_{n-1}, \quad Y_{n+1} = 1 - X_n.$

Each $Y_i \ge 0$ and $\sum_{i=1}^{n+1} Y_i = 1$. Thus, $(Y_1, \dots, Y_{n+1})$ lies uniformly on the standard $n$-simplex

$\Delta_n = {(y_1, \dots, y_{n+1}) : y_i \ge 0, \sum_{i=1}^{n+1} y_i = 1}.$

A well-known fact from probability (see Section 3.4.1, part B) is that if $Z_1, \dots, Z_{n+1}$ are independent exponential random variables with mean 1, then

$Y_i = \frac{Z_i}{Z_1 + \cdots + Z_{n+1}}$

produces a uniform point on the $n$-simplex. Since $Z_i = -\ln U_i$ for uniform $U_i \in (0,1)$ gives exponential deviates, we can set

$Z_i = -\ln U_i, \qquad Y_i = \frac{-\ln U_i}{\sum_{j=1}^{n+1} (-\ln U_j)}, \quad 1 \le i \le n+1.$

Finally, recover the sorted uniform deviates $X_i$ by

$X_1 = Y_1, \quad X_2 = Y_1 + Y_2, \quad \dots, \quad X_n = Y_1 + \cdots + Y_n.$

This produces $(X_1, \dots, X_n)$ uniform over the simplex $0 \le X_1 \le \cdots \le X_n \le 1$.

The algorithm requires $n+1$ independent uniform deviates $U_i$, $n+1$ logarithms, one sum of $n+1$ numbers, $n+1$ divisions, and $n$ cumulative sums. Each step is $O(1)$, so the total cost is $O(n)$. This satisfies the linear-time requirement.

We can summarize the method as follows:

Algorithm S (sorted uniform deviates).

S1. [Generate uniform variables.] Generate $U_1, \dots, U_{n+1}$ independently, uniformly in $(0,1)$.

S2. [Compute exponentials.] Set $Z_i \leftarrow -\ln U_i$ for $i = 1, \dots, n+1$.

S3. [Normalize.] Set $S \leftarrow Z_1 + \cdots + Z_{n+1}$. Then set $Y_i \leftarrow Z_i / S$ for $i = 1, \dots, n+1$.

S4. [Cumulative sum.] Set $X_1 \leftarrow Y_1$, $X_2 \leftarrow Y_1 + Y_2$, $\dots$, $X_n \leftarrow Y_1 + \cdots + Y_n$.

The sequence $X_1 \le \cdots \le X_n$ is uniformly distributed as required. $\blacksquare$