TAOCP 1.2.10 Exercise 9

Let $X[1],X[2],\ldots,X[n]$ be chosen independently and uniformly from a set of $M$ distinct elements.

Section 1.2.10: Analysis of an Algorithm

Exercise 9. [M25] Generalize the result of the preceding exercise to find a formula for the probability that exactly $m$ distinct values occur among the $X$'s. Express your answer in terms of Stirling numbers.

Verified: yes
Solve time: 1m30s


Solution

Let $X[1],X[2],\ldots,X[n]$ be chosen independently and uniformly from a set of $M$ distinct elements. We are asked to determine the probability that exactly $m$ distinct values appear among the $n$ elements. Denote this probability by $P_{n,M,m}$. We will derive a formula in terms of Stirling numbers of the second kind, $\left{{n \atop m}\right}$, which count the number of ways to partition a set of $n$ labeled objects into exactly $m$ nonempty unlabeled subsets.

The problem reduces to counting the number of sequences of length $n$ over a set of $M$ elements that contain exactly $m$ distinct values. Proceed in steps.

Step 1: Choose the $m$ distinct values.

The $m$ values that actually appear among $X[1],\ldots,X[n]$ must be selected from the $M$ available elements. There are

$$ \binom{M}{m} $$

ways to choose this subset of values.

Step 2: Count sequences over the chosen $m$ values.

Once the $m$ values are chosen, each of the $n$ positions must be assigned a value from these $m$ elements such that all $m$ elements appear at least once. Let $N_{n,m}$ denote the number of sequences of length $n$ over $m$ labeled elements in which all $m$ elements appear at least once. By combinatorial reasoning, this is equivalent to counting the number of onto functions from a set of $n$ labeled positions to a set of $m$ labeled values.

By the theory of Stirling numbers of the second kind, the number of ways to partition $n$ labeled objects into $m$ nonempty unlabeled subsets is $\left{{n \atop m}\right}$. Once the partition is fixed, each of the $m$ subsets can be assigned uniquely to the $m$ chosen distinct values. There are $m!$ ways to perform this assignment. Hence,

$$ N_{n,m} = m! \left{{n \atop m}\right}. $$

Step 3: Combine choices.

Multiplying the number of ways to select the distinct values with the number of sequences using all these values yields the total number of sequences of length $n$ over $M$ elements with exactly $m$ distinct values:

\text{# sequences} = \binom{M}{m} , m! \left{{n \atop m}\right}.

Step 4: Divide by total number of sequences.

The total number of sequences of length $n$ over $M$ elements is $M^n$. Therefore, the probability that exactly $m$ distinct values appear among $X[1],\ldots,X[n]$ is

$$ P_{n,M,m} = \frac{\binom{M}{m} , m! \left{{n \atop m}\right}}{M^n}. $$

Step 5: Simplify.

Observe that $\binom{M}{m} m! = \frac{M!}{(M-m)!}$, so we may equivalently write

$$ P_{n,M,m} = \frac{M!}{(M-m)!} \frac{\left{{n \atop m}\right}}{M^n}. $$

This formula expresses the probability exactly in terms of Stirling numbers of the second kind. The derivation covers all cases: each choice of $m$ distinct values is counted, all sequences using all $m$ values are included, and no sequence is counted more than once.

This completes the derivation. ∎

$$ \boxed{P_{n,M,m} = \frac{\binom{M}{m}, m!, \left{{n \atop m}\right}}{M^n} = \frac{M!}{(M-m)!}\frac{\left{{n \atop m}\right}}{M^n}} $$