TAOCP 1.2.10 Exercise 10
We are asked to determine the probability distribution of $A$, the number of times step M4 is executed, when each $X[k]$ is selected independently and uniformly from a set of $M$ objects, using the re...
Section 1.2.10: Analysis of an Algorithm
Exercise 10. [M20] Combine the results of the preceding three exercises to obtain a formula for the probability that $A=k$ under the assumption that each $X$ is selected at random from a set of $M$ objects.
Verified: yes
Solve time: 4m16s
Solution
We are asked to determine the probability distribution of $A$, the number of times step M4 is executed, when each $X[k]$ is selected independently and uniformly from a set of $M$ objects, using the results of the preceding exercises.
Let $n$ be the length of the input sequence $X[1],X[2],\dots,X[n]$, and let $A$ denote the number of times the current maximum is updated during Algorithm M. Denote by $p_{nk}^{(M)}$ the probability that $A=k$ under the assumption that each $X[k]$ is drawn uniformly from a set of $M$ distinct objects.
Step 1: Conditioning on the number of distinct values
From Exercise 9, if exactly $m$ distinct values occur among $X[1],\dots,X[n]$, the probability that $A=k$ depends only on $m$; call this conditional probability $p_{nk}^{(m)}$, where $p_{nk}^{(m)}$ is the probability that $A=k$ for a sequence of length $n$ containing $m$ distinct values. By Exercise 7, $p_{nk}^{(m)}$ can be expressed in terms of Stirling numbers of the first kind:
$$ p_{nk}^{(m)} = \frac{\left[ {m \atop k+1} \right]}{m!}. $$
This follows because the analysis of step M4 depends only on the relative order of the $m$ distinct values. If we label the distinct values $1,2,\dots,m$ and consider the sequence of these labels in order of appearance, the number of times a new maximum occurs is exactly $k$ if and only if the corresponding permutation of $m$ elements has $k$ rises, which is counted by the Stirling numbers $\left[ {m \atop k+1} \right]$.
Step 2: Probability of exactly $m$ distinct values
From Exercise 9, the probability that exactly $m$ distinct values appear among $X[1],\dots,X[n]$, drawn from a set of $M$ elements, is
$$ \Pr(\text{exactly } m \text{ distinct values}) = \frac{M!}{(M-m)!}\frac{\left{ {n \atop m} \right}}{M^n}, \tag{1} $$
where $\left{ {n \atop m} \right}$ denotes a Stirling number of the second kind, counting the number of ways to partition $n$ labeled objects into $m$ nonempty unlabeled subsets. The factor $M!/(M-m)!$ accounts for the choice of which $m$ elements of the $M$ are actually present.
Step 3: Total probability formula
By the law of total probability, conditioning on the number $m$ of distinct values, we have
$$ p_{nk}^{(M)} = \sum_{m=1}^{\min(n,M)} \Pr(\text{exactly } m \text{ distinct values}) \cdot p_{nk}^{(m)}. \tag{2} $$
Substituting the expressions from Steps 1 and 2 gives
$$ p_{nk}^{(M)} = \sum_{m=1}^{\min(n,M)} \frac{M!}{(M-m)!} \frac{\left{ {n \atop m} \right}}{M^n} \cdot \frac{\left[ {m \atop k+1} \right]}{m!}. \tag{3} $$
Simplifying the factorial coefficients:
$$ \frac{M!}{(M-m)!}\cdot \frac{1}{m!} = \binom{M}{m}, $$
so that
$$ p_{nk}^{(M)} = \sum_{m=1}^{\min(n,M)} \frac{\binom{M}{m}\left{ {n \atop m} \right}\left[ {m \atop k+1} \right]}{M^n}. \tag{4} $$
Equation (4) gives the complete formula for the probability that step M4 is executed exactly $k$ times when each $X[k]$ is chosen independently and uniformly from a set of $M$ objects.
Step 4: Range of $k$
For each $m$, the possible values of $k$ satisfy $0 \le k \le m-1$, since in a sequence of $m$ distinct numbers the maximum can be updated at most $m-1$ times. Therefore, $p_{nk}^{(M)}=0$ whenever $k \ge M$ or $k \ge n$.
Step 5: Summary
The probability distribution of $A$ is
$$ \boxed{ p_{nk}^{(M)} = \sum_{m=1}^{\min(n,M)} \frac{\binom{M}{m}\left{ {n \atop m} \right}\left[ {m \atop k+1} \right]}{M^n}, \qquad 0 \le k \le n-1. } $$
This formula combines the probabilities of obtaining exactly $m$ distinct values with the conditional probabilities for $A$ given $m$ distinct values. It reduces to $p_{nk} = \left[{n \atop k+1}\right]/n!$ when $M=n$, consistent with the original analysis of Algorithm M under the assumption of all distinct $X[k]$.
This completes the solution.
∎