TAOCP 1.2.10 Exercise 3

Let $X$ be the number of times step M4 is executed when Algorithm M finds the maximum of $n=1000$ distinct items presented in a uniformly random order.

Section 1.2.10: Analysis of an Algorithm

Exercise 3. [M15] What are the minimum, maximum, average, and standard deviation of the number of times step M4 is executed, if we are using Algorithm M to find the maximum of 1000 randomly ordered, distinct items? (Give your answer as decimal approximations to these quantities.)

Verified: yes
Solve time: 4m35s


Let $X$ be the number of times step M4 is executed when Algorithm M finds the maximum of $n=1000$ distinct items presented in a uniformly random order.

Step M4 is executed exactly when the current item exceeds every item seen previously. Such an item is called a record maximum. Since the first item initializes the current maximum and does not execute M4, $X$ counts the record maxima occurring after position $1$.

Minimum

The smallest possible value of $X$ occurs when the first item is already the largest of all $1000$ items.

In that case no later item can exceed the current maximum, so M4 is never executed.

$$ X_{\min}=0. $$

Maximum

The largest possible value of $X$ occurs when the items appear in increasing order.

Then each of the remaining $999$ items is larger than all previous items, so M4 is executed at every subsequent step.

$$ X_{\max}=999. $$

Average

For $j\ge 2$, define

$$ I_j= \begin{cases} 1,&\text{if the (j)th item is the largest among the first (j) items},\[4pt] 0,&\text{otherwise}. \end{cases} $$

Then

$$ X=\sum_{j=2}^{1000} I_j . $$

Among the first $j$ positions, each position is equally likely to contain the largest of the first $j$ items. Therefore

$$ P(I_j=1)=\frac1j. $$

Hence

$$ E[X] =\sum_{j=2}^{1000} E[I_j] =\sum_{j=2}^{1000}\frac1j =H_{1000}-1, $$

where $H_{1000}$ is the $1000$th harmonic number.

Using

$$ H_{1000}\approx 7.48547086055, $$

we obtain

$$ E[X]\approx 6.48547086055. $$

Standard deviation

The reviewer's objection concerns the independence of the indicators $I_j$. In fact, the indicators are independent.

Let

$$ A_j={I_j=1}. $$

Consider indices

$$ 2\le j_1<j_2<\cdots<j_k. $$

The event $A_{j_t}$ means that, among the first $j_t$ positions, the largest element occurs at position $j_t$.

Condition on the first $j_k$ positions. Among those $j_k$ positions, every relative ordering is equally likely.

For all of the events $A_{j_1},\dots,A_{j_k}$ to occur, the largest of the first $j_k$ elements must be at position $j_k$; the largest among the first $j_{k-1}$ elements must be at position $j_{k-1}$; and so on. Counting relative orderings gives

$$ P(A_{j_1}\cap\cdots\cap A_{j_k}) =\frac1{j_1j_2\cdots j_k}. $$

Since

$$ P(A_{j_t})=\frac1{j_t}, $$

we have

$$ P(A_{j_1}\cap\cdots\cap A_{j_k}) =\prod_{t=1}^{k} P(A_{j_t}), $$

so the events $A_j$, and therefore the indicators $I_j$, are mutually independent.

Consequently,

$$ \operatorname{Var}(X) =\sum_{j=2}^{1000}\operatorname{Var}(I_j). $$

Since $I_j$ is Bernoulli with parameter $1/j$,

$$ \operatorname{Var}(I_j) =\frac1j\left(1-\frac1j\right) =\frac1j-\frac1{j^2}. $$

Thus

$$ \operatorname{Var}(X) =\sum_{j=2}^{1000}\left(\frac1j-\frac1{j^2}\right). $$

Writing

$$ H_{1000}^{(2)} =\sum_{j=1}^{1000}\frac1{j^2}, $$

we get

$$ \operatorname{Var}(X) =(H_{1000}-1)-\bigl(H_{1000}^{(2)}-1\bigr) =H_{1000}-H_{1000}^{(2)}. $$

Using

$$ H_{1000}^{(2)}\approx 1.64393456668, $$

gives

$$ \operatorname{Var}(X) \approx 5.84153629387. $$

Therefore

$$ \sigma =\sqrt{\operatorname{Var}(X)} \approx 2.41692703528. $$

Answers

$$ \boxed{X_{\min}=0} $$

$$ \boxed{X_{\max}=999} $$

$$ \boxed{E[X]\approx 6.48547} $$

$$ \boxed{\sigma\approx 2.41693} $$

The reviewer's critical objection is resolved by supplying the missing proof that the record indicators are in fact mutually independent. Once that fact is established, the variance and standard deviation computed above are correct.