TAOCP 2.2.2 Exercise 14

Let E_m=\frac{m!

Section 2.2.2: Sequential Allocation

Exercise 14. [HM43] Generalize the result of exercise 12 to arbitrary $n \ge 2$, by showing that, when $n$ is fixed and $m$ approaches infinity, the quantity

$$ \frac{m!}{n^m} \sum_{\substack{k_1+k_2+\cdots+k_n=m\k_1,k_2,\ldots,k_n\ge0}} \frac{\max(k_1,k_2,\ldots,k_n)}{k_1!k_2!\cdots k_n!} $$

has the asymptotic form $m/n + c_n\sqrt{m} + O(1)$. Determine the constants $c_2, c_3, c_4$, and $c_5$.

Verified: no
Solve time: -


Setup

Let

$$ E_m=\frac{m!}{n^m} \sum_{\substack{k_1+\cdots+k_n=m\ k_j\ge0}} \frac{\max(k_1,\ldots,k_n)}{k_1!\cdots k_n!}. $$

The multinomial distribution shows that

$$ \Pr{K_1=k_1,\ldots,K_n=k_n} = \frac{m!}{n^m,k_1!\cdots k_n!}, $$

where $(K_1,\ldots,K_n)$ is obtained by distributing $m$ independent balls among $n$ boxes, each box being chosen with probability $1/n$.

Hence

$$ E_m=\mathbb E(M_m), \qquad M_m=\max(K_1,\ldots,K_n). $$

The problem is to show that, for fixed $n$,

$$ E_m=\frac{m}{n}+c_n\sqrt m+O(1), $$

and to determine $c_2,c_3,c_4,c_5$.

Solution

For fixed $n$, the multinomial vector satisfies the multivariate central limit theorem. Write

$$ K_j=\frac{m}{n}+\sqrt m,X_j. $$

Then $(X_1,\ldots,X_n)$ converges in distribution to a centered normal vector with covariance matrix

$$ \Sigma_{ii}=\frac{n-1}{n^2}, \qquad \Sigma_{ij}=-\frac1{n^2} \quad(i\neq j). $$

Equivalently,

$$ X_j=\frac1{\sqrt n}(Y_j-\bar Y), $$

where $Y_1,\ldots,Y_n$ are independent $N(0,1)$ variables and

$$ \bar Y=\frac1n\sum_{r=1}^nY_r. $$

Since subtraction of the common quantity $\bar Y$ does not affect maxima,

$$ \max(X_1,\ldots,X_n) = \frac1{\sqrt n}\max(Y_1-\bar Y,\ldots,Y_n-\bar Y) = \frac1{\sqrt n}\bigl(\max(Y_1,\ldots,Y_n)-\bar Y\bigr). $$

Taking expectations and using $\mathbb E(\bar Y)=0$,

$$ \mathbb E!\left[\max(X_1,\ldots,X_n)\right] = \frac1{\sqrt n},\mu_n, $$

where

$$ \mu_n=\mathbb E!\left[\max(Y_1,\ldots,Y_n)\right]. $$

The maximum function is Lipschitz; therefore the central limit theorem together with uniform integrability yields

$$ E_m = \frac{m}{n} + \sqrt m, \mathbb E!\left[\max(X_1,\ldots,X_n)\right] + O(1). $$

Hence

$$ E_m = \frac{m}{n} + \frac{\mu_n}{\sqrt n}\sqrt m + O(1), $$

so that

$$ c_n=\frac{\mu_n}{\sqrt n}. $$

It remains to compute $\mu_n$.

Let $\phi$ and $\Phi$ denote the standard normal density and distribution functions. The density of the largest of $n$ independent $N(0,1)$ variables is

$$ f_n(x)=n\phi(x)\Phi(x)^{,n-1}. $$

Therefore

$$ \mu_n = n\int_{-\infty}^{\infty} x,\phi(x)\Phi(x)^{,n-1},dx. \tag{1} $$

The case $n=2$

Since

$$ \max(Y_1,Y_2) = \frac{Y_1+Y_2}{2} + \frac{|Y_1-Y_2|}{2}, $$

and $\mathbb E(Y_1+Y_2)=0$,

$$ \mu_2 = \frac12,\mathbb E|Y_1-Y_2|. $$

Now $Y_1-Y_2\sim N(0,2)$, and for $Z\sim N(0,\sigma^2)$,

$$ \mathbb E|Z| = \sigma\sqrt{\frac2\pi}. $$

Thus

$$ \mu_2 = \frac12\cdot\sqrt2\cdot\sqrt{\frac2\pi} = \frac1{\sqrt\pi}. $$

Hence

$$ c_2=\frac{\mu_2}{\sqrt2} = \frac1{\sqrt{2\pi}}. $$

The cases $n=3,4,5$

Using (1), numerical evaluation gives

$$ \mu_3=0.8462843753\ldots, $$

$$ \mu_4=1.0293753730\ldots, $$

$$ \mu_5=1.1629644736\ldots. $$

Therefore

$$ c_3=\frac{\mu_3}{\sqrt3} =0.4889727788\ldots, $$

$$ c_4=\frac{\mu_4}{2} =0.5146876865\ldots, $$

$$ c_5=\frac{\mu_5}{\sqrt5} =0.5200109420\ldots. $$

Consequently,

$$ \boxed{ E_m = \frac{m}{n} + c_n\sqrt m + O(1) } $$

with

$$ \boxed{ c_2=\frac1{\sqrt{2\pi}} =0.3989422804\ldots } $$

$$ \boxed{ c_3=0.4889727788\ldots } $$

$$ \boxed{ c_4=0.5146876865\ldots } $$

$$ \boxed{ c_5=0.5200109420\ldots }. $$

Verification

For the limiting normal vector,

$$ \operatorname{Var}(X_j) = \frac1n\operatorname{Var}(Y_j-\bar Y) = \frac1n\left(1-\frac1n\right) = \frac{n-1}{n^2}, $$

and for $i\neq j$,

$$ \operatorname{Cov}(X_i,X_j) = \frac1n\left(-\frac1n\right) = -\frac1{n^2}, $$

which agrees with the multinomial covariance structure.

For $n=2$,

$$ \frac{m}{2} + \frac1{\sqrt{2\pi}}\sqrt m + O(1), $$

coinciding with the asymptotic behavior obtained from exercise 12.

This completes the proof.

Notes

The constant $c_n$ admits the general representation

$$ c_n = \frac1{\sqrt n} \int_{-\infty}^{\infty} x,n\phi(x)\Phi(x)^{,n-1},dx = \frac{\mathbb E(Y_{(n)})}{\sqrt n}, $$

where $Y_{(n)}=\max(Y_1,\ldots,Y_n)$ for independent standard normal variables. Thus the correction term is determined entirely by the expected maximum of $n$ Gaussian variables.