Kvant Math Problem 737

Let the houses have populations $h_1,h_2,\ldots,h_n$, arranged in nonincreasing order.

Verified: no
Verdicts: SKIP + SKIP
Solve time: 6m25s
Source on kvant.digital

Problem

Let $d_k$ denote the number of houses in your city that have at least $k$ residents $(d_1\ge d_2\ge d_3\ge \ldots)$, and let $c_m$ denote the number of residents in the $m$-th most populous house $(c_1\ge c_2\ge c_3\ge \ldots)$. Prove the equalities

  1. $$c_1+c_2+c_3+\ldots=d_1+d_2+d_3+\ldots;$$
  2. $$\begin{gather*}c_1^2+c_2^2+c_3^2+\ldots=d_1+3d_2+5d_3+\ldots+(2k-1)d_k+\ldots,\ d_1^2+d_2^2+d_3^2+\ldots=c_1+3c_2+5c_3+\ldots+(2m-1)c_m+\ldots\end{gather*}$$

A. V. Zelevinsky

Exploration

Let the houses have populations $h_1,h_2,\ldots,h_n$, arranged in nonincreasing order. Then $c_m=h_m$. The quantity $d_k$ counts how many houses have population at least $k$.

Take a small example: house populations $(4,2,1)$. Then

$$(c_m)=(4,2,1),$$

and

$$(d_k)=(3,2,1,1).$$

Indeed,

$$4+2+1=3+2+1+1=7.$$

Also,

$$4^2+2^2+1^2=21,$$

while

$$1\cdot3+3\cdot2+5\cdot1+7\cdot1=21.$$

The picture behind these identities is a Ferrers diagram. Represent each house by a row of unit squares whose length equals the number of residents in that house. Then $c_m$ is the length of the $m$-th row, while $d_k$ is the height of the $k$-th column.

The first identity says that the number of squares counted by rows equals the number of squares counted by columns.

For the second identity, observe that

$$r^2=1+3+5+\cdots +(2r-1).$$

Hence each row of length $r$ contributes $1,3,\ldots,2r-1$. If these contributions are regrouped by columns, the coefficient of $d_k$ becomes $2k-1$.

The step most likely to hide an error is the regrouping. One must justify carefully that the odd number $2k-1$ is contributed exactly once by every row whose length is at least $k$, and by no other row.

Problem Understanding

We are given two nonincreasing sequences associated with the same city. The number $c_m$ is the population of the $m$-th most populous house. The number $d_k$ is the number of houses whose population is at least $k$.

The problem asks us to prove three identities relating these two sequences.

This is a Type B problem, a pure proof.

The core difficulty is to recognize that the sequences are conjugate to each other: $d_k$ counts how many terms of the sequence $(c_m)$ are at least $k$. After that, the identities arise from counting the same collection of objects in two different ways.

Proof Architecture

Lemma 1. For every $k$,

$$d_k=#{m:c_m\ge k}.$$

This follows directly from the definitions.

Lemma 2. The total number of residents satisfies

$$\sum_m c_m=\sum_k d_k.$$

Count pairs $(m,t)$ with $1\le t\le c_m$ either by fixing $m$ or by fixing $t$.

Lemma 3. For every positive integer $r$,

$$r^2=\sum_{k=1}^{r}(2k-1).$$

The sum of the first $r$ odd numbers equals $r^2$.

Lemma 4. The identity

$$\sum_m c_m^2=\sum_k (2k-1)d_k$$

holds.

Expand each square using Lemma 3 and interchange the order of summation; Lemma 1 identifies the inner count with $d_k$.

Lemma 5. The identity

$$\sum_k d_k^2=\sum_m (2m-1)c_m$$

holds.

Apply Lemma 4 to the conjugate pair obtained by exchanging the roles of rows and columns.

The most delicate lemma is Lemma 4, because the change in the order of counting must be justified precisely.

Solution

For every house and every resident living in that house, consider the pair consisting of the house and the resident's position inside that house. If the houses are ordered by decreasing population, then the $m$-th house contains exactly $c_m$ residents.

For a fixed positive integer $k$, the number $d_k$ is, by definition, the number of houses containing at least $k$ residents. Hence

$$d_k=#{m:c_m\ge k}.$$

This relation will be used repeatedly.

To prove the first equality, count all pairs

$$(m,t),\qquad 1\le t\le c_m.$$

For each fixed $m$ there are $c_m$ such pairs, so the total number of pairs equals

$$\sum_m c_m.$$

For each fixed $t$, the number of pairs with second coordinate $t$ equals the number of indices $m$ satisfying $c_m\ge t$, which is $d_t$. Hence the same set of pairs has cardinality

$$\sum_t d_t.$$

Therefore

$$c_1+c_2+c_3+\cdots=d_1+d_2+d_3+\cdots.$$

Next, for every positive integer $r$,

$$1+3+\cdots +(2r-1)=r^2.$$

Indeed,

$$\sum_{k=1}^{r}(2k-1) =2\sum_{k=1}^{r}k-r =r(r+1)-r =r^2.$$

Applying this identity to each $c_m$ gives

$$c_m^2=\sum_{k=1}^{c_m}(2k-1).$$

Summing over all $m$,

$$\sum_m c_m^2 =\sum_m\sum_{k=1}^{c_m}(2k-1).$$

Regroup the terms according to the value of $k$:

$$\sum_m c_m^2 =\sum_{k\ge1}(2k-1),#{m:c_m\ge k}.$$

Using

$$#{m:c_m\ge k}=d_k,$$

we obtain

$$\sum_m c_m^2 =\sum_{k\ge1}(2k-1)d_k.$$

This proves

$$c_1^2+c_2^2+c_3^2+\cdots =d_1+3d_2+5d_3+\cdots +(2k-1)d_k+\cdots .$$

To prove the final identity, interchange the roles of the sequences $(c_m)$ and $(d_k)$. The relation

$$d_k=#{m:c_m\ge k}$$

means precisely that $(d_k)$ is the conjugate sequence of $(c_m)$. Applying the identity just proved to the conjugate pair gives

$$\sum_k d_k^2 =\sum_m(2m-1)c_m.$$

Thus

$$d_1^2+d_2^2+d_3^2+\cdots =c_1+3c_2+5c_3+\cdots +(2m-1)c_m+\cdots .$$

This completes the proof.

Verification of Key Steps

The first delicate step is the equality

$$d_k=#{m:c_m\ge k}.$$

By definition, $d_k$ counts houses with at least $k$ residents. The $m$-th house has exactly $c_m$ residents. Hence a house contributes to $d_k$ exactly when its population satisfies $c_m\ge k$. No ordering assumption beyond the definition of $c_m$ is needed.

The second delicate step is the regrouping

$$\sum_m\sum_{k=1}^{c_m}(2k-1) = \sum_{k\ge1}(2k-1)#{m:c_m\ge k}.$$

Fix $k$. The term $2k-1$ appears once in the expansion of $c_m^2$ if and only if $k\le c_m$. Consequently the coefficient of $2k-1$ after all expansions are added equals the number of indices $m$ with $c_m\ge k$, namely $d_k$.

The third delicate step is the symmetry used for the last identity. The relation

$$d_k=#{m:c_m\ge k}$$

means that if one draws the Ferrers diagram whose row lengths are $c_m$, then $d_k$ is the length of the $k$-th column. Passing from rows to columns is an involution. Hence the formula already proved remains valid after exchanging the symbols $c$ and $d$.

Alternative Approaches

A geometric proof uses the Ferrers diagram directly. Draw $c_m$ unit squares in the $m$-th row. The number of squares in the $k$-th column is $d_k$. Counting squares by rows and by columns immediately yields

$$\sum_m c_m=\sum_k d_k.$$

For the quadratic identities, assign weight $2k-1$ to every square in the $k$-th column. The total weight of a row of length $c_m$ equals

$$1+3+\cdots +(2c_m-1)=c_m^2,$$

so the total weight is $\sum_m c_m^2$. Counting instead by columns gives $\sum_k(2k-1)d_k$. The second quadratic identity follows by the same argument after transposing the diagram.

The main proof is preferable because it converts the geometric picture into explicit double-counting formulas and makes every counting step algebraically transparent.