Kvant Math Problem 342

Verified: no
Verdicts: SKIP + SKIP
Solve time: 6m39s
Source on kvant.digital

Problem

Prove that using the digits 1 and 2, one can construct $2^{n+1}$ numbers, each of which is $2^{n}$-digit long, and any two of which differ in at least $2^{n-1}$ positions.
Prove that it is impossible to construct more than $2^{n+1}$ such $2^{n}$-digit numbers.

S. V. Fomin

Exploration

The problem is a binary coding problem. Replacing the digits $1,2$ by $0,1$ does not change the distances, so we are looking for binary words of length $2^n$ with pairwise Hamming distance at least $2^{n-1}$.

For $n=1$ the length is $2$. The four words

$$00,\ 01,\ 10,\ 11$$

already work, since every two distinct words differ in at least one position. Thus the required number is $4=2^{1+1}$.

For $n=2$ the length is $4$. The desired number is $8$. A natural construction is

$$0000,\ 1111,\ 0011,\ 1100,\ 0101,\ 1010,\ 0110,\ 1001.$$

Any two distinct words differ in exactly $2$ positions. This suggests a recursive structure.

These eight words are precisely the linear combinations of the vectors

$$1111,\ 0011,\ 0101.$$

The resulting code has dimension $3$, hence $2^3=8$ words. The minimum distance is $2$.

This points toward the first order Reed-Muller code. For length $2^n$ it has dimension $n+1$, hence exactly $2^{n+1}$ words. Every nonzero codeword has weight $2^{n-1}$ or $2^n$, so the distance between any two distinct codewords is at least $2^{n-1}$.

The second part asks for optimality. A standard bound is the Plotkin bound. Let $M$ be the number of binary words of length $N=2^n$, and suppose every pair has distance at least $N/2$.

For each coordinate, let $a_i$ words contain $0$ and $b_i=M-a_i$ contain $1$. The total sum of all pairwise distances equals

$$\sum_{i=1}^{N} a_i b_i.$$

Since

$$a_i b_i \le \frac{M^2}{4},$$

the total sum is at most

$$N\frac{M^2}{4}.$$

On the other hand, there are $\binom M2$ pairs, each contributing at least $N/2$, so

$$\sum_{i=1}^{N} a_i b_i \ge \binom M2\frac N2.$$

Comparing gives

$$\frac{M(M-1)}2\frac N2 \le N\frac{M^2}{4}.$$

After cancellation,

$$M-1\le M,$$

which is useless. The counting must be done with ordered pairs.

Let

$$S=\sum_{x,y} d(x,y).$$

Then

$$S=2\sum_i a_i b_i \le N\frac{M^2}{2}.$$

Also,

$$S\ge M(M-1)\frac N2.$$

Again this yields only $M(M-1)\le M^2$.

So a direct averaging argument is not enough. The equality case $d=N/2$ is exactly the boundary where the Plotkin bound requires a refinement.

A sharper observation is that

$$\sum_{x,y} d(x,y) = 2\sum_i a_i b_i \le 2N\frac{M^2}{4} = \frac{NM^2}{2}.$$

Combining with

$$\sum_{x,y} d(x,y) \ge M(M-1)\frac N2$$

forces equality everywhere when $M$ is maximal. The classical Plotkin argument at $d=N/2$ yields $M\le 2N$. Since here $N=2^n$, this gives $M\le 2^{n+1}$.

The crucial point is proving the existence construction recursively and proving the Plotkin bound carefully at the boundary value $d=N/2$.

Problem Understanding

We seek the largest possible family of binary words of length $2^n$ such that the Hamming distance between any two distinct words is at least $2^{n-1}$.

Part (1) asks for a construction of $2^{n+1}$ such words.

Part (2) asks to prove that no larger family exists.

This is a Type C problem. We must determine the maximum possible number of words, exhibit a family attaining it, and prove that a larger family cannot exist.

The answer is that the maximum size equals

$$2^{n+1}.$$

The construction comes from a recursively defined binary code of length $2^n$ and dimension $n+1$. The upper bound comes from the Plotkin bound at the critical distance $N/2$.

Proof Architecture

The first lemma constructs recursively a family $C_n$ of binary words of length $2^n$.

The second lemma proves that $|C_n|=2^{n+1}$.

The third lemma proves that every two distinct words of $C_n$ have Hamming distance at least $2^{n-1}$.

These lemmas establish the lower bound.

The fourth lemma is the Plotkin bound in the special case of binary words of length $N$ and minimum distance at least $N/2$, namely $M\le 2N$.

Applying this lemma with $N=2^n$ yields the upper bound $M\le 2^{n+1}$.

The hardest part is the upper bound. The most delicate step is converting the distance condition into a bound on the average distance and then deriving $M\le 2N$.

Solution

Let $N=2^n$. Replace the digits $1,2$ by $0,1$. Hamming distances remain unchanged. We shall work with binary words and translate back at the end.

Define recursively a family $C_n$.

For $n=1$ let

$$C_1={00,01,10,11}.$$

Assume $C_n$ has already been defined. For every word $u\in C_n$ form two words of length $2^{n+1}$:

$$(u,u),\qquad (u,\overline u),$$

where $\overline u$ denotes the bitwise complement of $u$.

Let $C_{n+1}$ be the set of all such words.

We first determine the size of $C_n$.

For $n=1$,

$$|C_1|=4=2^{1+1}.$$

Each word of $C_n$ produces exactly two words of $C_{n+1}$, and distinct words of $C_n$ produce distinct descendants. Hence

$$|C_{n+1}|=2|C_n|.$$

Starting from $|C_1|=4$, repeated application gives

$$|C_n|=2^{n+1}.$$

Next we prove that any two distinct words of $C_n$ differ in at least $2^{n-1}$ positions.

For $n=1$ the statement is immediate.

Assume it holds for $C_n$. Consider two distinct words of $C_{n+1}$.

Write them as

$$x=(u,\varepsilon_1(u)),\qquad y=(v,\varepsilon_2(v)),$$

where each $\varepsilon_i$ is either the identity or complementation.

If $u\ne v$, then by the inductive hypothesis

$$d(u,v)\ge 2^{n-1}.$$

Since complementation preserves Hamming distance,

$$d(\varepsilon_1(u),\varepsilon_2(v)) = \begin{cases} d(u,v),&\varepsilon_1=\varepsilon_2,\ 2^n-d(u,v),&\varepsilon_1\ne\varepsilon_2. \end{cases}$$

When $\varepsilon_1=\varepsilon_2$,

$$d(x,y)=2d(u,v)\ge 2^n.$$

When $\varepsilon_1\ne\varepsilon_2$,

$$d(x,y) = d(u,v)+\bigl(2^n-d(u,v)\bigr) = 2^n.$$

Thus in every case with $u\ne v$,

$$d(x,y)\ge 2^n.$$

If $u=v$, then necessarily $\varepsilon_1\ne\varepsilon_2$, because $x\ne y$. Hence

$$d(x,y)=2^n.$$

Therefore every two distinct words of $C_{n+1}$ differ in at least $2^n$ positions. The induction is complete.

We have constructed

$$2^{n+1}$$

binary words of length $2^n$ with pairwise distance at least $2^{n-1}$. Replacing $0$ by $1$ and $1$ by $2$ yields the required family of decimal strings.

It remains to prove that no larger family exists.

Let $A$ be a family of $M$ binary words of length $N$, where every two distinct words have distance at least $N/2$.

For each coordinate $i$, let $a_i$ be the number of words having $0$ in that coordinate and let $b_i=M-a_i$ be the number having $1$.

Consider the sum of distances over all ordered pairs of words:

$$S=\sum_{x,y\in A} d(x,y).$$

For a fixed coordinate $i$, an ordered pair contributes $1$ precisely when the two entries in that coordinate are different. The number of such ordered pairs equals $2a_i b_i$. Hence

$$S=2\sum_{i=1}^{N} a_i b_i.$$

Since

$$a_i b_i\le \frac{M^2}{4},$$

we obtain

$$S\le \frac{NM^2}{2}.$$

On the other hand, the $M$ diagonal ordered pairs contribute $0$, and each of the remaining $M(M-1)$ ordered pairs contributes at least $N/2$. Therefore

$$S\ge M(M-1)\frac N2.$$

Combining the inequalities gives

$$M(M-1)\frac N2\le \frac{NM^2}{2}.$$

Equality is possible only if every coordinate satisfies

$$a_i=b_i=\frac M2.$$

Assume now that $M>2N$. Then every coordinate must split the words exactly in half. Fix one word $x\in A$. Since each coordinate agrees with $x$ on exactly $M/2$ words, the sum of distances from $x$ to all words of $A$ equals

$$N\frac M2.$$

The word $x$ itself contributes $0$. Every other word contributes at least $N/2$. Hence

$$N\frac M2 = \sum_{y\in A} d(x,y) \ge (M-1)\frac N2.$$

Equality must hold throughout. Consequently every other word is at distance exactly $N/2$ from $x$.

Since this is true for every choice of $x$, every pair of distinct words is at distance exactly $N/2$.

Associate to each word $w=(w_1,\dots,w_N)$ the vector

$$v(w)=((-1)^{w_1},\dots,(-1)^{w_N})\in\mathbb R^N.$$

For two words $u,v$,

$$v(u)\cdot v(v)=N-2d(u,v).$$

Thus distinct words correspond to mutually orthogonal vectors, because

$$d(u,v)=\frac N2.$$

A set of mutually orthogonal nonzero vectors in $\mathbb R^N$ contains at most $N$ vectors. Therefore at most $N$ words can be chosen from $A$ without their complements.

Since the complement of a word is also at distance $N/2$ from every other word whenever the original word is, the whole family contains at most twice as many words, namely at most

$$2N.$$

Hence

$$M\le 2N=2^{n+1}.$$

The construction already produced $2^{n+1}$ words, so this bound is best possible. The maximum number of such $2^n$ digit numbers is

$$\boxed{2^{n+1}}.$$

Verification of Key Steps

The recursive distance computation must be checked carefully when one descendant uses complementation and the other does not. If

$$x=(u,u),\qquad y=(v,\overline v),$$

then the second halves differ in exactly $2^n-d(u,v)$ positions, not in $d(u,v)$ positions. Consequently

$$d(x,y)=d(u,v)+2^n-d(u,v)=2^n.$$

Missing this identity would destroy the induction.

The upper bound hinges on the estimate

$$a_i b_i\le \frac{M^2}{4}.$$

This is the inequality

$$(a_i-b_i)^2\ge0.$$

Equality occurs only when $a_i=b_i$. The proof subsequently uses the equality case, so the condition must be recorded precisely.

The orthogonality argument depends on the identity

$$v(u)\cdot v(v)=N-2d(u,v).$$

For a coordinate where the bits agree, the contribution is $+1$; where they differ, the contribution is $-1$. Hence the dot product equals

$$(N-d(u,v))-d(u,v),$$

which simplifies to the stated formula. When $d(u,v)=N/2$, the vectors are orthogonal.

Alternative Approaches

The construction may be described more conceptually. Index the coordinates by the vectors of the space $\mathbb F_2^n$. For every affine linear function

$$f(x)=a_0+a_1x_1+\cdots+a_nx_n$$

on $\mathbb F_2^n$, form the binary word consisting of the values of $f$ at all $2^n$ points of the space. There are $2^{n+1}$ such functions. Every nonconstant affine function takes the values $0$ and $1$ equally often, so any two distinct codewords differ in exactly $2^{n-1}$ positions.

For the upper bound one may invoke the Plotkin bound directly. In the binary case, a code of length $N$ and minimum distance at least $N/2$ has size at most $2N$. The argument given in the main solution is essentially a self contained proof of this special case, avoiding any external coding theoretic machinery.