Kvant Math Problem 342
The problem is a binary coding problem.
Verified: no
Verdicts: SKIP + SKIP
Solve time: 6m39s
Source on kvant.digital
Problem
- Prove that using the digits 1 and 2, one can construct $2^{n+1}$ numbers, each of which is $2^{n}$-digit long, and any two of which differ in at least $2^{n-1}$ positions.
- Prove that it is impossible to construct more than $2^{n+1}$ such $2^{n}$-digit numbers.
S. V. Fomin
Exploration
The problem is a binary coding problem. Replacing the digits $1,2$ by $0,1$ does not change the distances, so we are looking for binary words of length $2^n$ with pairwise Hamming distance at least $2^{n-1}$.
For $n=1$ the length is $2$. The four words
$$00,\ 01,\ 10,\ 11$$
already work, since every two distinct words differ in at least one position. Thus the required number is $4=2^{1+1}$.
For $n=2$ the length is $4$. The desired number is $8$. A natural construction is
$$0000,\ 1111,\ 0011,\ 1100,\ 0101,\ 1010,\ 0110,\ 1001.$$
Any two distinct words differ in exactly $2$ positions. This suggests a recursive structure.
These eight words are precisely the linear combinations of the vectors
$$1111,\ 0011,\ 0101.$$
The resulting code has dimension $3$, hence $2^3=8$ words. The minimum distance is $2$.
This points toward the first order Reed-Muller code. For length $2^n$ it has dimension $n+1$, hence exactly $2^{n+1}$ words. Every nonzero codeword has weight $2^{n-1}$ or $2^n$, so the distance between any two distinct codewords is at least $2^{n-1}$.
The second part asks for optimality. A standard bound is the Plotkin bound. Let $M$ be the number of binary words of length $N=2^n$, and suppose every pair has distance at least $N/2$.
For each coordinate, let $a_i$ words contain $0$ and $b_i=M-a_i$ contain $1$. The total sum of all pairwise distances equals
$$\sum_{i=1}^{N} a_i b_i.$$
Since
$$a_i b_i \le \frac{M^2}{4},$$
the total sum is at most
$$N\frac{M^2}{4}.$$
On the other hand, there are $\binom M2$ pairs, each contributing at least $N/2$, so
$$\sum_{i=1}^{N} a_i b_i \ge \binom M2\frac N2.$$
Comparing gives
$$\frac{M(M-1)}2\frac N2 \le N\frac{M^2}{4}.$$
After cancellation,
$$M-1\le M,$$
which is useless. The counting must be done with ordered pairs.
Let
$$S=\sum_{x,y} d(x,y).$$
Then
$$S=2\sum_i a_i b_i \le N\frac{M^2}{2}.$$
Also,
$$S\ge M(M-1)\frac N2.$$
Again this yields only $M(M-1)\le M^2$.
So a direct averaging argument is not enough. The equality case $d=N/2$ is exactly the boundary where the Plotkin bound requires a refinement.
A sharper observation is that
$$\sum_{x,y} d(x,y) = 2\sum_i a_i b_i \le 2N\frac{M^2}{4} = \frac{NM^2}{2}.$$
Combining with
$$\sum_{x,y} d(x,y) \ge M(M-1)\frac N2$$
forces equality everywhere when $M$ is maximal. The classical Plotkin argument at $d=N/2$ yields $M\le 2N$. Since here $N=2^n$, this gives $M\le 2^{n+1}$.
The crucial point is proving the existence construction recursively and proving the Plotkin bound carefully at the boundary value $d=N/2$.
Problem Understanding
We seek the largest possible family of binary words of length $2^n$ such that the Hamming distance between any two distinct words is at least $2^{n-1}$.
Part (1) asks for a construction of $2^{n+1}$ such words.
Part (2) asks to prove that no larger family exists.
This is a Type C problem. We must determine the maximum possible number of words, exhibit a family attaining it, and prove that a larger family cannot exist.
The answer is that the maximum size equals
$$2^{n+1}.$$
The construction comes from a recursively defined binary code of length $2^n$ and dimension $n+1$. The upper bound comes from the Plotkin bound at the critical distance $N/2$.
Proof Architecture
The first lemma constructs recursively a family $C_n$ of binary words of length $2^n$.
The second lemma proves that $|C_n|=2^{n+1}$.
The third lemma proves that every two distinct words of $C_n$ have Hamming distance at least $2^{n-1}$.
These lemmas establish the lower bound.
The fourth lemma is the Plotkin bound in the special case of binary words of length $N$ and minimum distance at least $N/2$, namely $M\le 2N$.
Applying this lemma with $N=2^n$ yields the upper bound $M\le 2^{n+1}$.
The hardest part is the upper bound. The most delicate step is converting the distance condition into a bound on the average distance and then deriving $M\le 2N$.
Solution
Let $N=2^n$. Replace the digits $1,2$ by $0,1$. Hamming distances remain unchanged. We shall work with binary words and translate back at the end.
Define recursively a family $C_n$.
For $n=1$ let
$$C_1={00,01,10,11}.$$
Assume $C_n$ has already been defined. For every word $u\in C_n$ form two words of length $2^{n+1}$:
$$(u,u),\qquad (u,\overline u),$$
where $\overline u$ denotes the bitwise complement of $u$.
Let $C_{n+1}$ be the set of all such words.
We first determine the size of $C_n$.
For $n=1$,
$$|C_1|=4=2^{1+1}.$$
Each word of $C_n$ produces exactly two words of $C_{n+1}$, and distinct words of $C_n$ produce distinct descendants. Hence
$$|C_{n+1}|=2|C_n|.$$
Starting from $|C_1|=4$, repeated application gives
$$|C_n|=2^{n+1}.$$
Next we prove that any two distinct words of $C_n$ differ in at least $2^{n-1}$ positions.
For $n=1$ the statement is immediate.
Assume it holds for $C_n$. Consider two distinct words of $C_{n+1}$.
Write them as
$$x=(u,\varepsilon_1(u)),\qquad y=(v,\varepsilon_2(v)),$$
where each $\varepsilon_i$ is either the identity or complementation.
If $u\ne v$, then by the inductive hypothesis
$$d(u,v)\ge 2^{n-1}.$$
Since complementation preserves Hamming distance,
$$d(\varepsilon_1(u),\varepsilon_2(v)) = \begin{cases} d(u,v),&\varepsilon_1=\varepsilon_2,\ 2^n-d(u,v),&\varepsilon_1\ne\varepsilon_2. \end{cases}$$
When $\varepsilon_1=\varepsilon_2$,
$$d(x,y)=2d(u,v)\ge 2^n.$$
When $\varepsilon_1\ne\varepsilon_2$,
$$d(x,y) = d(u,v)+\bigl(2^n-d(u,v)\bigr) = 2^n.$$
Thus in every case with $u\ne v$,
$$d(x,y)\ge 2^n.$$
If $u=v$, then necessarily $\varepsilon_1\ne\varepsilon_2$, because $x\ne y$. Hence
$$d(x,y)=2^n.$$
Therefore every two distinct words of $C_{n+1}$ differ in at least $2^n$ positions. The induction is complete.
We have constructed
$$2^{n+1}$$
binary words of length $2^n$ with pairwise distance at least $2^{n-1}$. Replacing $0$ by $1$ and $1$ by $2$ yields the required family of decimal strings.
It remains to prove that no larger family exists.
Let $A$ be a family of $M$ binary words of length $N$, where every two distinct words have distance at least $N/2$.
For each coordinate $i$, let $a_i$ be the number of words having $0$ in that coordinate and let $b_i=M-a_i$ be the number having $1$.
Consider the sum of distances over all ordered pairs of words:
$$S=\sum_{x,y\in A} d(x,y).$$
For a fixed coordinate $i$, an ordered pair contributes $1$ precisely when the two entries in that coordinate are different. The number of such ordered pairs equals $2a_i b_i$. Hence
$$S=2\sum_{i=1}^{N} a_i b_i.$$
Since
$$a_i b_i\le \frac{M^2}{4},$$
we obtain
$$S\le \frac{NM^2}{2}.$$
On the other hand, the $M$ diagonal ordered pairs contribute $0$, and each of the remaining $M(M-1)$ ordered pairs contributes at least $N/2$. Therefore
$$S\ge M(M-1)\frac N2.$$
Combining the inequalities gives
$$M(M-1)\frac N2\le \frac{NM^2}{2}.$$
Equality is possible only if every coordinate satisfies
$$a_i=b_i=\frac M2.$$
Assume now that $M>2N$. Then every coordinate must split the words exactly in half. Fix one word $x\in A$. Since each coordinate agrees with $x$ on exactly $M/2$ words, the sum of distances from $x$ to all words of $A$ equals
$$N\frac M2.$$
The word $x$ itself contributes $0$. Every other word contributes at least $N/2$. Hence
$$N\frac M2 = \sum_{y\in A} d(x,y) \ge (M-1)\frac N2.$$
Equality must hold throughout. Consequently every other word is at distance exactly $N/2$ from $x$.
Since this is true for every choice of $x$, every pair of distinct words is at distance exactly $N/2$.
Associate to each word $w=(w_1,\dots,w_N)$ the vector
$$v(w)=((-1)^{w_1},\dots,(-1)^{w_N})\in\mathbb R^N.$$
For two words $u,v$,
$$v(u)\cdot v(v)=N-2d(u,v).$$
Thus distinct words correspond to mutually orthogonal vectors, because
$$d(u,v)=\frac N2.$$
A set of mutually orthogonal nonzero vectors in $\mathbb R^N$ contains at most $N$ vectors. Therefore at most $N$ words can be chosen from $A$ without their complements.
Since the complement of a word is also at distance $N/2$ from every other word whenever the original word is, the whole family contains at most twice as many words, namely at most
$$2N.$$
Hence
$$M\le 2N=2^{n+1}.$$
The construction already produced $2^{n+1}$ words, so this bound is best possible. The maximum number of such $2^n$ digit numbers is
$$\boxed{2^{n+1}}.$$
Verification of Key Steps
The recursive distance computation must be checked carefully when one descendant uses complementation and the other does not. If
$$x=(u,u),\qquad y=(v,\overline v),$$
then the second halves differ in exactly $2^n-d(u,v)$ positions, not in $d(u,v)$ positions. Consequently
$$d(x,y)=d(u,v)+2^n-d(u,v)=2^n.$$
Missing this identity would destroy the induction.
The upper bound hinges on the estimate
$$a_i b_i\le \frac{M^2}{4}.$$
This is the inequality
$$(a_i-b_i)^2\ge0.$$
Equality occurs only when $a_i=b_i$. The proof subsequently uses the equality case, so the condition must be recorded precisely.
The orthogonality argument depends on the identity
$$v(u)\cdot v(v)=N-2d(u,v).$$
For a coordinate where the bits agree, the contribution is $+1$; where they differ, the contribution is $-1$. Hence the dot product equals
$$(N-d(u,v))-d(u,v),$$
which simplifies to the stated formula. When $d(u,v)=N/2$, the vectors are orthogonal.
Alternative Approaches
The construction may be described more conceptually. Index the coordinates by the vectors of the space $\mathbb F_2^n$. For every affine linear function
$$f(x)=a_0+a_1x_1+\cdots+a_nx_n$$
on $\mathbb F_2^n$, form the binary word consisting of the values of $f$ at all $2^n$ points of the space. There are $2^{n+1}$ such functions. Every nonconstant affine function takes the values $0$ and $1$ equally often, so any two distinct codewords differ in exactly $2^{n-1}$ positions.
For the upper bound one may invoke the Plotkin bound directly. In the binary case, a code of length $N$ and minimum distance at least $N/2$ has size at most $2N$. The argument given in the main solution is essentially a self contained proof of this special case, avoiding any external coding theoretic machinery.