Kvant Math Problem 180

A strategy can be represented by a decision tree.

Verified: no
Verdicts: SKIP + SKIP
Solve time: 4m38s
Source on kvant.digital

Problem

Two people play the following game. One thinks of a positive integer $n$, and the other asks questions of the form “is it true that $n \ge x$?” (he may choose $x$ at his discretion) and receives the answers “yes” or “no.” To each possible strategy $T$ of the second player, associate a function $f_T(n)$ equal to the number of questions asked (before the number is guessed) when the chosen number was $n$.

For example, let the strategy $T$ consist of first asking the questions: “is it true that $n \ge 10$?”, “is it true that $n \ge 20$?”, ... until for some question “is it true that $n \ge 10 (k+1)$?” the answer “no” is obtained, and then asking the questions “is it true that $n \ge 10k+1$?”, “is it true that $n \ge 10k+2$?” and so on. Then $f_T(n) = \dfrac{n-a}{10} + a + 2$, where $a$ is the last digit of the number $n$; that is, $f_T(n)$ grows approximately like $\dfrac{n}{10}$.

  1. Propose a strategy for which the function $f_T(n)$ grows as slowly as possible.
  2. When comparing two strategies, it is convenient to introduce, instead of the function $f_T(n)$, the function $f_T'(n) = \underset{1 \leq k \leq n}{\max} f_T(k)$—it shows how many questions are sufficient to guess any number not exceeding $n$. Obtain a lower bound for $f_T'(n)$ for an arbitrary strategy $T$.

Ya. M. Bardzin'

Exploration

A strategy can be represented by a decision tree. Each question has the form “is $n\ge x$?”, so the answer divides the positive integers into two sets, those below $x$ and those at least $x$. After several questions, the set of numbers still compatible with all answers is always an interval of positive integers.

To identify the number, the interval must eventually shrink to a single integer.

Let $g(n)=f'T(n)=\max{1\le k\le n}f_T(k)$. If every number from $1$ to $n$ can be determined in at most $m$ questions, then the decision tree restricted to these numbers has depth at most $m$. A binary tree of depth $m$ has at most $2^m$ leaves. Since there are $n$ possible numbers, one immediately obtains $n\le 2^m$, hence $m\ge \log_2 n$.

This suggests that no strategy can do substantially better than binary search.

Can binary search actually be carried out with questions of the required form? Yes. If the current possible interval is $[a,b]$, ask whether

$$n\ge \left\lfloor\frac{a+b+1}{2}\right\rfloor.$$

A positive answer leaves approximately the upper half of the interval, a negative answer leaves approximately the lower half. Thus the interval length is at least halved at each step.

The crucial point is to prove rigorously that after $m$ questions binary search can always distinguish all numbers up to $n$, and that every strategy requires at least $\lceil\log_2 n\rceil$ questions in the worst case. The lower bound is a counting argument on leaves of a decision tree.

Problem Understanding

We are given a guessing game. One player chooses a positive integer $n$. The other may ask comparison questions of the form

$$n\ge x?$$

for arbitrary integers $x$, receiving only the answers yes or no.

For a strategy $T$, the function $f_T(n)$ counts how many questions are asked before the number $n$ is determined. The function

$$f'T(n)=\max{1\le k\le n}f_T(k)$$

measures the worst-case number of questions needed to determine any number not exceeding $n$.

The problem asks for a strategy with the slowest possible growth of $f_T(n)$ and for a universal lower bound on $f'_T(n)$.

This is a Type C problem. We must determine the optimal order of growth and prove that no strategy can do better.

The natural candidate is binary search. Each question should split the remaining interval into two nearly equal parts, giving worst-case complexity about $\log_2 n$. Since each question has only two possible answers, after $m$ questions there are at most $2^m$ possible answer sequences, so one should not expect to distinguish more than $2^m$ numbers.

Proof Architecture

The first lemma states that every strategy corresponds to a binary decision tree whose leaves represent the numbers eventually identified.

The second lemma states that if every number from $1$ to $n$ is determined in at most $m$ questions, then the corresponding decision tree has at least $n$ leaves and at most $2^m$ leaves, hence $n\le 2^m$.

The third lemma states that binary search on an interval of length $L$ reduces the interval length to at most $\lceil L/2\rceil$ after one question.

The fourth lemma states that after $m$ binary-search questions the remaining interval has length at most $\lceil n/2^m\rceil$.

The hardest direction is the lower bound. The lemma most likely to fail under scrutiny is the passage from question sequences to the bound $2^m$ on the number of distinguishable integers, so the decision-tree argument must be stated carefully.

Solution

For a fixed strategy $T$, consider all possible sequences of answers to the questions prescribed by the strategy. Since every question has exactly two possible answers, the execution of the strategy is described by a binary decision tree.

Each leaf of this tree corresponds to a situation in which the chosen number has been identified. Different numbers must correspond to different leaves, because otherwise the same sequence of answers would be compatible with two distinct numbers and the strategy would not determine the number uniquely.

Let

$$m=f'_T(n).$$

By definition, every integer $k$ with $1\le k\le n$ is identified after at most $m$ questions. Hence the part of the decision tree relevant to the numbers $1,\dots,n$ has depth at most $m$.

A binary tree of depth at most $m$ has at most

$$1+2+\cdots+2^m=2^{m+1}-1$$

vertices and, in particular, at most $2^m$ leaves. Since the numbers $1,\dots,n$ must correspond to distinct leaves, the tree has at least $n$ leaves. Therefore

$$n\le 2^m.$$

Taking binary logarithms gives

$$m\ge \log_2 n.$$

Since $m$ is an integer,

$$f'_T(n)\ge \lceil\log_2 n\rceil .$$

This is the required lower bound for every strategy.

We now construct a strategy attaining this bound up to equality.

Suppose the possible values of the number form an interval $[a,b]$. Initially, for the purpose of determining a number known to satisfy $1\le n\le N$, we have $[a,b]=[1,N]$.

Let

$$c=\left\lfloor\frac{a+b+1}{2}\right\rfloor .$$

Ask whether $n\ge c$.

If the answer is “yes”, then

$$n\in[c,b].$$

If the answer is “no”, then

$$n\in[a,c-1].$$

The lengths of these two intervals are

$$b-c+1,\qquad c-a.$$

Because $c$ is chosen as the midpoint, each of them does not exceed

$$\left\lceil\frac{b-a+1}{2}\right\rceil .$$

Hence one question reduces the number of remaining possibilities by at least a factor of $2$.

Starting from an interval of length $N$, after $m$ such questions the number of remaining possibilities is at most

$$\left\lceil\frac{N}{2^m}\right\rceil .$$

Choose

$$m=\lceil\log_2 N\rceil .$$

Then $2^m\ge N$, so

$$\left\lceil\frac{N}{2^m}\right\rceil =1.$$

After at most $m$ questions only one number remains possible, and the number is determined.

Thus the binary-search strategy satisfies

$$f'_T(N)\le \lceil\log_2 N\rceil .$$

Combined with the universal lower bound,

$$f'_T(N)\ge \lceil\log_2 N\rceil ,$$

we obtain

$$f'_T(N)=\lceil\log_2 N\rceil .$$

Consequently, the optimal strategy is binary search, and the smallest possible worst-case number of questions for numbers not exceeding $N$ is

$$\boxed{\lceil\log_2 N\rceil}.$$

Equality is achieved by repeatedly asking whether the number is at least the midpoint of the current interval.

Verification of Key Steps

The lower bound depends on the claim that a strategy using at most $m$ questions cannot distinguish more than $2^m$ numbers. Re-deriving it directly, every run of the strategy produces a sequence of $m$ or fewer yes-no answers. Extending shorter sequences by arbitrary terminal symbols, one obtains at most $2^m$ distinct answer patterns. Distinct numbers must yield distinct patterns, so at most $2^m$ numbers can be distinguished.

The halving property of binary search must also be checked carefully. Let the current interval have length

$$L=b-a+1.$$

With

$$c=\left\lfloor\frac{a+b+1}{2}\right\rfloor ,$$

the two resulting lengths are $c-a$ and $b-c+1$. For $L=2r$ they are $r$ and $r$, while for $L=2r+1$ they are $r$ and $r+1$. In both cases the larger length equals $\lceil L/2\rceil$.

Finally, the termination count must be verified. If

$$m=\lceil\log_2 N\rceil,$$

then $2^m\ge N$. Hence

$$\frac{N}{2^m}\le 1.$$

Since the remaining interval length is a positive integer not exceeding $1$, it must equal $1$, so the number has been uniquely determined.

Alternative Approaches

The lower bound may also be phrased in information-theoretic language. Each question produces one bit of information, because there are only two possible answers. Distinguishing among $n$ possible numbers requires at least $\log_2 n$ bits, hence at least $\lceil\log_2 n\rceil$ questions. The decision-tree proof is essentially a rigorous combinatorial formulation of this idea.

Another route to the upper bound is induction on $N$. One shows that an interval of size $N$ can be split into two intervals of sizes at most $\lceil N/2\rceil$, and then applies the inductive hypothesis to the appropriate half. This yields the recurrence

$$g(N)=1+g!\left(\left\lceil\frac N2\right\rceil\right),$$

whose solution is $g(N)=\lceil\log_2 N\rceil$. The direct interval-halving argument is shorter and makes the optimal strategy transparent.