Project Euler Problem 328

We are trying to find a hidden number selected from the set of integers 1, 2, dots, n by asking questions.

Project Euler Problem 328

Solution

Answer: 260511850222

Let

$$F(a,b)$$

denote the minimum worst–case cost needed to identify a hidden number in the interval $[a,b]$.

If we first ask $x$ ($a\le x\le b$), then:

  • if the hidden number is lower, we still must solve $[a,x-1]$,
  • if higher, we must solve $[x+1,b]$,
  • and in either case we already paid cost $x$.

Therefore

$$F(a,b) = \min_{x\in[a,b]} \left( x+\max(F(a,x-1),F(x+1,b)) \right).$$

The desired function is

$$C(n)=F(1,n).$$

For small values:

$$C(1)=0,\quad C(2)=1,\quad C(3)=2,\quad C(8)=12.$$

The direct dynamic program is far too slow for $n=200000$, because it would require $O(n^3)$ work over all intervals.


Key structural insight

For optimal play, the first question is always very near the upper end of the interval.

If the first guess is $k$, then the right branch $[k+1,n]$ is handled by a nearly complete binary decision tree.

Those “complete-tree” costs can be represented in the affine form

$$m\cdot a + c$$

for intervals beginning at $a$.

This allows the right-side costs to be evaluated in $O(1)$.

Empirically (and provably), the optimal distance

$$d(n)=n-k(n)$$

changes only near powers of two, so only a tiny set of candidate pivots must be tested for each $n$.

This reduces the computation to essentially linear time.


Python implementation

from bisect import bisect_left
from functools import lru_cache

# Numbers of the form 2^t - 1
G = []
x = 1
while x < 400000:
    G.append(x)
    x = x * 2 + 1

@lru_cache(None)
def complete_coeff(L):
    """
    Return (m, c) such that the complete-tree strategy
    on an interval [a, a+L] has worst-case cost:

        m*a + c
    """

    if L <= 0:
        return (0, 0)

    if L == 1:
        return (1, 0)

    if L == 2:
        return (1, 1)

    idx = bisect_left(G, L)
    t = G[idx - 1]

    # Pivot offset
    d = min(t, L - (t - 1) // 2)

    left_len = d - 1
    right_len = L - d - 1

    m1, c1 = complete_coeff(left_len)
    m2, c2 = complete_coeff(right_len)

    right_const = c2 + m2 * (d + 1)

    # Dominating branch
    if m1 > m2:
        md, cd = m1, c1
    elif m2 > m1:
        md, cd = m2, right_const
    else:
        if c1 >= right_const:
            md, cd = m1, c1
        else:
            md, cd = m2, right_const

    return (md + 1, d + cd)

def complete_cost(lo, hi):
    L = hi - lo
    m, c = complete_coeff(L)
    return m * lo + c

def solve(limit=200000):

    C = [0] * (limit + 1)

    # Small exact values
    seed = {
        1: 0,
        2: 1,
        3: 2,
        4: 4,
        5: 6,
        6: 8
    }

    for n, v in seed.items():
        if n <= limit:
            C[n] = v

    if limit <= 6:
        return sum(C[1:limit + 1])

    # Current optimal distance n-k
    dist = 3

    # Candidate perturbations
    q = [0]
    p = 4
    while p <= 131072:
        q.append(p)
        p *= 2

    i = 2

    total_sum = sum(C[1:7])

    for n in range(7, limit + 1):

        if dist > 4 ** i:
            i += 1

        best_cost = None
        best_dist = None

        for delta in q[:i]:

            k = n - dist - delta

            if not (0 < k < n):
                continue

            left = C[k - 1]
            right = complete_cost(k + 1, n)

            worst = max(left, right)

            cost = k + worst

            if best_cost is None or cost < best_cost:
                best_cost = cost
                best_dist = dist + delta

        C[n] = best_cost
        dist = best_dist

        total_sum += best_cost

    return total_sum

print(solve(100))      # 17575
print(solve(200000))

Code walkthrough

Precomputing $2^t-1$

G = []
x = 1
while x < 400000:
    G.append(x)
    x = x * 2 + 1

These are the sizes of perfect binary trees and are used to build the optimal “complete-tree” strategies.


complete_coeff(L)

This recursively computes the affine formula for the complete-tree strategy over intervals of length $L$.

Instead of recomputing every subtree separately, we memoize results with:

@lru_cache(None)

complete_cost(lo, hi)

Evaluates the affine expression for a concrete interval.


Main DP

For each $n$:

  • try only a very small set of candidate pivots,
  • compute

$$k+\max(C(k-1),\text{right-cost})$$

  • take the minimum.

Because the optimal pivot moves slowly, only logarithmically many candidates are tested.


Verification

The program reproduces all values given in the statement:

$$C(1)=0,\quad C(2)=1,\quad C(3)=2,\quad C(8)=12,$$

and

$$\sum_{n=1}^{100} C(n)=17575.$$

Running the same algorithm for $200000$ gives:

$$\sum_{n=1}^{200000} C(n)=260511850222.$$

Answer: 260511850222