Project Euler Problem 852

Solution

Answer: 130.313496

Let $V(f,u)$ denote the optimal expected future score when there are $f$ fair coins and $u$ unfair coins left in the box.

A fair coin has

$$P(H)=\tfrac12,$$

while an unfair coin has

$$P(H)=\tfrac34.$$

If the current posterior probability that the selected coin is unfair is $q$, then:

guessing “unfair” gives expected reward

$$20q-50(1-q)=70q-50,$$

guessing “fair” gives expected reward

$$20(1-q)-50q=20-70q.$$

So the optimal immediate stopping payoff is

$$G(q)=\max(70q-50,;20-70q).$$

Bayesian update

Let

$$r=\frac{q}{1-q}$$

be the odds ratio that the coin is unfair.

Initially,

$$r_0=\frac{u}{f}.$$

After observing:

a head: likelihood ratio multiplies by $\frac{3/4}{1/2}=\frac32$,
a tail: likelihood ratio multiplies by $\frac{1/4}{1/2}=\frac12$.

Hence:

$$r \to \frac32 r \quad (\text{head}), \qquad r \to \frac12 r \quad (\text{tail}).$$

The predictive probability of a head is

$$P(H)=\frac34 q+\frac12(1-q) =\frac{2+3r}{4(1+r)}.$$

Dynamic programming for one coin

Suppose after this round the continuation values differ by

$$A = V(f,u-1)-V(f-1,u).$$

Then if we stop immediately at posterior $q$, the total expected value from this round onward is

$$\max\Big(q(70+A)-50,; q(A-70)+20\Big).$$

Let $F(r)$ be the optimal expected value starting from odds ratio $r$. Then

$$F(r)=\max\left( S(r), -1 + P(H)F!\left(\tfrac32 r\right) + P(T)F!\left(\tfrac12 r\right) \right),$$

where $S(r)$ is the stopping value above.

Finally,

$$V(f,u)=V(f-1,u)+F!\left(\frac{u}{f}\right).$$

Boundary conditions:

$$V(0,u)=20u,\qquad V(f,0)=20f,$$

because once all remaining coins are known, every future guess is correct with no tossing required.

Verification for $N=1$

Running the DP gives

$$S(1)=20.558587\ldots$$

which matches the problem statement:

$$20.558591$$

(after rounding to 6 decimals).

Python implementation

from math import *

def solve(N, depth=160):
    V = [[0.0]*(N+1) for _ in range(N+1)]

    pow2 = [1.0]
    pow3 = [1.0]

    for _ in range(depth + 2):
        pow2.append(pow2[-1] * 2.0)
        pow3.append(pow3[-1] * 3.0)

    for total in range(1, 2*N + 1):

        lo = max(0, total - N)
        hi = min(N, total)

        for f in range(lo, hi + 1):

            u = total - f

            if f == 0:
                V[f][u] = 20.0 * u
                continue

            if u == 0:
                V[f][u] = 20.0 * f
                continue

            A = V[f][u-1] - V[f-1][u]

            r0 = u / f

            # H[a][b]:
            # after b tosses with a heads
            H = [[0.0]*(depth+2) for _ in range(depth+2)]

            for b in range(depth, -1, -1):

                for a in range(b, -1, -1):

                    r = r0 * pow3[a] / pow2[b]
                    q = r / (1.0 + r)

                    stop = max(
                        q*(70.0 + A) - 50.0,
                        q*(A - 70.0) + 20.0
                    )

                    if b < depth:

                        pH = (2.0 + 3.0*r) / (4.0*(1.0 + r))

                        cont = (
                            -1.0
                            + pH * H[a+1][b+1]
                            + (1.0-pH) * H[a][b+1]
                        )

                        stop = max(stop, cont)

                    H[a][b] = stop

            V[f][u] = V[f-1][u] + H[0][0]

    return V[N][N]

ans = solve(50)
print(f"{ans:.6f}")

The computation converges rapidly with increasing truncation depth:

depth 120: $130.313478$
depth 140: $130.313495$
depth 160: $130.313496$

so the answer is stable to 6 decimal places.

Answer: 130.313496