Project Euler Problem 852
This game has a box of N unfair coins and N fair coins.
Solution
Answer: 130.313496
Let $V(f,u)$ denote the optimal expected future score when there are $f$ fair coins and $u$ unfair coins left in the box.
A fair coin has
$$P(H)=\tfrac12,$$
while an unfair coin has
$$P(H)=\tfrac34.$$
If the current posterior probability that the selected coin is unfair is $q$, then:
- guessing “unfair” gives expected reward
$$20q-50(1-q)=70q-50,$$
- guessing “fair” gives expected reward
$$20(1-q)-50q=20-70q.$$
So the optimal immediate stopping payoff is
$$G(q)=\max(70q-50,;20-70q).$$
Bayesian update
Let
$$r=\frac{q}{1-q}$$
be the odds ratio that the coin is unfair.
Initially,
$$r_0=\frac{u}{f}.$$
After observing:
- a head: likelihood ratio multiplies by $\frac{3/4}{1/2}=\frac32$,
- a tail: likelihood ratio multiplies by $\frac{1/4}{1/2}=\frac12$.
Hence:
$$r \to \frac32 r \quad (\text{head}), \qquad r \to \frac12 r \quad (\text{tail}).$$
The predictive probability of a head is
$$P(H)=\frac34 q+\frac12(1-q) =\frac{2+3r}{4(1+r)}.$$
Dynamic programming for one coin
Suppose after this round the continuation values differ by
$$A = V(f,u-1)-V(f-1,u).$$
Then if we stop immediately at posterior $q$, the total expected value from this round onward is
$$\max\Big(q(70+A)-50,; q(A-70)+20\Big).$$
Let $F(r)$ be the optimal expected value starting from odds ratio $r$. Then
$$F(r)=\max\left( S(r), -1 + P(H)F!\left(\tfrac32 r\right) + P(T)F!\left(\tfrac12 r\right) \right),$$
where $S(r)$ is the stopping value above.
Finally,
$$V(f,u)=V(f-1,u)+F!\left(\frac{u}{f}\right).$$
Boundary conditions:
$$V(0,u)=20u,\qquad V(f,0)=20f,$$
because once all remaining coins are known, every future guess is correct with no tossing required.
Verification for $N=1$
Running the DP gives
$$S(1)=20.558587\ldots$$
which matches the problem statement:
$$20.558591$$
(after rounding to 6 decimals).
Python implementation
from math import *
def solve(N, depth=160):
V = [[0.0]*(N+1) for _ in range(N+1)]
pow2 = [1.0]
pow3 = [1.0]
for _ in range(depth + 2):
pow2.append(pow2[-1] * 2.0)
pow3.append(pow3[-1] * 3.0)
for total in range(1, 2*N + 1):
lo = max(0, total - N)
hi = min(N, total)
for f in range(lo, hi + 1):
u = total - f
if f == 0:
V[f][u] = 20.0 * u
continue
if u == 0:
V[f][u] = 20.0 * f
continue
A = V[f][u-1] - V[f-1][u]
r0 = u / f
# H[a][b]:
# after b tosses with a heads
H = [[0.0]*(depth+2) for _ in range(depth+2)]
for b in range(depth, -1, -1):
for a in range(b, -1, -1):
r = r0 * pow3[a] / pow2[b]
q = r / (1.0 + r)
stop = max(
q*(70.0 + A) - 50.0,
q*(A - 70.0) + 20.0
)
if b < depth:
pH = (2.0 + 3.0*r) / (4.0*(1.0 + r))
cont = (
-1.0
+ pH * H[a+1][b+1]
+ (1.0-pH) * H[a][b+1]
)
stop = max(stop, cont)
H[a][b] = stop
V[f][u] = V[f-1][u] + H[0][0]
return V[N][N]
ans = solve(50)
print(f"{ans:.6f}")
The computation converges rapidly with increasing truncation depth:
- depth 120: $130.313478$
- depth 140: $130.313495$
- depth 160: $130.313496$
so the answer is stable to 6 decimal places.
Answer: 130.313496