Project Euler Problem 232
Two players share an unbiased coin and take it in turns to play The Race.
Solution
Answer: 0.83648556
Let
- $P(a,b)$ = probability that Player 2 eventually wins when it is Player 1’s turn, with scores
$$\text{Player 1}=a,\qquad \text{Player 2}=b.$$
- $Q(a,b)$ = probability that Player 2 eventually wins when it is Player 2’s turn.
The game ends immediately if either player reaches $100$ points:
$$P(a,b)=Q(a,b)=0 \quad (a\ge 100),$$
$$P(a,b)=Q(a,b)=1 \quad (b\ge 100).$$
We seek
$$P(0,0).$$
1. Mathematical analysis
Player 1’s move
Player 1 tosses one fair coin.
- Heads ($\frac12$): Player 1 gains 1 point.
- Tails ($\frac12$): no point.
Afterward it becomes Player 2’s turn.
Therefore
$$P(a,b)=\frac12,Q(a+1,b)+\frac12,Q(a,b).$$
Player 2’s move
Player 2 chooses an integer $T\ge1$.
He tosses the coin $T$ times.
- Probability all heads:
$$2^{-T}.$$
- If successful, he gains
$$2^{T-1}$$
points.
- Otherwise he gains nothing.
After Player 2’s move it becomes Player 1’s turn again.
Hence, for a fixed $T$,
$$Q_T(a,b) = 2^{-T} P(a,b+2^{T-1}) + (1-2^{-T})P(a,b).$$
Since Player 2 always plays optimally,
$$Q(a,b)=\max_{T\ge1} Q_T(a,b).$$
Eliminating the mutual recursion
Substitute
$$P(a,b)=\frac12Q(a+1,b)+\frac12Q(a,b)$$
into the formula for $Q_T$.
Let
$$p=2^{-T},\qquad g=2^{T-1}.$$
Then
$$Q_T(a,b) = p,P(a,b+g)+(1-p)P(a,b).$$
Also,
$$P(a,b) = \frac12Q(a+1,b)+\frac12Q(a,b).$$
Substituting:
$$Q_T(a,b) = p,P(a,b+g) + (1-p)\left(\frac12Q(a+1,b)+\frac12Q_T(a,b)\right).$$
Solve for $Q_T(a,b)$:
$$Q_T(a,b)\left(1-\frac12(1-p)\right) = p,P(a,b+g)+\frac12(1-p)Q(a+1,b).$$
Thus
$$Q_T(a,b) = \frac{ p,P(a,b+g)+\frac12(1-p)Q(a+1,b) }{ 1-\frac12(1-p) }.$$
We compute this dynamically for all states.
Why only small $T$ values matter
The largest useful reward is the first one that can immediately win the game.
Since the target is $100$,
$$2^{T-1}\ge100$$
already guarantees victory upon success.
The smallest such $T$ is $8$ because
$$2^7=128.$$
So checking $T=1,\dots,8$ is sufficient.
2. Python implementation
N = 100
# P[a][b] = probability Player 2 wins
# when it is Player 1's turn
#
# Q[a][b] = probability Player 2 wins
# when it is Player 2's turn
P = [[0.0 for _ in range(N + 129)] for _ in range(N + 1)]
Q = [[0.0 for _ in range(N + 129)] for _ in range(N + 1)]
# Terminal states:
# If Player 2 already has >=100 points, he has won.
for a in range(N + 1):
for b in range(100, N + 129):
P[a][b] = 1.0
Q[a][b] = 1.0
# Fill tables backwards
for a in range(99, -1, -1):
for b in range(99, -1, -1):
# Q(a+1,b), unless Player 1 already wins
if a + 1 >= 100:
next_q = 0.0
else:
next_q = Q[a + 1][b]
best = 0.0
# Try all meaningful T values
for T in range(1, 9):
p = 2.0 ** (-T)
gain = 2 ** (T - 1)
# P(a,b+gain)
if b + gain >= 100:
success_value = 1.0
else:
success_value = P[a][b + gain]
# Derived closed-form recurrence
numerator = (
p * success_value
+ 0.5 * (1 - p) * next_q
)
denominator = 1 - 0.5 * (1 - p)
value = numerator / denominator
best = max(best, value)
Q[a][b] = best
# Player 1 turn recurrence
P[a][b] = 0.5 * next_q + 0.5 * Q[a][b]
print("{:.8f}".format(P[0][0]))
3. Code walkthrough
Table setup
P = [[...]]
Q = [[...]]
We store all probabilities for every score pair $(a,b)$.
Terminal conditions
if b >= 100:
probability = 1
If Player 2 already reached 100, he has already won.
Similarly, states with $a\ge100$ are treated as losing states for Player 2.
Backward dynamic programming
We iterate downward from $99$ to $0$ because every recurrence depends only on larger scores.
Trying every strategy $T$
for T in range(1, 9):
Player 2 checks every possible number of tosses and chooses the one maximizing his winning probability.
Closed-form recurrence
value = numerator / denominator
This is exactly the algebraically simplified formula derived earlier.
Final answer
print("{:.8f}".format(P[0][0]))
This outputs the probability that Player 2 wins when the game starts.
The computed value is
$$0.8364855558\ldots$$
which rounds to
$$0.83648556.$$
4. Final verification
The result is reasonable:
- Player 2 can choose aggressive strategies dynamically.
- Large rewards compensate for low success probabilities.
- Numerical value $>0.8$ indicates Player 2’s adaptive strategy is extremely strong despite Player 1 moving first.
Everything is internally consistent and stable under recomputation.
Answer: 0.83648556