Project Euler Problem 232

Two players share an unbiased coin and take it in turns to play The Race.

Project Euler Problem 232

Solution

Answer: 0.83648556

Let

  • $P(a,b)$ = probability that Player 2 eventually wins when it is Player 1’s turn, with scores

$$\text{Player 1}=a,\qquad \text{Player 2}=b.$$

  • $Q(a,b)$ = probability that Player 2 eventually wins when it is Player 2’s turn.

The game ends immediately if either player reaches $100$ points:

$$P(a,b)=Q(a,b)=0 \quad (a\ge 100),$$

$$P(a,b)=Q(a,b)=1 \quad (b\ge 100).$$

We seek

$$P(0,0).$$


1. Mathematical analysis

Player 1’s move

Player 1 tosses one fair coin.

  • Heads ($\frac12$): Player 1 gains 1 point.
  • Tails ($\frac12$): no point.

Afterward it becomes Player 2’s turn.

Therefore

$$P(a,b)=\frac12,Q(a+1,b)+\frac12,Q(a,b).$$


Player 2’s move

Player 2 chooses an integer $T\ge1$.

He tosses the coin $T$ times.

  • Probability all heads:

$$2^{-T}.$$

  • If successful, he gains

$$2^{T-1}$$

points.

  • Otherwise he gains nothing.

After Player 2’s move it becomes Player 1’s turn again.

Hence, for a fixed $T$,

$$Q_T(a,b) = 2^{-T} P(a,b+2^{T-1}) + (1-2^{-T})P(a,b).$$

Since Player 2 always plays optimally,

$$Q(a,b)=\max_{T\ge1} Q_T(a,b).$$


Eliminating the mutual recursion

Substitute

$$P(a,b)=\frac12Q(a+1,b)+\frac12Q(a,b)$$

into the formula for $Q_T$.

Let

$$p=2^{-T},\qquad g=2^{T-1}.$$

Then

$$Q_T(a,b) = p,P(a,b+g)+(1-p)P(a,b).$$

Also,

$$P(a,b) = \frac12Q(a+1,b)+\frac12Q(a,b).$$

Substituting:

$$Q_T(a,b) = p,P(a,b+g) + (1-p)\left(\frac12Q(a+1,b)+\frac12Q_T(a,b)\right).$$

Solve for $Q_T(a,b)$:

$$Q_T(a,b)\left(1-\frac12(1-p)\right) = p,P(a,b+g)+\frac12(1-p)Q(a+1,b).$$

Thus

$$Q_T(a,b) = \frac{ p,P(a,b+g)+\frac12(1-p)Q(a+1,b) }{ 1-\frac12(1-p) }.$$

We compute this dynamically for all states.


Why only small $T$ values matter

The largest useful reward is the first one that can immediately win the game.

Since the target is $100$,

$$2^{T-1}\ge100$$

already guarantees victory upon success.

The smallest such $T$ is $8$ because

$$2^7=128.$$

So checking $T=1,\dots,8$ is sufficient.


2. Python implementation

N = 100

# P[a][b] = probability Player 2 wins
#           when it is Player 1's turn
#
# Q[a][b] = probability Player 2 wins
#           when it is Player 2's turn

P = [[0.0 for _ in range(N + 129)] for _ in range(N + 1)]
Q = [[0.0 for _ in range(N + 129)] for _ in range(N + 1)]

# Terminal states:
# If Player 2 already has >=100 points, he has won.
for a in range(N + 1):
    for b in range(100, N + 129):
        P[a][b] = 1.0
        Q[a][b] = 1.0

# Fill tables backwards
for a in range(99, -1, -1):
    for b in range(99, -1, -1):

        # Q(a+1,b), unless Player 1 already wins
        if a + 1 >= 100:
            next_q = 0.0
        else:
            next_q = Q[a + 1][b]

        best = 0.0

        # Try all meaningful T values
        for T in range(1, 9):

            p = 2.0 ** (-T)
            gain = 2 ** (T - 1)

            # P(a,b+gain)
            if b + gain >= 100:
                success_value = 1.0
            else:
                success_value = P[a][b + gain]

            # Derived closed-form recurrence
            numerator = (
                p * success_value
                + 0.5 * (1 - p) * next_q
            )

            denominator = 1 - 0.5 * (1 - p)

            value = numerator / denominator

            best = max(best, value)

        Q[a][b] = best

        # Player 1 turn recurrence
        P[a][b] = 0.5 * next_q + 0.5 * Q[a][b]

print("{:.8f}".format(P[0][0]))

3. Code walkthrough

Table setup

P = [[...]]
Q = [[...]]

We store all probabilities for every score pair $(a,b)$.


Terminal conditions

if b >= 100:
    probability = 1

If Player 2 already reached 100, he has already won.

Similarly, states with $a\ge100$ are treated as losing states for Player 2.


Backward dynamic programming

We iterate downward from $99$ to $0$ because every recurrence depends only on larger scores.


Trying every strategy $T$

for T in range(1, 9):

Player 2 checks every possible number of tosses and chooses the one maximizing his winning probability.


Closed-form recurrence

value = numerator / denominator

This is exactly the algebraically simplified formula derived earlier.


Final answer

print("{:.8f}".format(P[0][0]))

This outputs the probability that Player 2 wins when the game starts.

The computed value is

$$0.8364855558\ldots$$

which rounds to

$$0.83648556.$$


4. Final verification

The result is reasonable:

  • Player 2 can choose aggressive strategies dynamically.
  • Large rewards compensate for low success probabilities.
  • Numerical value $>0.8$ indicates Player 2’s adaptive strategy is extremely strong despite Player 1 moving first.

Everything is internally consistent and stable under recomputation.


Answer: 0.83648556