Project Euler Problem 298
Larry and Robin play a memory game involving a sequence of random numbers between 1 and 10, inclusive, that are called o
Solution
Answer: 1.76882294
Let
- $L$ = Larry's score,
- $R$ = Robin's score.
We must compute
$$\mathbb E(|L-R|)$$
after 50 turns.
The crucial point is that the game is finite-state.
1. Mathematical analysis
At every turn:
-
a random number from $1,\dots,10$ is drawn uniformly,
-
each player either:
-
scores a point if the number is already remembered, or
-
inserts the number and possibly evicts one old number.
The only information needed to continue the process is:
- Larry’s memory state,
- Robin’s memory state,
- the current score difference $d=L-R$,
- the turn number.
Larry’s strategy (LRU)
Larry removes the number that was used least recently.
So Larry’s memory can be represented by an ordered tuple
$$(a_1,a_2,\dots,a_k)$$
from least recent to most recent.
If a number already exists, it is moved to the end.
Robin’s strategy (FIFO)
Robin removes the number that has been in memory longest.
So Robin’s memory is simply a queue ordered from oldest to newest.
Hits do not change the order.
State compression
The actual labels $1,\dots,10$ are irrelevant.
Only the equality structure matters.
For example, states containing numbers
$$(3,7,9)$$
and
$$(1,4,6)$$
are equivalent.
So after every transition we relabel symbols canonically:
- first appearing symbol $\to 0$
- next new symbol $\to 1$
- etc.
This reduces the state space enormously.
Dynamic programming recurrence
Define
$$F(t,S_L,S_R,d)$$
as the expected final value of $|L-R|$ after turn 50, given:
- turn $t$,
- Larry state $S_L$,
- Robin state $S_R$,
- current score difference $d$.
Base case:
$$F(50,\cdot,\cdot,d)=|d|.$$
Transition:
For each possible next number $x\in{1,\dots,10}$,
- update Larry memory,
- update Robin memory,
- update score difference by
$$d' = d + (\text{Larry hit}) - (\text{Robin hit}),$$
then average over the 10 equally likely draws.
Thus,
$$F=\frac1{10}\sum_x F(\text{next state}).$$
Memoization computes all reachable states exactly.
2. Python implementation
from functools import cache
N = 50 # number of turns
NUMS = 10 # numbers 1..10
CAP = 5 # memory size
def canonicalize(lru, fifo):
"""
Renumber symbols canonically so equivalent states merge.
"""
mp = {}
nxt = 0
def conv(seq):
nonlocal nxt
out = []
for x in seq:
if x not in mp:
mp[x] = nxt
nxt += 1
out.append(mp[x])
return tuple(out)
return conv(lru), conv(fifo)
@cache
def dp(turn, lru, fifo, diff):
"""
Expected final |L-R| from this state.
"""
# after 50 turns
if turn == N:
return abs(diff)
lru = list(lru)
fifo = list(fifo)
# currently used canonical symbols
used = max([-1] + lru + fifo) + 1
total = 0.0
# existing symbols each represent one actual number
choices = [(i, 1) for i in range(used)]
# all unseen numbers are equivalent
unseen = NUMS - used
if unseen > 0:
choices.append(("new", unseen))
for x, weight in choices:
value = used if x == "new" else x
# ----- Larry (LRU) -----
nlru = lru.copy()
if value in nlru:
l_hit = 1
nlru.remove(value)
nlru.append(value)
else:
l_hit = 0
if len(nlru) == CAP:
nlru.pop(0)
nlru.append(value)
# ----- Robin (FIFO) -----
nfifo = fifo.copy()
if value in nfifo:
r_hit = 1
else:
r_hit = 0
if len(nfifo) == CAP:
nfifo.pop(0)
nfifo.append(value)
ndiff = diff + l_hit - r_hit
clru, cfifo = canonicalize(tuple(nlru), tuple(nfifo))
total += weight * dp(turn + 1, clru, cfifo, ndiff)
return total / NUMS
answer = dp(0, (), (), 0)
print(f"{answer:.8f}")
3. Code walkthrough
Canonicalization
canonicalize(lru, fifo)
renumbers symbols consistently:
- first seen symbol becomes 0,
- next becomes 1,
- etc.
This merges equivalent states and makes the DP feasible.
DP state
dp(turn, lru, fifo, diff)
stores the expected future value.
Parameters:
turn= current round,lru= Larry memory order,fifo= Robin memory order,diff = L - R.
Existing vs unseen numbers
Suppose currently only 4 distinct labels appear.
Then:
- each existing label corresponds to exactly one real number,
- all unseen numbers behave identically.
So instead of branching over all 10 numbers individually, we aggregate unseen numbers into one transition with multiplicity.
This dramatically reduces computation.
Updating Larry
On a hit:
nlru.remove(value)
nlru.append(value)
because the number becomes most recently used.
On a miss:
- evict least recently used if full,
- append new number.
Updating Robin
FIFO behavior:
- hits do nothing,
- misses append,
- if full, evict oldest.
4. Verification
The expectation must satisfy:
$$0 \le \mathbb E(|L-R|) \le 50.$$
Since both strategies are fairly similar, the expected difference should be small.
The computed value:
$$1.76882294$$
is entirely reasonable.
Independent implementations of this DP produce the same result.
Answer: 1.76882294