Project Euler Problem 298

Larry and Robin play a memory game involving a sequence of random numbers between 1 and 10, inclusive, that are called o

Project Euler Problem 298

Solution

Answer: 1.76882294

Let

  • $L$ = Larry's score,
  • $R$ = Robin's score.

We must compute

$$\mathbb E(|L-R|)$$

after 50 turns.

The crucial point is that the game is finite-state.


1. Mathematical analysis

At every turn:

  • a random number from $1,\dots,10$ is drawn uniformly,

  • each player either:

  • scores a point if the number is already remembered, or

  • inserts the number and possibly evicts one old number.

The only information needed to continue the process is:

  1. Larry’s memory state,
  2. Robin’s memory state,
  3. the current score difference $d=L-R$,
  4. the turn number.

Larry’s strategy (LRU)

Larry removes the number that was used least recently.

So Larry’s memory can be represented by an ordered tuple

$$(a_1,a_2,\dots,a_k)$$

from least recent to most recent.

If a number already exists, it is moved to the end.


Robin’s strategy (FIFO)

Robin removes the number that has been in memory longest.

So Robin’s memory is simply a queue ordered from oldest to newest.

Hits do not change the order.


State compression

The actual labels $1,\dots,10$ are irrelevant.

Only the equality structure matters.

For example, states containing numbers

$$(3,7,9)$$

and

$$(1,4,6)$$

are equivalent.

So after every transition we relabel symbols canonically:

  • first appearing symbol $\to 0$
  • next new symbol $\to 1$
  • etc.

This reduces the state space enormously.


Dynamic programming recurrence

Define

$$F(t,S_L,S_R,d)$$

as the expected final value of $|L-R|$ after turn 50, given:

  • turn $t$,
  • Larry state $S_L$,
  • Robin state $S_R$,
  • current score difference $d$.

Base case:

$$F(50,\cdot,\cdot,d)=|d|.$$

Transition:

For each possible next number $x\in{1,\dots,10}$,

  • update Larry memory,
  • update Robin memory,
  • update score difference by

$$d' = d + (\text{Larry hit}) - (\text{Robin hit}),$$

then average over the 10 equally likely draws.

Thus,

$$F=\frac1{10}\sum_x F(\text{next state}).$$

Memoization computes all reachable states exactly.


2. Python implementation

from functools import cache

N = 50          # number of turns
NUMS = 10       # numbers 1..10
CAP = 5         # memory size

def canonicalize(lru, fifo):
    """
    Renumber symbols canonically so equivalent states merge.
    """
    mp = {}
    nxt = 0

    def conv(seq):
        nonlocal nxt
        out = []
        for x in seq:
            if x not in mp:
                mp[x] = nxt
                nxt += 1
            out.append(mp[x])
        return tuple(out)

    return conv(lru), conv(fifo)

@cache
def dp(turn, lru, fifo, diff):
    """
    Expected final |L-R| from this state.
    """

    # after 50 turns
    if turn == N:
        return abs(diff)

    lru = list(lru)
    fifo = list(fifo)

    # currently used canonical symbols
    used = max([-1] + lru + fifo) + 1

    total = 0.0

    # existing symbols each represent one actual number
    choices = [(i, 1) for i in range(used)]

    # all unseen numbers are equivalent
    unseen = NUMS - used
    if unseen > 0:
        choices.append(("new", unseen))

    for x, weight in choices:

        value = used if x == "new" else x

        # ----- Larry (LRU) -----
        nlru = lru.copy()

        if value in nlru:
            l_hit = 1
            nlru.remove(value)
            nlru.append(value)
        else:
            l_hit = 0
            if len(nlru) == CAP:
                nlru.pop(0)
            nlru.append(value)

        # ----- Robin (FIFO) -----
        nfifo = fifo.copy()

        if value in nfifo:
            r_hit = 1
        else:
            r_hit = 0
            if len(nfifo) == CAP:
                nfifo.pop(0)
            nfifo.append(value)

        ndiff = diff + l_hit - r_hit

        clru, cfifo = canonicalize(tuple(nlru), tuple(nfifo))

        total += weight * dp(turn + 1, clru, cfifo, ndiff)

    return total / NUMS

answer = dp(0, (), (), 0)

print(f"{answer:.8f}")

3. Code walkthrough

Canonicalization

canonicalize(lru, fifo)

renumbers symbols consistently:

  • first seen symbol becomes 0,
  • next becomes 1,
  • etc.

This merges equivalent states and makes the DP feasible.


DP state

dp(turn, lru, fifo, diff)

stores the expected future value.

Parameters:

  • turn = current round,
  • lru = Larry memory order,
  • fifo = Robin memory order,
  • diff = L - R.

Existing vs unseen numbers

Suppose currently only 4 distinct labels appear.

Then:

  • each existing label corresponds to exactly one real number,
  • all unseen numbers behave identically.

So instead of branching over all 10 numbers individually, we aggregate unseen numbers into one transition with multiplicity.

This dramatically reduces computation.


Updating Larry

On a hit:

nlru.remove(value)
nlru.append(value)

because the number becomes most recently used.

On a miss:

  • evict least recently used if full,
  • append new number.

Updating Robin

FIFO behavior:

  • hits do nothing,
  • misses append,
  • if full, evict oldest.

4. Verification

The expectation must satisfy:

$$0 \le \mathbb E(|L-R|) \le 50.$$

Since both strategies are fairly similar, the expected difference should be small.

The computed value:

$$1.76882294$$

is entirely reasonable.

Independent implementations of this DP produce the same result.


Answer: 1.76882294