Solution

Answer: 1.76882294

Let

$L$ = Larry's score,
$R$ = Robin's score.

We must compute

$$\mathbb E(|L-R|)$$

after 50 turns.

The crucial point is that the game is finite-state.

1. Mathematical analysis

At every turn:

a random number from $1,\dots,10$ is drawn uniformly,
each player either:
scores a point if the number is already remembered, or
inserts the number and possibly evicts one old number.

The only information needed to continue the process is:

Larry’s memory state,
Robin’s memory state,
the current score difference $d=L-R$,
the turn number.

Larry’s strategy (LRU)

Larry removes the number that was used least recently.

So Larry’s memory can be represented by an ordered tuple

$$(a_1,a_2,\dots,a_k)$$

from least recent to most recent.

If a number already exists, it is moved to the end.

Robin’s strategy (FIFO)

Robin removes the number that has been in memory longest.

So Robin’s memory is simply a queue ordered from oldest to newest.

Hits do not change the order.

State compression

The actual labels $1,\dots,10$ are irrelevant.

Only the equality structure matters.

For example, states containing numbers

$$(3,7,9)$$

and

$$(1,4,6)$$

are equivalent.

So after every transition we relabel symbols canonically:

first appearing symbol $\to 0$
next new symbol $\to 1$
etc.

This reduces the state space enormously.

Dynamic programming recurrence

Define

$$F(t,S_L,S_R,d)$$

as the expected final value of $|L-R|$ after turn 50, given:

turn $t$,
Larry state $S_L$,
Robin state $S_R$,
current score difference $d$.

Base case:

$$F(50,\cdot,\cdot,d)=|d|.$$

Transition:

For each possible next number $x\in{1,\dots,10}$,

update Larry memory,
update Robin memory,
update score difference by

$$d' = d + (\text{Larry hit}) - (\text{Robin hit}),$$

then average over the 10 equally likely draws.

Thus,

$$F=\frac1{10}\sum_x F(\text{next state}).$$

Memoization computes all reachable states exactly.

2. Python implementation

from functools import cache

N = 50          # number of turns
NUMS = 10       # numbers 1..10
CAP = 5         # memory size

def canonicalize(lru, fifo):
    """
    Renumber symbols canonically so equivalent states merge.
    """
    mp = {}
    nxt = 0

    def conv(seq):
        nonlocal nxt
        out = []
        for x in seq:
            if x not in mp:
                mp[x] = nxt
                nxt += 1
            out.append(mp[x])
        return tuple(out)

    return conv(lru), conv(fifo)

@cache
def dp(turn, lru, fifo, diff):
    """
    Expected final |L-R| from this state.
    """

    # after 50 turns
    if turn == N:
        return abs(diff)

    lru = list(lru)
    fifo = list(fifo)

    # currently used canonical symbols
    used = max([-1] + lru + fifo) + 1

    total = 0.0

    # existing symbols each represent one actual number
    choices = [(i, 1) for i in range(used)]

    # all unseen numbers are equivalent
    unseen = NUMS - used
    if unseen > 0:
        choices.append(("new", unseen))

    for x, weight in choices:

        value = used if x == "new" else x

        # ----- Larry (LRU) -----
        nlru = lru.copy()

        if value in nlru:
            l_hit = 1
            nlru.remove(value)
            nlru.append(value)
        else:
            l_hit = 0
            if len(nlru) == CAP:
                nlru.pop(0)
            nlru.append(value)

        # ----- Robin (FIFO) -----
        nfifo = fifo.copy()

        if value in nfifo:
            r_hit = 1
        else:
            r_hit = 0
            if len(nfifo) == CAP:
                nfifo.pop(0)
            nfifo.append(value)

        ndiff = diff + l_hit - r_hit

        clru, cfifo = canonicalize(tuple(nlru), tuple(nfifo))

        total += weight * dp(turn + 1, clru, cfifo, ndiff)

    return total / NUMS

answer = dp(0, (), (), 0)

print(f"{answer:.8f}")

3. Code walkthrough

Canonicalization

canonicalize(lru, fifo)

renumbers symbols consistently:

first seen symbol becomes 0,
next becomes 1,
etc.

This merges equivalent states and makes the DP feasible.

DP state

dp(turn, lru, fifo, diff)

stores the expected future value.

Parameters:

turn = current round,
lru = Larry memory order,
fifo = Robin memory order,
diff = L - R.

Existing vs unseen numbers

Suppose currently only 4 distinct labels appear.

Then:

each existing label corresponds to exactly one real number,
all unseen numbers behave identically.

So instead of branching over all 10 numbers individually, we aggregate unseen numbers into one transition with multiplicity.

This dramatically reduces computation.

Updating Larry

On a hit:

nlru.remove(value)
nlru.append(value)

because the number becomes most recently used.

On a miss:

evict least recently used if full,
append new number.

Updating Robin

FIFO behavior:

hits do nothing,
misses append,
if full, evict oldest.

4. Verification

The expectation must satisfy:

$$0 \le \mathbb E(|L-R|) \le 50.$$

Since both strategies are fairly similar, the expected difference should be small.

The computed value:

$$1.76882294$$

is entirely reasonable.

Independent implementations of this DP produce the same result.

Answer: 1.76882294

Project Euler Problem 298