Project Euler Problem 871

Let f be a function from a finite set S to itself.

Project Euler Problem 871

Solution

Answer: 2848790

Let us rewrite the condition carefully.

We need the largest subset $A\subseteq S$ such that

$$|A\cup f(A)|=2|A|.$$

Since always $|f(A)|\le |A|$, equality can only happen if:

  1. $f$ is injective on $A$ (so $|f(A)|=|A|$),
  2. $A\cap f(A)=\varnothing$ (so the union size adds).

Thus a drifting subset is exactly a set $A$ where:

  • no two elements of $A$ map to the same image;
  • no element of $A$ maps to another element of $A$.

Equivalently, if we draw an undirected graph with one edge ${x,f(x)}$ for each $x$ (ignoring self-loops), then choosing $x\in A$ means choosing the edge $x\to f(x)$, and the conditions say:

chosen edges must be pairwise vertex-disjoint.

So $D(f)$ is the maximum matching size of this graph.

For

$$f_n(x)=x^3+x+1 \pmod n,$$

the graph has exactly one outgoing edge per vertex, so every connected component is a functional graph (a directed cycle with trees feeding into it). Maximum matching on such a graph can be solved in linear time per component via tree DP plus a cycle DP.

The dynamic programming states on a rooted tree are:

  • $A(v)$: best matching in subtree when $v$ is not matched to its parent,
  • $B(v)$: best matching in subtree when $v$ is matched to its parent.

Transitions:

$$B(v)=\sum A(c),$$

over children $c$,

and

$$A(v) = \sum A(c) + \max!\left(0,\max_c\bigl(1+B(c)-A(c)\bigr)\right),$$

because $v$ can match with at most one child.

Then for each cycle, we run a small DP around the cycle (case split: first cycle edge used or not used).

A correct Python implementation is:

from collections import deque

def D(n):
    # Build function graph
    f = [(x*x*x + x + 1) % n for x in range(n)]

    rev = [[] for _ in range(n)]
    indeg = [0] * n

    for x, y in enumerate(f):
        rev[y].append(x)
        indeg[y] += 1

    # Peel trees away to identify cycle vertices
    q = deque(i for i in range(n) if indeg[i] == 0)
    removed = [False] * n
    order = []

    while q:
        v = q.popleft()
        removed[v] = True
        order.append(v)

        y = f[v]
        indeg[y] -= 1
        if indeg[y] == 0:
            q.append(y)

    # Tree DP
    A = [0] * n
    B = [0] * n

    for v in order:
        base = 0
        gain = 0

        for ch in rev[v]:
            if removed[ch]:
                base += A[ch]
                gain = max(gain, 1 + B[ch] - A[ch])

        B[v] = base
        A[v] = base + gain

    # Path matching DP
    def solve_path(AA, BB):
        free = 0
        matched = -10**18

        m = len(AA)

        for i in range(m):
            new_free = -10**18
            new_matched = -10**18

            # v unmatched to next
            new_free = max(new_free, free + AA[i])

            # v matched to next
            if i < m - 1:
                new_matched = max(
                    new_matched,
                    free + 1 + BB[i]
                )

            # v already matched to previous
            new_free = max(new_free, matched + BB[i])

            free, matched = new_free, new_matched

        return max(free, matched)

    # Process cycle components
    vis = [False] * n
    ans = 0

    for s in range(n):
        if not removed[s] and not vis[s]:
            cyc = []
            v = s

            while not vis[v]:
                vis[v] = True
                cyc.append(v)
                v = f[v]

            AA = []
            BB = []

            for v in cyc:
                base = 0
                gain = 0

                for ch in rev[v]:
                    if removed[ch]:
                        base += A[ch]
                        gain = max(gain, 1 + B[ch] - A[ch])

                BB.append(base)
                AA.append(base + gain)

            k = len(cyc)

            # Case 1: first-last cycle edge NOT used
            best = solve_path(AA, BB)

            # Case 2: first-last cycle edge used
            if k >= 2:
                val = 1 + BB[0] + BB[-1]

                if k > 2:
                    val += solve_path(AA[1:-1], BB[1:-1])

                best = max(best, val)

            ans += best

    return ans

total = sum(D(100000 + i) for i in range(1, 101))
print(total)

Verification on the given examples:

  • $D(f_5)=1$ ✓
  • $D(f_{10})=3$ ✓

Running the program for $n=100001,\dots,100100$ gives:

Answer: 2848790