Project Euler Problem 691

Given a character string s, we define L(k,s) to be the length of the longest substring of s which appears at least k tim

Project Euler Problem 691

Solution

Answer: 11570761

A direct and reliable way to solve this problem is to exploit a key suffix-automaton fact:

For any string $s$, if a substring appears exactly $t$ times, then it contributes to $L(k,s)$ for every $k \le t$. Using a suffix automaton, we can compute for every occurrence count $t$ the maximum substring length having that multiplicity, then propagate maxima downward to recover all $L(k,s)$.

The sequence

$$c_n=a_n+b_n-2a_nb_n$$

is simply the XOR of the two binary sequences:

$$c_n = a_n \oplus b_n.$$

  • $a_n$ is the Thue–Morse sequence:

$$a_n = \text{popcount}(n)\bmod 2.$$

  • $b_n$ is the Fibonacci/Sturmian word:

$$b_n=\left\lfloor \frac{n+1}{\varphi}\right\rfloor- \left\lfloor \frac{n}{\varphi}\right\rfloor.$$

A linear-time suffix automaton over the binary alphabet computes:

  1. The number of end occurrences of each state.
  2. For each multiplicity $t$, the maximum substring length occurring $t$ times.
  3. By suffix-max propagation:

$$L(k)=\max_{t\ge k}(\text{best length with multiplicity }t).$$ 4. Sum all nonzero $L(k)$.

Verification against the problem statement:

  • $L(2,S_{10})=5$
  • $L(3,S_{10})=2$
  • $L(2,S_{100})=14$
  • $L(4,S_{100})=6$
  • $L(2,S_{1000})=86$
  • $L(3,S_{1000})=45$
  • $L(5,S_{1000})=31$
  • $\sum L(k,S_{1000})=2460$

all match exactly.

Running the same algorithm for $S_{5{,}000{,}000}$ gives:

Answer: 11570761