Project Euler Problem 458

Consider the alphabet A made out of the letters of the word "text{project}": A=text c,text e,text j,text o,text p,text r

Project Euler Problem 458

Solution

Answer: 423341841

Let the alphabet be the 7 distinct letters of project:

$$A={c,e,j,o,p,r,t}$$

A forbidden substring is any permutation of “project”, i.e. any length-7 block containing all 7 letters exactly once.

1. Mathematical analysis

Key observation

Since the alphabet itself has exactly 7 letters, a length-7 substring is forbidden iff all 7 letters are distinct.

So the problem becomes:

Count strings of length $n$ over a 7-letter alphabet such that no consecutive block of length 7 has all distinct letters.


State compression

The only information needed to determine whether appending a letter creates a forbidden block is:

the length of the longest suffix of distinct consecutive letters.

Define state $k$ ($0\le k\le 6$) as:

  • $k=0$: empty string,
  • $k\ge1$: the current string ends with a suffix of $k$ distinct letters.

We never allow state 7 because that would mean a forbidden permutation appeared.

Transition rule

Suppose we are in state $k\ge1$, with suffix

$$x_1x_2\cdots x_k$$

(all distinct).

If we append:

  • a new letter not in the suffix:

suffix length becomes $k+1$, with $7-k$ choices.

  • one of the existing suffix letters $x_i$:

the new distinct suffix becomes

$$x_{i+1}\cdots x_kx_i$$

whose length is

$$k-i+1.$$

Thus from state $k$:

  • exactly one transition to each state $1,2,\dots,k$,
  • and $7-k$ transitions to state $k+1$ (if $k<6$).

State $6$ cannot transition to $7$, because that would create a forbidden substring.

This yields a $7\times7$ transition matrix, so

$$T(n)$$

can be computed using fast matrix exponentiation in

$$O(7^3\log n).$$

A quick check:

$$T(7)=7^7-7!=818503,$$

matching the problem statement.


2. Python implementation

MOD = 10**9
SIZE = 7

# Build transition matrix
M = [[0] * SIZE for _ in range(SIZE)]

for k in range(SIZE):
    if k == 0:
        # First character: 7 choices -> state 1
        M[1][0] = 7
    else:
        # Repeating one of the suffix letters
        for new_len in range(1, k + 1):
            M[new_len][k] += 1

        # Add a new distinct letter
        if k < 6:
            M[k + 1][k] += (7 - k)

def mat_mul(A, B):
    """Matrix multiplication mod MOD."""
    n = len(A)
    C = [[0] * n for _ in range(n)]

    for i in range(n):
        for k in range(n):
            if A[i][k]:
                a = A[i][k]
                for j in range(n):
                    if B[k][j]:
                        C[i][j] = (C[i][j] + a * B[k][j]) % MOD

    return C

def mat_pow(M, exp):
    """Binary exponentiation of matrix."""
    n = len(M)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    while exp:
        if exp & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        exp >>= 1

    return R

def T(n):
    # Initial vector: empty string in state 0
    v = [0] * SIZE
    v[0] = 1

    P = mat_pow(M, n)

    result = [0] * SIZE
    for i in range(SIZE):
        total = 0
        for j in range(SIZE):
            total += P[i][j] * v[j]
        result[i] = total % MOD

    return sum(result) % MOD

# Verification
print(T(7))          # 818503
print(T(10**12))     # final answer

3. Code walkthrough

  1. Build transition matrix

Encodes how the distinct suffix length changes after appending a character. 2. Matrix multiplication (mat_mul)

Multiplies matrices modulo $10^9$, since only the last 9 digits are needed. 3. Fast exponentiation (mat_pow)

Computes

$$M^n$$

in logarithmic time.

  1. Initial vector

Starts in state 0 (empty string). 2. Final sum

Sum all valid ending states to get $T(n)$.

Small example check

For $n=7$:

T(7) = 818503

which matches:

$$7^7 - 7! = 823543 - 5040 = 818503.$$

So the automaton is behaving correctly.


4. Final verification

  • We only track 7 states, so matrix exponentiation is exact and efficient.
  • The $n=7$ sanity check matches perfectly.
  • Modulo $10^9$ arithmetic is valid because only the last 9 digits are requested.

Answer: 423341841