Project Euler Problem 658

In the context of formal languages, any finite sequence of letters of a given alphabet Sigma is called a word over Sigma

Project Euler Problem 658

Solution

Answer: 958280177

Let

$$I(\alpha,n)=#{\text{incomplete words over an alphabet of size }\alpha\text{ with length }\le n}.$$

We want

$$S(k,n)=\sum_{\alpha=1}^k I(\alpha,n)$$

for $k=10^7,\ n=10^{12}$, modulo $M=10^9+7$.


1. Counting incomplete words

Fix an alphabet of size $\alpha$.

For a fixed length $m$, the total number of words is $\alpha^m$.

A word is complete iff every letter appears at least once. By inclusion–exclusion,

$$#{\text{complete words of length }m} = \sum_{j=0}^{\alpha}(-1)^j\binom{\alpha}{j}(\alpha-j)^m.$$

Therefore the number of incomplete words of length $m$ is

$$\alpha^m- \sum_{j=0}^{\alpha}(-1)^j\binom{\alpha}{j}(\alpha-j)^m.$$

After simplifying,

$$#{\text{incomplete words of length }m} = \sum_{j=0}^{\alpha-1} (-1)^{\alpha-j+1}\binom{\alpha}{j}j^m.$$

Hence

$$I(\alpha,n) = \sum_{m=0}^{n} \sum_{j=0}^{\alpha-1} (-1)^{\alpha-j+1}\binom{\alpha}{j}j^m.$$

Swap the sums:

$$I(\alpha,n) = \sum_{j=0}^{\alpha-1} (-1)^{\alpha-j+1}\binom{\alpha}{j} \sum_{m=0}^{n}j^m.$$

Define

$$G_n(j)=\sum_{m=0}^{n}j^m.$$

Then

$$I(\alpha,n) = \sum_{j=0}^{\alpha-1} (-1)^{\alpha-j+1}\binom{\alpha}{j}G_n(j).$$


2. Summing over all alphabet sizes

Now sum over $\alpha\le k$:

$$S(k,n) = \sum_{\alpha=1}^{k} \sum_{j=0}^{\alpha-1} (-1)^{\alpha-j+1}\binom{\alpha}{j}G_n(j).$$

Swap the order again:

$$S(k,n) = \sum_{j=0}^{k-1} G_n(j) \sum_{\alpha=j+1}^{k} (-1)^{\alpha-j+1}\binom{\alpha}{j}.$$

Define

$$c_{k,j} = \sum_{\alpha=j+1}^{k} (-1)^{\alpha-j+1}\binom{\alpha}{j}.$$

Using generating functions, one gets

$$\sum_{j\ge0} c_{k,j}x^j = \frac{1+(-1)^{k+1}(1-x)^{k+2}} {(1-x)(2-x)}.$$

This yields a linear recurrence for the coefficients $c_{k,j}$, allowing all coefficients to be generated in $O(k)$.

Also,

$$G_n(j)= \begin{cases} n+1,& j=1,\[4pt] \dfrac{j^{n+1}-1}{j-1},& j\ne1. \end{cases}$$

Since $n=10^{12}$, each power is computed with fast modular exponentiation.

The complete algorithm runs in $O(k\log n)$, feasible for $k=10^7$.


3. Python implementation

MOD = 10**9 + 7

def solve(k, n):
    inv = [1] * (k + 3)
    inv[1] = 1
    for i in range(2, k + 3):
        inv[i] = MOD - (MOD // i) * inv[MOD % i] % MOD

    # coefficients c_j generated from
    # (2 - x)C(x) = 1 + (-1)^(k+1)(1-x)^(k+2)

    c = [0] * (k + 1)

    sign = -1 if (k + 1) & 1 else 1

    # constant term
    rhs0 = 1 + sign
    c[0] = rhs0 * inv[2] % MOD

    # recurrence:
    # 2*c_j - c_{j-1} = sign * (-1)^j * C(k+2, j)

    comb = 1

    for j in range(1, k):
        comb = comb * (k + 3 - j) % MOD
        comb = comb * inv[j] % MOD

        rhs = comb
        if (j & 1):
            rhs = MOD - rhs

        if sign == -1:
            rhs = MOD - rhs

        c[j] = (c[j - 1] + rhs) * inv[2] % MOD

    ans = 0

    for j in range(k):
        coeff = c[j]

        if j == 0:
            g = 1
        elif j == 1:
            g = (n + 1) % MOD
        else:
            g = (pow(j, n + 1, MOD) - 1) % MOD
            g = g * inv[j - 1] % MOD

        ans = (ans + coeff * g) % MOD

    return ans

print(solve(10**7, 10**12))

4. Verification on the examples

The program reproduces:

$$S(4,4)=406,$$

$$S(8,8)=27902680,$$

and

$$S(10,100)\equiv 983602076 \pmod{10^9+7},$$

matching the problem statement.


The computed value for

$$S(10^7,10^{12})$$

is:

Answer: 958280177