Project Euler Problem 603

Let S(n) be the sum of all contiguous integer-substrings that can be formed from the integer n.

Project Euler Problem 603

Solution

Answer: 879476477

Let the decimal digits of a number/string be $d_1,d_2,\dots,d_N$.

We must compute

$$S(n)=\sum_{\text{all contiguous substrings}} \text{(substring value)}.$$

For the problem:

  • $P(n)$ = concatenation of the first $n$ primes,
  • $C(n,k)$ = $P(n)$ repeated $k$ times,
  • required:

$$S(C(10^6,10^{12})) \pmod{10^9+7}.$$


1. Mathematical analysis

1.1 Contribution of one digit

Fix a digit $d_i$ at position $i$ (1-indexed from the left).

Suppose a substring starts at some position $\le i$ and ends at $j\ge i$.

Inside that substring, $d_i$ contributes

$$d_i \cdot 10^{j-i}.$$

How many choices of starting point are there?

Exactly $i$ choices.

So the total contribution of $d_i$ is

$$d_i \cdot i \sum_{j=i}^{N} 10^{j-i}.$$

Let $t=j-i$. Then

$$\sum_{t=0}^{N-i}10^t = \frac{10^{N-i+1}-1}{9}.$$

Therefore

$$S = \sum_{i=1}^{N} d_i, i ,\frac{10^{N-i+1}-1}{9}.$$

So

$$S = \frac1{9} \left( 10^{N+1}\sum_{i=1}^{N} d_i i 10^{-i} - \sum_{i=1}^{N} d_i i \right).$$

Everything is now reduced to weighted digit sums.


1.2 Repeated blocks

Let the digit string of $P(n)$ have length $L$, with digits

$$a_1,a_2,\dots,a_L.$$

We repeat this block $k$ times.

A digit in copy $r$ (where $0\le r<k$) and internal position $i$ has global position

$$rL+i.$$

Define

$$q=10^{-1}\pmod M, \qquad M=10^9+7.$$

Then

$$10^{- (rL+i)} = q^{rL+i}.$$

We need

$$T = \sum_{r=0}^{k-1}\sum_{i=1}^{L} a_i(rL+i)q^{rL+i}.$$

Factor:

$$T = \sum_{r=0}^{k-1} q^{rL} \left( rL \sum_i a_i q^i + \sum_i a_i i q^i \right).$$

Define

$$B=\sum_i a_i q^i, \qquad C=\sum_i a_i i q^i.$$

Then

$$T = BL\sum_{r=0}^{k-1} r x^r + C\sum_{r=0}^{k-1} x^r,$$

where

$$x=q^L.$$

We also need

$$U=\sum d_i i = kE + DL\frac{k(k-1)}2,$$

where

$$D=\sum_i a_i, \qquad E=\sum_i a_i i.$$

Thus

$$S = \frac{ 10^{N+1}T-U }{9}, \qquad N=Lk.$$


1.3 Geometric sums

We use

$$G_0=\sum_{r=0}^{k-1}x^r = \frac{1-x^k}{1-x},$$

and

$$G_1=\sum_{r=0}^{k-1} r x^r = \frac{x(1-kx^{k-1}+(k-1)x^k)}{(1-x)^2}.$$

So

$$T = BLG_1 + CG_0.$$

Everything can now be computed modulo $M$.


2. Python implementation

MOD = 10**9 + 7

INV9 = pow(9, MOD - 2, MOD)
INV10 = pow(10, MOD - 2, MOD)
INV2 = pow(2, MOD - 2, MOD)

N_PRIMES = 10**6
K = 10**12

# ------------------------------------------------------------
# Sieve for first 1,000,000 primes
# ------------------------------------------------------------

import math

# Standard upper bound for nth prime
LIMIT = int(
    N_PRIMES * (
        math.log(N_PRIMES) +
        math.log(math.log(N_PRIMES))
    )
) + 100

sieve = bytearray(b"\x01") * (LIMIT + 1)
sieve[0] = sieve[1] = 0

for p in range(2, int(math.isqrt(LIMIT)) + 1):
    if sieve[p]:
        start = p * p
        sieve[start:LIMIT + 1:p] = b"\x00" * (
            ((LIMIT - start) // p) + 1
        )

# ------------------------------------------------------------
# Build digit statistics for P(10^6)
# ------------------------------------------------------------

L = 0

B = 0
C = 0
D = 0
E = 0

qpow = 1
count = 0

for p in range(2, LIMIT + 1):
    if sieve[p]:
        count += 1

        for ch in str(p):
            d = ord(ch) - 48

            L += 1

            # q^L
            qpow = (qpow * INV10) % MOD

            B = (B + d * qpow) % MOD
            C = (C + d * L * qpow) % MOD

            D = (D + d) % MOD
            E = (E + d * L) % MOD

        if count == N_PRIMES:
            break

# ------------------------------------------------------------
# Geometric sums
# ------------------------------------------------------------

x = pow(INV10, L, MOD)

den = (1 - x) % MOD
den_inv = pow(den, MOD - 2, MOD)

G0 = (1 - pow(x, K, MOD)) * den_inv
G0 %= MOD

G1 = (
    x *
    (1 - K * pow(x, K - 1, MOD) + (K - 1) * pow(x, K, MOD))
)

G1 %= MOD

G1 = (G1 * den_inv * den_inv) % MOD

# ------------------------------------------------------------
# Assemble formula
# ------------------------------------------------------------

T = (B * L % MOD) * G1
T += C * G0
T %= MOD

U = (
    (K % MOD) * E
    + D * (L % MOD) % MOD
      * ((K - 1) % MOD) % MOD
      * (K % MOD) % MOD
      * INV2
)
U %= MOD

# exponent reduced modulo MOD-1 by Fermat
N_mod = (L % (MOD - 1)) * (K % (MOD - 1))
N_mod %= (MOD - 1)

answer = (
    (pow(10, N_mod + 1, MOD) * T - U)
    * INV9
) % MOD

print(answer)

3. Code walkthrough

Sieve section

We generate the first million primes using a standard bytearray sieve.

The millionth prime is $15{,}485{,}863$, so the upper bound near $1.65\times10^7$ is sufficient.


Building the digit statistics

As we concatenate primes digit-by-digit, we maintain:

  • $L$: total number of digits,
  • $B=\sum a_i q^i$,
  • $C=\sum a_i i q^i$,
  • $D=\sum a_i$,
  • $E=\sum a_i i$.

We never construct the gigantic number itself.


Geometric sums

Because $C(n,k)$ is a repeated block, all repeated-copy effects collapse into:

$$G_0=\sum x^r, \qquad G_1=\sum r x^r.$$

Both are computed in $O(\log k)$ time using modular exponentiation.


Final assembly

We compute

$$S=\frac{10^{N+1}T-U}{9}\pmod M.$$

Since $N=Lk$ is enormous, we reduce the exponent modulo $M-1$ using Fermat’s theorem.


4. Verification on small examples

For $2024$:

$$S(2024)=2304,$$

matching the statement.

For repeated block example:

$$C(7,3)=235711131723571113172357111317,$$

the same formulas apply and agree with brute force on small tests.

The final result is within $[0,M)$, as expected.


Answer: 879476477