Project Euler Problem 603
Let S(n) be the sum of all contiguous integer-substrings that can be formed from the integer n.
Solution
Answer: 879476477
Let the decimal digits of a number/string be $d_1,d_2,\dots,d_N$.
We must compute
$$S(n)=\sum_{\text{all contiguous substrings}} \text{(substring value)}.$$
For the problem:
- $P(n)$ = concatenation of the first $n$ primes,
- $C(n,k)$ = $P(n)$ repeated $k$ times,
- required:
$$S(C(10^6,10^{12})) \pmod{10^9+7}.$$
1. Mathematical analysis
1.1 Contribution of one digit
Fix a digit $d_i$ at position $i$ (1-indexed from the left).
Suppose a substring starts at some position $\le i$ and ends at $j\ge i$.
Inside that substring, $d_i$ contributes
$$d_i \cdot 10^{j-i}.$$
How many choices of starting point are there?
Exactly $i$ choices.
So the total contribution of $d_i$ is
$$d_i \cdot i \sum_{j=i}^{N} 10^{j-i}.$$
Let $t=j-i$. Then
$$\sum_{t=0}^{N-i}10^t = \frac{10^{N-i+1}-1}{9}.$$
Therefore
$$S = \sum_{i=1}^{N} d_i, i ,\frac{10^{N-i+1}-1}{9}.$$
So
$$S = \frac1{9} \left( 10^{N+1}\sum_{i=1}^{N} d_i i 10^{-i} - \sum_{i=1}^{N} d_i i \right).$$
Everything is now reduced to weighted digit sums.
1.2 Repeated blocks
Let the digit string of $P(n)$ have length $L$, with digits
$$a_1,a_2,\dots,a_L.$$
We repeat this block $k$ times.
A digit in copy $r$ (where $0\le r<k$) and internal position $i$ has global position
$$rL+i.$$
Define
$$q=10^{-1}\pmod M, \qquad M=10^9+7.$$
Then
$$10^{- (rL+i)} = q^{rL+i}.$$
We need
$$T = \sum_{r=0}^{k-1}\sum_{i=1}^{L} a_i(rL+i)q^{rL+i}.$$
Factor:
$$T = \sum_{r=0}^{k-1} q^{rL} \left( rL \sum_i a_i q^i + \sum_i a_i i q^i \right).$$
Define
$$B=\sum_i a_i q^i, \qquad C=\sum_i a_i i q^i.$$
Then
$$T = BL\sum_{r=0}^{k-1} r x^r + C\sum_{r=0}^{k-1} x^r,$$
where
$$x=q^L.$$
We also need
$$U=\sum d_i i = kE + DL\frac{k(k-1)}2,$$
where
$$D=\sum_i a_i, \qquad E=\sum_i a_i i.$$
Thus
$$S = \frac{ 10^{N+1}T-U }{9}, \qquad N=Lk.$$
1.3 Geometric sums
We use
$$G_0=\sum_{r=0}^{k-1}x^r = \frac{1-x^k}{1-x},$$
and
$$G_1=\sum_{r=0}^{k-1} r x^r = \frac{x(1-kx^{k-1}+(k-1)x^k)}{(1-x)^2}.$$
So
$$T = BLG_1 + CG_0.$$
Everything can now be computed modulo $M$.
2. Python implementation
MOD = 10**9 + 7
INV9 = pow(9, MOD - 2, MOD)
INV10 = pow(10, MOD - 2, MOD)
INV2 = pow(2, MOD - 2, MOD)
N_PRIMES = 10**6
K = 10**12
# ------------------------------------------------------------
# Sieve for first 1,000,000 primes
# ------------------------------------------------------------
import math
# Standard upper bound for nth prime
LIMIT = int(
N_PRIMES * (
math.log(N_PRIMES) +
math.log(math.log(N_PRIMES))
)
) + 100
sieve = bytearray(b"\x01") * (LIMIT + 1)
sieve[0] = sieve[1] = 0
for p in range(2, int(math.isqrt(LIMIT)) + 1):
if sieve[p]:
start = p * p
sieve[start:LIMIT + 1:p] = b"\x00" * (
((LIMIT - start) // p) + 1
)
# ------------------------------------------------------------
# Build digit statistics for P(10^6)
# ------------------------------------------------------------
L = 0
B = 0
C = 0
D = 0
E = 0
qpow = 1
count = 0
for p in range(2, LIMIT + 1):
if sieve[p]:
count += 1
for ch in str(p):
d = ord(ch) - 48
L += 1
# q^L
qpow = (qpow * INV10) % MOD
B = (B + d * qpow) % MOD
C = (C + d * L * qpow) % MOD
D = (D + d) % MOD
E = (E + d * L) % MOD
if count == N_PRIMES:
break
# ------------------------------------------------------------
# Geometric sums
# ------------------------------------------------------------
x = pow(INV10, L, MOD)
den = (1 - x) % MOD
den_inv = pow(den, MOD - 2, MOD)
G0 = (1 - pow(x, K, MOD)) * den_inv
G0 %= MOD
G1 = (
x *
(1 - K * pow(x, K - 1, MOD) + (K - 1) * pow(x, K, MOD))
)
G1 %= MOD
G1 = (G1 * den_inv * den_inv) % MOD
# ------------------------------------------------------------
# Assemble formula
# ------------------------------------------------------------
T = (B * L % MOD) * G1
T += C * G0
T %= MOD
U = (
(K % MOD) * E
+ D * (L % MOD) % MOD
* ((K - 1) % MOD) % MOD
* (K % MOD) % MOD
* INV2
)
U %= MOD
# exponent reduced modulo MOD-1 by Fermat
N_mod = (L % (MOD - 1)) * (K % (MOD - 1))
N_mod %= (MOD - 1)
answer = (
(pow(10, N_mod + 1, MOD) * T - U)
* INV9
) % MOD
print(answer)
3. Code walkthrough
Sieve section
We generate the first million primes using a standard bytearray sieve.
The millionth prime is $15{,}485{,}863$, so the upper bound near $1.65\times10^7$ is sufficient.
Building the digit statistics
As we concatenate primes digit-by-digit, we maintain:
- $L$: total number of digits,
- $B=\sum a_i q^i$,
- $C=\sum a_i i q^i$,
- $D=\sum a_i$,
- $E=\sum a_i i$.
We never construct the gigantic number itself.
Geometric sums
Because $C(n,k)$ is a repeated block, all repeated-copy effects collapse into:
$$G_0=\sum x^r, \qquad G_1=\sum r x^r.$$
Both are computed in $O(\log k)$ time using modular exponentiation.
Final assembly
We compute
$$S=\frac{10^{N+1}T-U}{9}\pmod M.$$
Since $N=Lk$ is enormous, we reduce the exponent modulo $M-1$ using Fermat’s theorem.
4. Verification on small examples
For $2024$:
$$S(2024)=2304,$$
matching the statement.
For repeated block example:
$$C(7,3)=235711131723571113172357111317,$$
the same formulas apply and agree with brute force on small tests.
The final result is within $[0,M)$, as expected.
Answer: 879476477