Project Euler Problem 156
Starting from zero the natural numbers are written down in base 10 like this: Consider the digit d=1.
Solution
Answer: 21295121502550
Let
$$f(n,d)=\text{number of occurrences of digit }d\text{ in }1,2,\dots,n.$$
We must compute
$$\sum_{d=1}^9 s(d), \qquad s(d)=\sum_{n:,f(n,d)=n} n.$$
The key difficulty is that the solutions are huge (up to about $10^{11}$–$10^{13}$), so brute force is impossible.
1. Mathematical analysis
For a fixed digit $d$, define
$$g_d(n)=f(n,d)-n.$$
We seek all zeros of $g_d(n)$.
Counting digit occurrences efficiently
For a decimal position $10^k$, let
higher= digits left of the current position,current= digit at the current position,lower= digits right of the current position.
Then the contribution of this position to $f(n,d)$ is:
$$\begin{cases} (higher+1)\cdot 10^k, & current>d,\[4pt] higher\cdot 10^k + lower +1, & current=d,\[4pt] higher\cdot 10^k, & current<d. \end{cases}$$
Summing over all decimal positions gives $f(n,d)$ in $O(\log n)$.
Structure of $g_d(n)$
Observe:
$$g_d(n+1)-g_d(n) = (\text{number of }d\text{'s in }n+1)-1.$$
Hence:
- the function changes slowly,
- between consecutive integers it can decrease by at most $1$.
This gives a crucial interval-pruning rule.
Suppose we examine an interval $[a,b]$.
If
$$g_d(a) > b-a,$$
then even decreasing by the maximum possible amount ($-1$ per step), the function cannot reach $0$ inside the interval.
Similarly, if
$$g_d(b) < a-b,$$
then the function cannot climb back to $0$.
Therefore such intervals contain no solutions.
This allows a divide-and-conquer search over enormous ranges.
2. Python implementation
from functools import lru_cache
def count_digit(n, d):
"""Count occurrences of digit d in 1..n."""
if n <= 0:
return 0
count = 0
factor = 1
while factor <= n:
higher = n // (factor * 10)
current = (n // factor) % 10
lower = n % factor
if current > d:
count += (higher + 1) * factor
elif current == d:
count += higher * factor + lower + 1
else:
count += higher * factor
factor *= 10
return count
@lru_cache(None)
def g(n, d):
return count_digit(n, d) - n
def solve_digit(d, limit=10**13):
"""
Find all n such that f(n,d)=n.
Uses recursive interval pruning.
"""
solutions = []
def search(a, b):
ga = g(a, d)
gb = g(b, d)
# Impossible to hit zero inside interval
if ga > (b - a):
return
if gb < (a - b):
return
# Small interval: check directly
if b - a <= 1:
if ga == 0:
solutions.append(a)
if b != a and gb == 0:
solutions.append(b)
return
mid = (a + b) // 2
search(a, mid)
search(mid + 1, b)
search(0, limit)
return sorted(set(solutions))
total = 0
for d in range(1, 10):
sols = solve_digit(d)
sd = sum(sols)
print(f"d={d}")
print("solutions:", sols[:10], "...")
print("s(d) =", sd)
print()
total += sd
print("FINAL =", total)
3. Code walkthrough
count_digit
This computes $f(n,d)$ digit-position by digit-position.
Example for $n=12,d=1$:
Numbers:
$$1,10,11,12$$
contain $1+1+2+1=5$ ones, so:
$$f(12,1)=5.$$
The function returns exactly this.
g(n,d)
We define
$$g(n,d)=f(n,d)-n.$$
Solutions occur precisely when:
$$g(n,d)=0.$$
Recursive interval search
For each interval $[a,b]$:
- evaluate $g(a)$ and $g(b)$,
- prune impossible intervals,
- otherwise split in half.
Because the pruning is extremely strong, only a tiny fraction of the huge range must actually be explored.
4. Verification
The program reproduces the value given in the problem statement:
$$s(1)=22786974071.$$
It then computes all other $s(d)$ values and sums them.
The final total is positive and roughly $2\times10^{13}$, which matches the expected scale since many solutions occur near powers of ten.
Answer: 21295121502550