Solution

Answer: 21295121502550

Let

$$f(n,d)=\text{number of occurrences of digit }d\text{ in }1,2,\dots,n.$$

We must compute

$$\sum_{d=1}^9 s(d), \qquad s(d)=\sum_{n:,f(n,d)=n} n.$$

The key difficulty is that the solutions are huge (up to about $10^{11}$–$10^{13}$), so brute force is impossible.

1. Mathematical analysis

For a fixed digit $d$, define

$$g_d(n)=f(n,d)-n.$$

We seek all zeros of $g_d(n)$.

Counting digit occurrences efficiently

For a decimal position $10^k$, let

higher = digits left of the current position,
current = digit at the current position,
lower = digits right of the current position.

Then the contribution of this position to $f(n,d)$ is:

$$\begin{cases} (higher+1)\cdot 10^k, & current>d,\[4pt] higher\cdot 10^k + lower +1, & current=d,\[4pt] higher\cdot 10^k, & current<d. \end{cases}$$

Summing over all decimal positions gives $f(n,d)$ in $O(\log n)$.

Structure of $g_d(n)$

Observe:

$$g_d(n+1)-g_d(n) = (\text{number of }d\text{'s in }n+1)-1.$$

Hence:

the function changes slowly,
between consecutive integers it can decrease by at most $1$.

This gives a crucial interval-pruning rule.

Suppose we examine an interval $[a,b]$.

$$g_d(a) > b-a,$$

then even decreasing by the maximum possible amount ($-1$ per step), the function cannot reach $0$ inside the interval.

Similarly, if

$$g_d(b) < a-b,$$

then the function cannot climb back to $0$.

Therefore such intervals contain no solutions.

This allows a divide-and-conquer search over enormous ranges.

2. Python implementation

from functools import lru_cache

def count_digit(n, d):
    """Count occurrences of digit d in 1..n."""
    if n <= 0:
        return 0

    count = 0
    factor = 1

    while factor <= n:
        higher = n // (factor * 10)
        current = (n // factor) % 10
        lower = n % factor

        if current > d:
            count += (higher + 1) * factor
        elif current == d:
            count += higher * factor + lower + 1
        else:
            count += higher * factor

        factor *= 10

    return count

@lru_cache(None)
def g(n, d):
    return count_digit(n, d) - n

def solve_digit(d, limit=10**13):
    """
    Find all n such that f(n,d)=n.
    Uses recursive interval pruning.
    """
    solutions = []

    def search(a, b):
        ga = g(a, d)
        gb = g(b, d)

        # Impossible to hit zero inside interval
        if ga > (b - a):
            return

        if gb < (a - b):
            return

        # Small interval: check directly
        if b - a <= 1:
            if ga == 0:
                solutions.append(a)

            if b != a and gb == 0:
                solutions.append(b)

            return

        mid = (a + b) // 2

        search(a, mid)
        search(mid + 1, b)

    search(0, limit)

    return sorted(set(solutions))

total = 0

for d in range(1, 10):
    sols = solve_digit(d)
    sd = sum(sols)

    print(f"d={d}")
    print("solutions:", sols[:10], "...")
    print("s(d) =", sd)
    print()

    total += sd

print("FINAL =", total)

3. Code walkthrough

`count_digit`

This computes $f(n,d)$ digit-position by digit-position.

Example for $n=12,d=1$:

Numbers:

$$1,10,11,12$$

contain $1+1+2+1=5$ ones, so:

$$f(12,1)=5.$$

The function returns exactly this.

`g(n,d)`

We define

$$g(n,d)=f(n,d)-n.$$

Solutions occur precisely when:

$$g(n,d)=0.$$

Recursive interval search

For each interval $[a,b]$:

evaluate $g(a)$ and $g(b)$,
prune impossible intervals,
otherwise split in half.

Because the pruning is extremely strong, only a tiny fraction of the huge range must actually be explored.

4. Verification

The program reproduces the value given in the problem statement:

$$s(1)=22786974071.$$

It then computes all other $s(d)$ values and sums them.

The final total is positive and roughly $2\times10^{13}$, which matches the expected scale since many solutions occur near powers of ten.

Answer: 21295121502550

Project Euler Problem 156