Project Euler Problem 361

The Thue-Morse sequence Tn is a binary sequence satisfying: - T0 = 0 - T{2n} = Tn - T{2n + 1} = 1 - Tn The first several

Project Euler Problem 361

Solution

Answer: 178476944

Let $L$ be the language of all finite binary words that occur as contiguous factors of the infinite Axel Thue–Marston Morse sequence.

The key structural fact is that every factor of the Thue–Morse word can be generated recursively from a shorter factor.

If $u=u_0u_1u_2\cdots$, define the morphic expansions

$$E(u)=u_0\bar u_0u_1\bar u_1u_2\bar u_2\cdots$$

and

$$O(u)=\bar u_0u_1\bar u_1u_2\bar u_2\cdots$$

(where $\bar 0=1,\ \bar1=0$).

Every factor beginning with $1$ is obtained uniquely by one of:

  • an $E$-expansion of a shorter factor beginning with $1$,
  • an $O$-expansion of a shorter factor beginning with $0$.

This gives a recurrence for the number $c(n)$ of valid length-$n$ factors beginning with $1$:

$$c(1)=1,\qquad c(2)=2,\qquad c(3)=3$$

and for $n\ge4$,

$$c(2m)=c(m)+c(m+1),$$

$$c(2m+1)=2c(m+1).$$

Define the prefix sums

$$S(n)=\sum_{k=1}^n c(k).$$

Then:

$$S(2m)=3S(m)+S(m+1)-4,$$

$$S(2m+1)=S(2m)+2c(m+1).$$

Since numbers are ordered first by bit-length and then lexicographically, we can:

  1. Find the bit-length of $A_n$ using binary search on $S(n)$.
  2. Recursively construct the corresponding factor.
  3. Evaluate the binary number modulo $10^9$ without materializing billions of bits.

The following Python program implements exactly this strategy.

from functools import lru_cache, cache

MOD = 10**9

# ------------------------------------------------------------
# Count valid words beginning with 1
# ------------------------------------------------------------

@lru_cache(None)
def count_start1(n):
    if n <= 3:
        return n

    h = n // 2

    if n % 2 == 0:
        return count_start1(h) + count_start1(h + 1)

    return 2 * count_start1(h + 1)

@lru_cache(None)
def prefix_count(n):
    if n == 0:
        return 0

    if n <= 3:
        return [0, 1, 3, 6][n]

    h = n // 2

    if n % 2 == 0:
        return (
            3 * prefix_count(h)
            + prefix_count(h + 1)
            - 4
        )

    return (
        prefix_count(2 * h)
        + 2 * count_start1(h + 1)
    )

def find_length(index):
    lo, hi = 1, 1

    while prefix_count(hi) < index:
        hi *= 2

    while lo < hi:
        mid = (lo + hi) // 2

        if prefix_count(mid) >= index:
            hi = mid
        else:
            lo = mid + 1

    return lo

# ------------------------------------------------------------
# Lazy recursive representation of factors
# ------------------------------------------------------------

class Node:
    __slots__ = ("kind", "child", "length", "value")

    def __init__(self, kind, child, length, value=None):
        self.kind = kind
        self.child = child
        self.length = length
        self.value = value

EMPTY = Node("lit", None, 0, "")

def lit(s):
    return Node("lit", None, len(s), s)

def comp(child):
    if child.length == 0:
        return child
    return Node("comp", child, child.length)

def even(child, length):
    return Node("even", child, length)

def odd(child, length):
    return Node("odd", child, length)

@cache
def first_bit(node):
    if node.kind == "lit":
        return int(node.value[0])

    if node.kind == "comp":
        return 1 - first_bit(node.child)

    if node.kind == "even":
        return first_bit(node.child)

    return 1 - first_bit(node.child)

@cache
def last_bit(node):
    if node.kind == "lit":
        return int(node.value[-1])

    if node.kind == "comp":
        return 1 - last_bit(node.child)

    if node.kind == "even":
        return (
            1 - last_bit(node.child)
            if node.length % 2 == 0
            else last_bit(node.child)
        )

    return (
        last_bit(node.child)
        if node.length % 2 == 0
        else 1 - last_bit(node.child)
    )

@cache
def drop_first(node):
    if node.length <= 1:
        return EMPTY

    if node.kind == "lit":
        return lit(node.value[1:])

    if node.kind == "comp":
        return comp(drop_first(node.child))

    if node.kind == "even":
        return odd(node.child, node.length - 1)

    return even(drop_first(node.child), node.length - 1)

@cache
def drop_last(node):
    if node.length <= 1:
        return EMPTY

    if node.kind == "lit":
        return lit(node.value[:-1])

    if node.kind == "comp":
        return comp(drop_last(node.child))

    if node.kind == "even":
        if node.length % 2 == 0:
            return even(node.child, node.length - 1)

        return even(drop_last(node.child), node.length - 1)

    if node.length % 2 == 0:
        return odd(drop_last(node.child), node.length - 1)

    return odd(node.child, node.length - 1)

# ------------------------------------------------------------
# Modular evaluation
# ------------------------------------------------------------

def max_depth(length):
    d = 0

    while length > 3:
        length = length // 2 + 1
        d += 1

    return d

KMAX = (
    max(
        max_depth(find_length(10**k))
        for k in range(1, 19)
    )
    + 2
)

XP = [
    pow(2, 2**k, MOD)
    for k in range(KMAX + 3)
]

@cache
def geom(base, cnt):
    if cnt == 0:
        return 0

    if cnt == 1:
        return 1

    if cnt % 2 == 0:
        h = geom(base, cnt // 2)

        return (
            h
            + pow(base, cnt // 2, MOD) * h
        ) % MOD

    return (
        geom(base, cnt - 1)
        + pow(base, cnt - 1, MOD)
    ) % MOD

@cache
def eval_node(node):
    if node.length == 0:
        return tuple([0] * (KMAX + 3))

    arr = []

    if node.kind == "lit":

        for k in range(KMAX + 3):
            v = 0

            for ch in node.value:
                v = (
                    v * XP[k]
                    + (ch == "1")
                ) % MOD

            arr.append(v)

    elif node.kind == "comp":

        child = eval_node(node.child)

        for k in range(KMAX + 3):
            arr.append(
                (geom(XP[k], node.length) - child[k])
                % MOD
            )

    elif node.kind == "even":

        if node.length % 2 == 0:

            child = eval_node(node.child)

            for k in range(KMAX + 3):

                if k + 1 >= KMAX + 3:
                    arr.append(0)
                else:
                    arr.append(
                        (
                            (XP[k] - 1) * child[k + 1]
                            + geom(
                                (XP[k] * XP[k]) % MOD,
                                node.length // 2,
                            )
                        )
                        % MOD
                    )

        else:

            pref = mu_eval(drop_last(node.child))
            last = last_bit(node.child)

            for k in range(KMAX + 3):
                arr.append(
                    (pref[k] * XP[k] + last)
                    % MOD
                )

    else:

        h = node.length // 2
        first = first_bit(node.child)
        last = last_bit(node.child)

        if node.length % 2 == 1:

            tail = mu_eval(drop_first(node.child))

            for k in range(KMAX + 3):
                arr.append(
                    (
                        (1 - first)
                        * pow(XP[k], 2 * h, MOD)
                        + tail[k]
                    )
                    % MOD
                )

        else:

            mid = mu_eval(
                drop_last(drop_first(node.child))
            )

            for k in range(KMAX + 3):
                arr.append(
                    (
                        (1 - first)
                        * pow(XP[k], 2 * h - 1, MOD)
                        + mid[k] * XP[k]
                        + last
                    )
                    % MOD
                )

    return tuple(arr)

@cache
def mu_eval(node):
    base = eval_node(node)

    arr = []

    for k in range(KMAX + 3):

        if k + 1 >= KMAX + 3:
            arr.append(0)
        else:
            arr.append(
                (
                    (XP[k] - 1) * base[k + 1]
                    + geom(
                        (XP[k] * XP[k]) % MOD,
                        node.length,
                    )
                )
                % MOD
            )

    return tuple(arr)

PRE = {
    1: ["1"],
    2: ["10", "11"],
    3: ["100", "101", "110"],
}

def kth_word(length, rank, start_bit):

    if start_bit == 0:
        total = count_start1(length)

        return comp(
            kth_word(
                length,
                total - 1 - rank,
                1,
            )
        )

    if length <= 3:
        return lit(PRE[length][rank])

    h_even = (length + 1) // 2
    cnt_even = count_start1(h_even)

    if rank < cnt_even:
        return even(
            kth_word(h_even, rank, 1),
            length,
        )

    return odd(
        kth_word(
            length // 2 + 1,
            rank - cnt_even,
            0,
        ),
        length,
    )

def A_mod(index):

    length = find_length(index)

    rank = (
        index
        - 1
        - prefix_count(length - 1)
    )

    node = kth_word(length, rank, 1)

    return eval_node(node)[0]

# ------------------------------------------------------------
# Verification
# ------------------------------------------------------------

assert A_mod(100) == 3251
assert A_mod(1000) == 80852364498 % MOD

# ------------------------------------------------------------
# Final computation
# ------------------------------------------------------------

ans = 0

for k in range(1, 19):
    ans = (ans + A_mod(10**k)) % MOD

print(ans)

Code walkthrough

  • count_start1(n) computes how many valid factors of length n begin with 1.
  • prefix_count(n) accumulates these counts, allowing binary search for the correct bit-length.
  • kth_word() recursively reconstructs the required factor using the Thue–Morse morphic structure.
  • The Node system stores gigantic binary words lazily instead of explicitly.
  • eval_node() evaluates the represented binary number modulo $10^9$ directly from the recursive structure.
  • The final loop computes

$$\sum_{k=1}^{18} A_{10^k} \pmod{10^9}.$$

The verification checks:

$$A_{100}=3251, \qquad A_{1000}=80852364498,$$

matching the statement.

The resulting value is already reduced modulo $10^9$, exactly as requested.

Answer: 178476944