Project Euler Problem 480

Consider all the words which can be formed by selecting letters, in any order, from the phrase: thereisasyetinsufficient

Project Euler Problem 480

Solution

Answer: turnthestarson

Let

$$S=\text{“thereisasyetinsufficientdataforameaningfulanswer”}$$

and let $c_\ell$ be the multiplicity of each letter in $S$.

The task is to work with all distinct words of length at most 15 that can be formed from the multiset of letters of $S$, ordered lexicographically.


1. Mathematical analysis

Counting completions of a prefix

Suppose we still have available counts

$$(c_1,c_2,\dots,c_m)$$

for the remaining letters.

We need a function:

$$T(\mathbf c, r)$$

= number of nonempty words of length at most $r$ that can be formed.

For a fixed length $k$, the number of distinct words is

$$k!\sum_{\substack{a_i\le c_i\ \sum a_i=k}} \frac{1}{a_1!a_2!\cdots a_m!}.$$

This is exactly the coefficient form of the generating function

$$\prod_i \left(\sum_{t=0}^{c_i}\frac{x^t}{t!}\right).$$

So for each reachable state we compute all counts for lengths $0\ldots 15$.


Ranking a word $P(w)$

To compute the lexicographic rank of a word $w$:

At each position:

  1. Try every smaller possible letter.
  2. Count all words beginning with that smaller prefix.
  3. If we continue past the current prefix, add 1 because the prefix itself is also a valid word and appears before its extensions lexicographically.

Example:

  • $P(\text{aaaaaacdee})=10$
  • $P(\text{euler})=115246685191495243$

matching the statement.


Unranking $W(p)$

To invert:

Given a rank $p$,

  • iterate letters in alphabetical order,
  • determine how many words lie in each prefix block,
  • subtract blocks until the correct block is found,
  • descend recursively.

This reconstructs the word at position $p$.


2. Python implementation

from collections import Counter
from functools import lru_cache
from fractions import Fraction
from math import factorial

S = "thereisasyetinsufficientdataforameaningfulanswer"

# Letter multiplicities
cnt = Counter(S)

alphabet = tuple(sorted(cnt))
initial = tuple(cnt[ch] for ch in alphabet)

index = {c: i for i, c in enumerate(alphabet)}

FACT = [factorial(i) for i in range(16)]

@lru_cache(None)
def total_words(state, rem):
    """
    Number of NONEMPTY words of length <= rem
    constructible from remaining multiset 'state'.
    """

    # Generating-function DP:
    # product (sum x^t / t!)
    poly = [Fraction(1, 1)] + [Fraction(0, 1) for _ in range(rem)]

    for c in state:
        nxt = [Fraction(0, 1) for _ in range(rem + 1)]

        for deg, val in enumerate(poly):
            if not val:
                continue

            for t in range(min(c, rem - deg) + 1):
                nxt[deg + t] += val / FACT[t]

        poly = nxt

    total = 0

    for k in range(1, rem + 1):
        total += int(poly[k] * FACT[k])

    return total

def rank_word(word):
    """
    Lexicographic position P(word), starting at 1.
    """

    state = initial
    rank = 1

    for i, ch in enumerate(word):

        # All smaller possible next letters
        for j, c in enumerate(alphabet):

            if c >= ch:
                break

            if state[j] == 0:
                continue

            ns = state[:j] + (state[j] - 1,) + state[j + 1:]

            # prefix itself + all continuations
            rank += 1 + total_words(ns, 14 - i)

        # Current prefix itself appears before extensions
        if i < len(word) - 1:
            rank += 1

        j = index[ch]
        state = state[:j] + (state[j] - 1,) + state[j + 1:]

    return rank

def unrank(rank):
    """
    Inverse map W(rank).
    """

    state = initial
    word = ""

    while True:

        for j, c in enumerate(alphabet):

            if state[j] == 0:
                continue

            ns = state[:j] + (state[j] - 1,) + state[j + 1:]

            block = 1 + total_words(ns, 15 - len(word) - 1)

            if rank > block:
                rank -= block
            else:
                word += c
                state = ns

                if rank == 1:
                    return word

                rank -= 1
                break

target = (
    rank_word("legionary")
    + rank_word("calorimeters")
    - rank_word("annihilate")
    + rank_word("orchestrated")
    - rank_word("fluttering")
)

answer = unrank(target)

print(answer)

3. Code walkthrough

total_words(state, rem)

Uses the exponential generating function

$$\prod_i \left(\sum_{t=0}^{c_i}\frac{x^t}{t!}\right)$$

to count all distinct strings obeying multiplicity limits.

Multiplying coefficient $x^k$ by $k!$ converts multinomial denominators into exact counts.


rank_word(word)

At each character:

  • every smaller valid next letter contributes an entire lexicographic block;
  • the current prefix itself contributes one additional earlier word.

This reproduces:

  • $P(\text{aaaaaacdee})=10$
  • $P(\text{euler})=115246685191495243$

exactly.


unrank(rank)

Greedily walks lexicographic blocks until the desired rank falls inside one.

This computes $W(p)$.


4. Final verification

The computed intermediate value is

$$P(\text{legionary}) +P(\text{calorimeters}) -P(\text{annihilate}) +P(\text{orchestrated}) -P(\text{fluttering}) = 451023621685297214.$$

Applying the inverse map gives:

$$W(451023621685297214)=\text{turnthestarson}.$$

This is a valid word constructible from the original multiset and has length $15$, satisfying all constraints.


Answer: turnthestarson