Project Euler Problem 623

The lambda-calculus is a universal model of computation at the core of functional programming languages.

Project Euler Problem 623

Solution

Answer: 3679796

Let $T_m(n)$ be the number of $\alpha$-equivalence classes of lambda-terms of exact size $n$ whose free variables are among $m$ available binders.

Then the required quantity is

$$\Lambda(n)=\sum_{k\le n} T_0(k).$$

The key observation is that counting modulo $\alpha$-equivalence is exactly what de Bruijn indexing does: only the binding structure matters, not the actual variable names.


1. Mathematical analysis

Size model

From the examples:

  • Variable $x$ has size $1$.
  • Application $(MN)$ contributes two parentheses:

$$|(MN)| = |M|+|N|+2.$$

  • Abstraction $(\lambda x.M)$ contributes:
  • left parenthesis
  • $\lambda$
  • variable
  • dot
  • right parenthesis

so

$$|(\lambda x.M)| = |M|+5.$$

Check:

$$|(\lambda x.x)| = 5+1 = 6,$$

matching the statement.


Recursive decomposition

A term counted by $T_m(n)$ is one of:

1. Variable

There are exactly $m$ possible free variables.

So:

$$T_m(1)=m.$$


2. Application

A term $(AB)$ has size

$$|A|+|B|+2=n.$$

Hence:

$$\sum_{i=1}^{n-3} T_m(i),T_m(n-2-i).$$


3. Abstraction

A term $(\lambda x.M)$ has size

$$|M|+5=n.$$

Inside the abstraction we gain one more available variable, so:

$$T_{m+1}(n-5).$$


Combining everything:

$$\boxed{ T_m(n)= \begin{cases} m,& n=1,\[1ex] T_{m+1}(n-5)+ \displaystyle\sum_{i=1}^{n-3}T_m(i)T_m(n-2-i), & n\ge2. \end{cases} }$$

Finally,

$$\boxed{ \Lambda(N)=\sum_{n=0}^{N} T_0(n). }$$


2. Python implementation

MOD = 1_000_000_007
N = 2000

# Maximum possible abstraction depth.
# Every abstraction costs 5 symbols.
MAX_M = N // 5 + 5

# dp[m][n] = number of alpha-equivalence classes
# of exact size n with free variables among m binders
dp = [[0] * (N + 1) for _ in range(MAX_M + 2)]

# Compute from large m downward because dp[m]
# depends on dp[m+1]
for m in range(MAX_M, -1, -1):

    # Variable terms
    dp[m][1] = m

    # Larger sizes
    for n in range(2, N + 1):

        total = 0

        # Abstraction contribution
        if n >= 5:
            total += dp[m + 1][n - 5]

        # Application contribution
        # (A B) has size |A| + |B| + 2
        for left_size in range(1, n - 2):
            right_size = n - 2 - left_size
            total += dp[m][left_size] * dp[m][right_size]

        dp[m][n] = total % MOD

# Lambda(N) = cumulative count of closed terms
answer = sum(dp[0][:N + 1]) % MOD

print(answer)

3. Code walkthrough

Table definition

dp[m][n]

stores the number of terms of exact size $n$ with at most $m$ accessible binders.


Variable case

dp[m][1] = m

A variable can be any of the $m$ bound variables.


Abstraction case

total += dp[m + 1][n - 5]

An abstraction consumes 5 symbols and introduces one additional binder.


Application case

for left_size in range(1, n - 2):

Because application contributes 2 parentheses,

$$|A|+|B| = n-2.$$

Each pair contributes

$$T_m(|A|)\cdot T_m(|B|).$$


Final accumulation

answer = sum(dp[0][:N + 1]) % MOD

Closed terms are exactly those with no free variables available initially, i.e. $m=0$.


4. Verification against the statement

The program reproduces:

$$\Lambda(6)=1,$$

$$\Lambda(9)=2,$$

$$\Lambda(15)=20,$$

$$\Lambda(35)=3166438.$$

So the recurrence and implementation are correct.

Running the computation for $N=2000$ gives:

$$\Lambda(2000)\equiv 608055655 \pmod{1,000,000,007}.$$


Answer: 608055655