Project Euler Problem 674

We define the mathcal{I} operator as the function and mathcal{I}-expressions as arithmetic expressions built only from v

Project Euler Problem 674

Solution

Answer: 416678753

Let

$$\mathcal I(x,y)=(1+x+y)^2+y-x.$$

The crucial observation is that $\mathcal I$ is an injective encoding of ordered pairs of non-negative integers.

Indeed, write

$$s=1+x+y.$$

Then

$$\mathcal I(x,y)=s^2+(y-x).$$

Since $0\le y-x\le s-1$ in absolute value, every value lies in a unique interval around the square $s^2$. Therefore from $n=\mathcal I(x,y)$ we can uniquely recover:

  • $s=\lfloor \sqrt n\rfloor$,
  • $d=n-s^2=y-x$,
  • and hence

$$x=\frac{s-1-d}{2},\qquad y=\frac{s-1+d}{2}.$$

So:

$$\mathcal I(a,b)=\mathcal I(c,d)\iff a=c\text{ and }b=d.$$

That means $\mathcal I$ behaves exactly like a constructor symbol in first-order term unification.


1. Converting the problem into unification

Every expression is either:

  • a variable, or
  • $\mathcal I(A,B)$.

Therefore solving

$$e_1=e_2$$

reduces recursively to:

$$\mathcal I(A,B)=\mathcal I(C,D) \iff A=C \text{ and } B=D.$$

This is standard Robinson unification.

Examples:

  • $A=\mathcal I(x,\mathcal I(z,t))$
  • $B=\mathcal I(\mathcal I(y,z),y)$

Matching arguments gives:

$$x=\mathcal I(y,z), \qquad \mathcal I(z,t)=y.$$

Substituting:

$$x=\mathcal I(\mathcal I(z,t),z).$$

Setting free variables minimally ($z=t=0$) gives:

$$y=\mathcal I(0,0)=1,$$

$$x=\mathcal I(1,0)=3,$$

and thus

$$A=B=\mathcal I(3,1)=23.$$

Exactly as stated.


2. Why the minimum occurs when all free variables are $0$

For nonnegative integers,

$$\mathcal I(x+1,y)-\mathcal I(x,y)=2(1+x+y)>0,$$

$$\mathcal I(x,y+1)-\mathcal I(x,y)=2(2+x+y)>0.$$

So $\mathcal I$ is strictly increasing in each argument.

After unification, any variables not forced by constraints are free nonnegative integers. Since the expression is monotone in every variable, the least simultaneous value is obtained by setting every remaining free variable to $0$.

Thus the algorithm is:

  1. Parse expressions into trees.
  2. Attempt unification.
  3. If impossible → contribution $0$.
  4. Otherwise evaluate the unified expression with all remaining variables $0$.

Only the last $9$ digits are required, so computations are done modulo $10^9$.


3. Python implementation

from __future__ import annotations
import sys

MOD = 10**9

# -------------------------------------------------
# Term representation
# -------------------------------------------------

class Var:
    __slots__ = ("name",)

    def __init__(self, name: str):
        self.name = sys.intern(name)

class INode:
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b

# -------------------------------------------------
# Parsing
# -------------------------------------------------

def parse_expr(s: str):
    i = 0
    n = len(s)

    def parse():
        nonlocal i

        if s[i] == "I":
            i += 2  # skip "I("

            left = parse()

            i += 1  # skip ','

            right = parse()

            i += 1  # skip ')'

            return INode(left, right)

        start = i
        while i < n and s[i].isalpha():
            i += 1

        return Var(s[start:i])

    return parse()

# -------------------------------------------------
# Unification
# -------------------------------------------------

def unify(t1, t2):
    subs = {}
    occurs_cache = {}

    def deref(t):
        while isinstance(t, Var) and t.name in subs:
            nxt = subs[t.name]

            # path compression
            if isinstance(nxt, Var) and nxt.name in subs:
                subs[t.name] = subs[nxt.name]

            t = nxt

        return t

    def occurs(vname, t):
        t = deref(t)

        key = (vname, id(t))
        if key in occurs_cache:
            return occurs_cache[key]

        stack = [t]
        seen = set()

        while stack:
            cur = deref(stack.pop())

            if isinstance(cur, Var):
                if cur.name == vname:
                    occurs_cache[key] = True
                    return True
            else:
                nid = id(cur)

                if nid in seen:
                    continue

                seen.add(nid)

                stack.append(cur.a)
                stack.append(cur.b)

        occurs_cache[key] = False
        return False

    stack = [(t1, t2)]

    while stack:
        a, b = stack.pop()

        a = deref(a)
        b = deref(b)

        if a is b:
            continue

        if isinstance(a, Var) and isinstance(b, Var):
            if a.name == b.name:
                continue

        if isinstance(a, Var):
            if occurs(a.name, b):
                return None

            subs[a.name] = b
            continue

        if isinstance(b, Var):
            if occurs(b.name, a):
                return None

            subs[b.name] = a
            continue

        if isinstance(a, INode) and isinstance(b, INode):
            stack.append((a.a, b.a))
            stack.append((a.b, b.b))
            continue

        return None

    return subs

# -------------------------------------------------
# Evaluation
# -------------------------------------------------

def i_op(x, y):
    s = (1 + x + y) % MOD
    return (s * s + y - x) % MOD

def evaluate(root, subs):
    memo = {}

    def deref(t):
        while isinstance(t, Var) and t.name in subs:
            t = subs[t.name]
        return t

    def rec(t):
        t = deref(t)

        if isinstance(t, Var):
            return 0

        tid = id(t)

        if tid in memo:
            return memo[tid]

        x = rec(t.a)
        y = rec(t.b)

        val = i_op(x, y)

        memo[tid] = val
        return val

    return rec(root)

# -------------------------------------------------
# Solve
# -------------------------------------------------

def least_simultaneous_value(e1, e2):
    subs = unify(e1, e2)

    if subs is None:
        return 0

    return evaluate(e1, subs)

def main():
    with open("I-expressions.txt") as f:
        terms = [parse_expr(line.strip())
                 for line in f if line.strip()]

    total = 0

    n = len(terms)

    for i in range(n):
        for j in range(i + 1, n):
            total += least_simultaneous_value(
                terms[i], terms[j]
            )

    print(total % MOD)

if __name__ == "__main__":
    main()

4. Code walkthrough

Parsing

parse_expr converts strings like

$$I(x,I(z,t))$$

into a binary tree:

  • Var(name)
  • INode(left,right)

Unification

unify(t1,t2) implements Robinson unification:

  • variable vs variable,
  • variable vs compound term,
  • compound vs compound.

The occurs check prevents cyclic substitutions such as:

$$x = I(x,y).$$

If unification fails, there is no nonnegative solution.


Evaluation

After unification, all unconstrained variables are set to $0$, which gives the least possible value because $\mathcal I$ is monotone increasing.


5. Verification on the sample

For

$$A = I(x,I(z,t)),$$

$$B = I(I(y,z),y),$$

$$C = I(I(x,z),y),$$

the program obtains:

  • $LSV(A,B)=23$,
  • $LSV(A,C)=0$,
  • $LSV(B,C)=3$,

so the total is

$$23+0+3=26,$$

matching the statement.


6. Final computation

Running the program on the supplied file gives:

$$416678753$$

as the last nine digits.

Answer: 416678753