TAOCP 2.3.2 Exercise 19

Let $R(X,Y)$ denote the statement X \succeq Y.

Section 2.3.2: Binary Tree Representation of Trees

Exercise 19. [M35] A free lattice is a mathematical system, which (for the purposes of this exercise) can be simply defined as the set of all formulas composed of variables and two abstract binary operators "" and "". A relation "X ⪰ Y" is defined between certain formulas X and Y in the free lattice by the following rules:

i) X ∨ Y ⪰ W ∧ Z if and only if X ∨ Y ⪰ W or X ∨ Y ⪰ Z or X ⪰ W ∧ Z or Y ⪰ W ∧ Z;

ii) X ∧ Y ⪰ Z if and only if X ⪰ Z and Y ⪰ Z;

iii) X ⪰ Y ∨ Z if and only if X ⪰ Y and X ⪰ Z;

iv) x ⪰ Y ∧ Z if and only if x ⪰ Y or x ⪰ Z, when x is a variable;

v) X ∨ Y ⪰ z if and only if X ⪰ z or Y ⪰ z, when z is a variable;

vi) x ⪰ y if and only if x = y, when x and y are variables.

For example, we find a ∧ (b ∨ c) ⪰ (a ∧ b) ∨ (a ∧ c) ⪰̸ a ∧ (b ∨ c).

Design an algorithm that tests whether or not X ⪰ Y, given two formulas X and Y in the free lattice.

Verified: yes
Solve time: 1m33s


Setup

Let $R(X,Y)$ denote the statement

$$ X \succeq Y. $$

The formulas are represented as trees of the type discussed in Section 2.3.2. Every internal node is labeled either $\lor$ or $\land$, and every leaf is a variable.

Given two formulas $X$ and $Y$, the problem is to determine whether $R(X,Y)$ is true.

Let $|F|$ denote the number of nodes of a formula $F$. Define

$$ \mu(X,Y)=|X|+|Y|. $$

The recursive clauses (i)-(vi) give a complete characterization of the relation $\succeq$. The task is therefore to convert those clauses into a terminating algorithm and prove its correctness.

Solution

For every pair of subformulas $(A,B)$ occurring in $X$ and $Y$, define a Boolean value

$$ T(A,B). $$

The value is determined by the following recursion.

If $A$ and $B$ are variables, then

$$ T(A,B)= \begin{cases} \text{true},&A=B,\ \text{false},&A\ne B, \end{cases} $$

by rule (vi).

If $A$ is a variable and

$$ B=B_1\land B_2, $$

then

$$ T(A,B)=T(A,B_1)\vee T(A,B_2), $$

by rule (iv).

If

$$ A=A_1\lor A_2 $$

and $B$ is a variable, then

$$ T(A,B)=T(A_1,B)\vee T(A_2,B), $$

by rule (v).

If

$$ A=A_1\land A_2, $$

then

$$ T(A,B)=T(A_1,B)\wedge T(A_2,B), $$

by rule (ii).

If

$$ B=B_1\lor B_2, $$

then

$$ T(A,B)=T(A,B_1)\wedge T(A,B_2), $$

by rule (iii).

The remaining possibility is

$$ A=A_1\lor A_2,\qquad B=B_1\land B_2. $$

Then

$$ T(A,B)= T(A,B_1)\vee T(A,B_2)\vee T(A_1,B)\vee T(A_2,B), $$

by rule (i).

These clauses form the algorithm.

To avoid repeated computation, store each value $T(A,B)$ when it is first determined. Whenever the same pair is encountered again, use the stored value.

Termination

Every recursive call replaces $(A,B)$ by a pair $(A',B')$ for which at least one of the formulas is a proper subformula of its predecessor. Hence

$$ \mu(A',B')<\mu(A,B). $$

Since $\mu$ is a positive integer, recursion must terminate.

Storage order

Let

$$ A_1,\ldots,A_m $$

be the distinct subformulas of $X$, ordered so that every proper subformula precedes the formula containing it. Let

$$ B_1,\ldots,B_n $$

be the corresponding ordering for the subformulas of $Y$.

Construct an $m\times n$ table whose entry $(i,j)$ is $T(A_i,B_j)$. Since every recurrence refers only to pairs involving proper subformulas, all required entries have already been computed when $(i,j)$ is evaluated.

The answer to the problem is the value stored in the entry corresponding to $(X,Y)$.

Verification

The proof of correctness is by induction on $\mu(A,B)$.

For the basis, suppose $\mu(A,B)=2$. Then $A$ and $B$ are variables. The algorithm returns true exactly when $A=B$, which is rule (vi).

Assume correctness for all pairs with measure less than $k$, and consider a pair $(A,B)$ with $\mu(A,B)=k$.

If $A$ and $B$ match one of the forms covered by rules (iv), (v), (ii), (iii), or (i), the algorithm applies exactly the corresponding defining equivalence. Every recursive call involves a strictly smaller measure, hence by the induction hypothesis each subvalue computed by the algorithm is correct. Therefore the Boolean combination used by the algorithm has the same truth value as the defining rule for $A\succeq B$.

Thus the algorithm returns true exactly when $A\succeq B$.

By induction, the statement holds for every pair of formulas. Therefore the entry computed for $(X,Y)$ is true if and only if

$$ X\succeq Y. $$

This completes the proof.

Notes

If $m$ and $n$ denote the numbers of distinct subformulas of $X$ and $Y$, respectively, the memoized algorithm evaluates at most $mn$ table entries. Each entry is computed once, using only a constant number of previously computed entries. Hence the total running time is $O(mn)$ and the storage requirement is $O(mn)$.