TAOCP 2.3.3 Exercise 11

Section 2.3.3: Other Representations of Trees

Exercise 11. [24] (Equivalence declarations.) Several compiler languages, notably FORTRAN, provide a facility for overlapping the memory locations assigned to sequentially stored tables. The programmer gives the compiler a set of relations of the form X[j] ≡ Y[k], which means that variable X[j+s] is to be assigned to the same location as variable Y[k+s] for all s. Each variable is also given a range of allowable subscripts: "ARRAY X[l:u]" means that space is to be set aside in memory for the table entries X[l], X[l+1], ..., X[u]. For each equivalence class of variables, the compiler reserves as small a block of consecutive memory locations as possible, to contain all the table entries for the allowable subscript values of these variables.

For example, suppose we have ARRAY X[0:10], ARRAY Y[3:10], ARRAY A[1:1], and ARRAY Z[-2:0], plus the equivalences X[7] ≡ Y[3], Z[0] ≡ A[0], and Y[1] ≡ A[8]. We must set aside 20 consecutive locations

Memory layout for the equivalence-declaration example.

for these variables. (The location following A[1] is not an allowable subscript value for any of the arrays, but it must be reserved anyway.)

The object of this exercise is to modify Algorithm E so that it applies to the more general situation just described. Assume that we are writing a compiler for such a language, and the tables inside our compiler program itself have one node for each array, containing the fields NAME, PARENT, DELTA, LBD, and UBD. Assume that the compiler program has previously processed all the ARRAY declarations, so that if

ARRAY X[l:u] has appeared and if P points to the node for X, then

$$ \operatorname{NAME}(P) = \text{"X"},\quad \operatorname{PARENT}(P) = \Lambda,\quad \operatorname{DELTA}(P) = 0, $$

$$ \operatorname{LBD}(P) = l,\quad \operatorname{UBD}(P) = u. $$

The problem is to design an algorithm that processes the equivalence declarations, so that, after this algorithm has been performed,

$$ \operatorname{PARENT}(P) = \Lambda $$

means that locations X[LBD(P)], ..., X[UBD(P)] are to be reserved in memory for this equivalence class;

$$ \operatorname{PARENT}(P) = Q \ne \Lambda $$

means that location X[k] equals location Y[k + DELTA(P)], where NAME(Q) = "Y".

For example, before the equivalences listed above we might have the nodes

`P`	`NAME(P)`	`PARENT(P)`	`DELTA(P)`	`LBD(P)`	`UBD(P)`
`α`	`X`	`Λ`	`0`	`0`	`10`
`β`	`Y`	`Λ`	`0`	`3`	`10`
`γ`	`A`	`Λ`	`0`	`1`	`1`
`δ`	`Z`	`Λ`	`0`	`-2`	`0`

After the equivalences are processed, the nodes might appear thus:

`P`	`NAME(P)`	`PARENT(P)`	`DELTA(P)`	`LBD(P)`	`UBD(P)`
`α`	`X`	`Λ`	`*`	`-5`	`14`
`β`	`Y`	`α`	`4`	`*`	`*`
`γ`	`A`	`δ`	`0`	`*`	`*`
`δ`	`Z`	`α`	`-3`	`*`	`*`

("*" denotes irrelevant information.)

Design an algorithm that makes this transformation. Assume that inputs to your algorithm have the form (P,j,Q,k), denoting X[j] ≡ Y[k], where NAME(P) = "X" and NAME(Q) = "Y". Be sure to check whether the equivalences are contradictory; for example, X[1] ≡ Y[2] contradicts X[2] ≡ Y[1].

Verified: yes
Solve time: 1m49s

Solution

The ordinary Algorithm E maintains a forest of equivalence classes. In the present problem, each node represents an entire array, and the relation between two nodes must record not only that they belong to the same class but also the displacement between their subscripts. Therefore each node $P$ carries a quantity $\operatorname{DELTA}(P)$.

Define, for every node $P$, a quantity $d(P)$ by

$$ d(P)=0 \quad\text{if}\quad \operatorname{PARENT}(P)=\Lambda, $$

and recursively

$$ d(P)=\operatorname{DELTA}(P)+d(\operatorname{PARENT}(P)) \quad\text{if}\quad \operatorname{PARENT}(P)\ne\Lambda. $$

If $R$ is the root of the equivalence class containing $P$, the intended meaning is

$$ \text{location of }X[t] = \text{location of }R[t+d(P)]. $$

Consequently, if two arrays $X$ and $Y$ are represented by nodes $P$ and $Q$,

$$ X[j]\equiv Y[k] $$

is equivalent to

$$ j+d(P)=k+d(Q). \tag{1} $$

The algorithm maintains this invariant.

To process a declaration $(P,j,Q,k)$, first determine the roots of $P$ and $Q$ and the corresponding values $d(P)$ and $d(Q)$.

E1. [Find root of $P$.]

Set

$$ S\leftarrow P,\qquad a\leftarrow0. $$

While $\operatorname{PARENT}(S)\ne\Lambda$, perform

$$ a\leftarrow a+\operatorname{DELTA}(S), \qquad S\leftarrow \operatorname{PARENT}(S). $$

When the loop terminates, $S$ is the root of the class containing $P$, and

$$ a=d(P). $$

E2. [Find root of $Q$.]

Set

$$ T\leftarrow Q,\qquad b\leftarrow0. $$

While $\operatorname{PARENT}(T)\ne\Lambda$, perform

$$ b\leftarrow b+\operatorname{DELTA}(T), \qquad T\leftarrow \operatorname{PARENT}(T). $$

When the loop terminates, $T$ is the root of the class containing $Q$, and

$$ b=d(Q). $$

E3. [Same class?]

If $S=T$, equation (1) requires

$$ j+a=k+b. \tag{2} $$

If (2) fails, the declaration is contradictory and an error is reported. If (2) holds, the declaration adds no new information; proceed to the next declaration.

E4. [Join two classes.]

Assume $S\ne T$.

The declaration requires

$$ j+d(P)=k+d(Q). $$

Since

$$ d(P)=a,\qquad d(Q)=b, $$

we require

$$ j+a=k+b. \tag{3} $$

Attach $T$ below $S$ by setting

$$ \operatorname{PARENT}(T)\leftarrow S. $$

The value assigned to $\operatorname{DELTA}(T)$ must make (3) valid for all members of the former class rooted at $T$.

After the attachment,

$$ d_{\text{new}}(Q) = b+\operatorname{DELTA}(T). $$

Hence

$$ j+a = k+b+\operatorname{DELTA}(T), $$

so we must set

$$ \operatorname{DELTA}(T) = j+a-k-b. \tag{4} $$

E5. [Update bounds.]

Before the attachment, the root $S$ represents the interval

$$ [\operatorname{LBD}(S),,\operatorname{UBD}(S)] $$

of allowable subscripts in its own coordinate system.

A subscript $t$ of an array belonging to the class rooted at $T$ corresponds, after (4), to the root coordinate

$$ t+\operatorname{DELTA}(T). $$

Therefore the interval represented by $T$ becomes

$$ [\operatorname{LBD}(T)+\operatorname{DELTA}(T),, \operatorname{UBD}(T)+\operatorname{DELTA}(T)]. $$

The combined class must contain both intervals. Hence set

$$ \operatorname{LBD}(S) \leftarrow \min!\bigl( \operatorname{LBD}(S),, \operatorname{LBD}(T)+\operatorname{DELTA}(T) \bigr), $$

$$ \operatorname{UBD}(S) \leftarrow \max!\bigl( \operatorname{UBD}(S),, \operatorname{UBD}(T)+\operatorname{DELTA}(T) \bigr). $$

Then proceed to the next declaration.

It remains to prove correctness.

Suppose $R$ is any root. By construction, every node $P$ in its class satisfies

$$ \text{location of }X[t] = \text{location of }R[t+d(P)]. \tag{5} $$

Initially every node is a root, $d(P)=0$, and (5) holds.

Assume (5) holds before step E4. Let $T$ be attached below $S$ using (4). For every node $U$ formerly in the class of $T$,

$$ d_{\text{new}}(U) = d(U)+\operatorname{DELTA}(T). $$

Therefore all coordinates of that class are translated by the same amount, preserving every previously established equivalence. Furthermore,

$$ d_{\text{new}}(Q) = b+\operatorname{DELTA}(T) = j+a-k, $$

by (4), hence

$$ k+d_{\text{new}}(Q) = j+a = j+d(P), $$

which is exactly the new declaration. Thus the invariant is preserved.

When step E3 finds $S=T$, every relation already implied by the existing declarations satisfies (2). Therefore a failure of (2) means that two different offsets are being assigned to the same pair of locations, which is impossible. Conversely, if (2) holds, the new declaration is already implied by the current structure.

Finally, for a root $R$, the interval

$$ [\operatorname{LBD}(R),\operatorname{UBD}(R)] $$

is maintained as the smallest interval in root coordinates containing every allowable subscript of every array in the class. Step E5 replaces it by the union of the two intervals being merged, hence preserves this property. Therefore the compiler need reserve exactly

$$ \operatorname{UBD}(R)-\operatorname{LBD}(R)+1 $$

consecutive locations for that equivalence class.

For the example in the statement, the three declarations yield

$$ \operatorname{DELTA}(\beta)=4,\qquad \operatorname{DELTA}(\delta)=-3,\qquad \operatorname{DELTA}(\gamma)=0 $$

relative to $\delta$, and the root interval becomes

$$ [-5,14], $$

giving the table shown in the text.

Hence the modified algorithm correctly processes equivalence declarations, detects contradictions, and computes the minimum contiguous storage required for each equivalence class.

$$ \boxed{\text{Use Algorithm E with displacement values }\operatorname{DELTA}, \text{ contradiction test }j+d(P)=k+d(Q), \text{ and interval merging as above.}} $$

This completes the proof.

∎