TAOCP 2.2.3 Exercise 22

Program `T` should reject any input that violates the assumptions under which the algorithm was designed.

Section 2.2.3: Linked Allocation

Exercise 22. [23] Program T assumes that its input tape contains valid information, but a program that is intended for general use should always make careful tests on its input so that clerical errors can be detected, and so that the program cannot "destroy itself." For example, if one of the input relations for $k$ were negative, Program T may erroneously change one of its own instructions when storing into X[k]. Suggest ways to modify Program T so that it is suitable for general use.

Verified: no
Solve time: 5m29s


Solution

Program T should reject any input that violates the assumptions under which the algorithm was designed. The essential danger is that input values are used as subscripts into tables such as X, COUNT, and TOP. If an invalid subscript is accepted, the program may overwrite unrelated memory locations, including its own instructions.

Assume that the objects to be sorted are numbered $1,2,\ldots,n$, and that each input relation has the form $j\prec k$.

Before any operation that stores into a location indexed by $j$ or $k$, the program should verify

$$ 1\le j\le n,\qquad 1\le k\le n. $$

If either inequality fails, the program should terminate with an error indication. This prevents negative subscripts and subscripts exceeding the allocated tables.

The program should also verify that the value of $n$ itself is legal. If the tables have been dimensioned for at most $N$ objects, the test

$$ 1\le n\le N $$

must be performed before the tables are initialized.

The linked-list structures created by Program T use nodes obtained from AVAIL. Every operation of the form

$$ P\Leftarrow \mathrm{AVAIL} $$

should be implemented according to definition (6), namely by first testing whether

$$ \mathrm{AVAIL}=\Lambda . $$

If this condition holds, the program should signal OVERFLOW rather than continue. Otherwise the algorithm may attempt to use a nonexistent node.

The input routine should also verify that every relation is presented in the prescribed format. If the relation is encoded by two integers $j$ and $k$, the routine should check that both fields are present and are legitimate integers before they are used.

After all input has been read, a consistency check should be made on the data structure that has been constructed. Let $m$ be the number of input relations accepted. During the execution of Algorithm T, each relation causes exactly one decrement of some COUNT entry. Therefore exactly $m$ decrements should occur. If the algorithm terminates with undeleted relations remaining, a cycle is present in the relation graph. The program should diagnose this condition explicitly instead of merely producing an incomplete output.

Additional protection can be obtained by reserving memory locations immediately preceding and following the tables and filling them with fixed sentinel values. After the input phase and after the execution of the algorithm, these sentinel values can be tested. Any change indicates that some table bound has been exceeded.

Hence a version of Program T suitable for general use should include:

  1. Bounds checks on every input value $j$ and $k$.
  2. Validation of the value of $n$.
  3. The OVERFLOW test required by (6) whenever a node is taken from AVAIL.
  4. Verification that each input relation has the correct format.
  5. Detection and reporting of cycles or other inconsistencies in the relation data.
  6. Optional sentinel checks around tables to detect accidental overwriting of memory.

These modifications prevent invalid input from corrupting memory and ensure that all clerical errors are detected before they can damage the computation.

$$ \boxed{\text{Add bounds checks, overflow checks, format validation, and consistency tests before any table access or node allocation.}} $$