TAOCP 1.3.3 Exercise 6
We are asked to determine the effect on the execution timing of Program $A$ if the assumption that all blank words appear at the extreme right of the input is removed.
Section 1.3.3: Applications to Permutations
Exercise 6. [M28] What changes are made to the timing of Program $A$ if we remove the assumption that all blank words occur at the extreme right?
Verified: yes
Solve time: 1m21s
Setup
We are asked to determine the effect on the execution timing of Program $A$ if the assumption that all blank words appear at the extreme right of the input is removed. Program $A$ (lines 07–86) multiplies permutations expressed in cycle notation. The non-I/O portion of the program, which carries out Algorithm $A$, is designed under the implicit assumption that all blank fields are at the right end of the input. Under this assumption, loops that scan sequentially for untagged elements or successors (for example, lines 42–46 and 67–71) encounter nonblank words contiguously, avoiding unnecessary comparisons with blanks. The exercise asks what changes occur to the timing if blanks may appear interspersed among nonblank words.
Let $Y$ denote the total number of nonblank words in the input, $B$ the number of left parentheses (, $D$ the number of right parentheses ), $H$ the number of cycles, $S$ the number of singleton cycles, and $P$ the number of distinct elements, as in Section 1.3.3. Let $SIZE$ be the total number of input words. The key loops affected are:
- Loop 2B–46 (
1H–1B) for tagging left and right parentheses (step $A1$). - Loops 67–71 and 75–81 (scanning for the successor
CURRENTuntil it equalsSTART).
We are to analyze the timing effect in terms of the number of MIX operations (or equivalently, step counts) as a function of $SIZE$ and the input data characteristics.
Solution
Algorithm $A$ scans sequentially for untagged elements or for occurrences of CURRENT in the formula. Originally, all blank words are at the extreme right; therefore, in any scan from left to right, the program will encounter $Y$ nonblank words before reaching the end marker =. Each comparison with a nonblank word counts as one iteration. Since blanks are at the extreme right, the scan never needs to process a blank field during step $A2$ (finding START) or step $A4$ (finding CURRENT).
If blanks are interspersed, then each scan from left to right potentially encounters some number $S_b$ of blank words before finding the desired untagged element or the matching CURRENT. Let $SIZE$ be the total number of fields including blanks. The worst-case number of comparisons for a scan of PERM to find CURRENT is then increased from $Y$ to $SIZE$, since in the worst case the program may scan all words before locating the target element. Specifically:
- Step $A2$ (finding START): Previously, this required at most $Y$ comparisons. Now, blanks can appear anywhere, so the maximum number of comparisons is $SIZE$. Let $t_2$ denote the number of steps for finding `START$; then $t_2 \le SIZE$ per iteration.
- Step $A4$ (scanning for CURRENT): Previously, each element search required scanning through nonblank words only, $Y - 1$ comparisons on average. With blanks interspersed, each iteration may compare against $SIZE - 1$ words. Let $t_4$ denote the number of comparisons per successor; then $t_4 \le SIZE$.
- Step $A5$ (closing cycle): The number of iterations remains the same since it depends on the length of the cycle, but each comparison to check
$CURRENT = START$now may require scanning blanks. - Loop overheads and tagging operations remain unchanged; only the number of comparisons in the scanning loops increases.
Consequently, the timing of Program $A$ is affected as follows:
- Each scan for
START(step $A2$) may require up to $SIZE$ operations rather than $Y$. Since step $A2$ is executed once per cycle, the total additional time due to blanks is at most $(SIZE - Y) \cdot H$, where $H$ is the number of cycles. - Each scan for
CURRENT(step $A4$) may require up to $SIZE$ operations rather than $Y$ operations. Since there are $Y$ nonblank words, and each may be part of a cycle of length $>1$, the total additional comparisons are bounded by $(SIZE - Y) \cdot Y$. - Steps that output elements or tag fields (increments, moves, etc.) are unaffected in count; only the loop iteration counts increase.
Therefore, the overall time complexity in terms of the number of MIX instructions executed increases proportionally to the number of blank words that are interspersed among nonblank words. More precisely, if $B_s = SIZE - Y$ is the number of blank fields, then the number of additional comparisons in scanning loops is bounded by $B_s \cdot (H + Y)$.
No other structural changes to the program occur; the algorithm remains correct because blank fields are skipped by comparison checks, but each scan may now traverse additional blank fields.
Verification
Let us check this argument for a simple example. Suppose the input permutation formula consists of $Y = 6$ nonblank words representing elements, and $B_s = 4$ blank words interspersed. Originally, the scan for START would inspect $6$ words; now, in the worst case, it may inspect all $10$ fields. Similarly, for each successor CURRENT, the scan may inspect $10$ fields instead of $6$. Thus, the total number of loop iterations in lines 42–46 and 67–71 increases by approximately $(10-6) \cdot (\text{number of cycles + elements})$, confirming the scaling derived above.
The effect is additive across all cycles, and the order of processing does not change; hence, the final output permutation is unaffected. Only the timing, measured in steps or MIX instructions, increases.
Notes
This analysis can be generalized to any input where some fields are considered "inactive" (e.g., comments or markers) that must be skipped by scanning loops. The key observation is that sequential search loops are sensitive to the density of relevant elements in memory; introducing scattered blanks increases the number of comparisons linearly in the number of blanks per scan. If a more efficient search is required, a preprocessing step could compact the input, moving all nonblank fields to the left, restoring the original timing assumption.
$\boxed{\text{Removing the extreme-right blanks assumption increases the number of comparisons in scanning loops from $Y$ to $SIZE$, adding at most $B_s(H+Y)$ operations, where $B_s$ is the number of blank words.}}$
This completes the proof.
∎