TAOCP 1.2.2 Exercise 26

Section 1.2.2: Numbers, Powers, and Logarithms

Exercise 26. [M27] Find a rigorous upper bound on the error made by the algorithm in the previous exercise, based on the precision used in the arithmetic operations.

Verified: no
Solve time: 29s

Setup

Let the computer work with fixed precision $\delta > 0$, meaning that every right shift and every subtraction is performed with an error whose absolute value is at most $\delta$. Let $x_0$ be the initial value, with

$$ 1 \le x_0 < 2. $$

Exercise 25 defines an exact algorithm whose successive reductions satisfy

$$ x \leftarrow x\left(1-\frac1{2^k}\right), $$

and whose accumulated sum is

$$ y \leftarrow y+\log_b\frac{2^k}{2^k-1}. $$

The exact identity

$$ \log_b x_0 = \sum \log_b\frac{2^k}{2^k-1} + \log_b x_f \tag{1} $$

holds, where $x_f$ is the final value. In exact arithmetic the process never terminates until $x_f=1$, hence the computed sum equals $\log_b x_0$.

The problem is to obtain a rigorous upper bound for the error when arithmetic is carried out only to precision $\delta$.

Solution

Let $\widetilde x$ denote the quantity actually stored in the machine. At each execution of Step L4, the exact new value would be

$$ x' = x\left(1-\frac1{2^k}\right). $$

Because subtraction and shifting are performed only to precision $\delta$, the stored value satisfies

$$ \widetilde x' = x' + \varepsilon , \qquad |\varepsilon|\le \delta . \tag{2} $$

Hence every reduction introduces an additive perturbation bounded by $\delta$.

Suppose that the algorithm performs reductions with indices

$$ k_1,k_2,\ldots,k_r . $$

The exact reduction factor is

$$ P = \prod_{j=1}^{r} \left(1-\frac1{2^{k_j}}\right). \tag{3} $$

If arithmetic were exact,

$$ x_r=x_0P . \tag{4} $$

The accumulated logarithm would be

$$ Y = \sum_{j=1}^{r} \log_b\frac{2^{k_j}}{2^{k_j}-1} = -\log_b P . \tag{5} $$

Consequently

$$ Y+\log_b x_r = \log_b x_0 . \tag{6} $$

In the machine computation, termination occurs when the stored value reaches $1$. Let the exact value corresponding to the final stored quantity be $x_r^*$. Because each reduction contributes at most $\delta$ of error and there are $r$ reductions,

$$ |x_r^*-1| \le r\delta . \tag{7} $$

The computed logarithm is $Y$, while the true logarithm equals, by (6),

$$ \log_b x_0 = Y+\log_b x_r^* . $$

Therefore the total error is

$$ E = \bigl|\log_b x_0-Y\bigr| = |\log_b x_r^*|. \tag{8} $$

Write

$$ x_r^*=1+\eta , \qquad |\eta|\le r\delta . \tag{9} $$

For $|\eta|<1$,

$$ |\ln(1+\eta)| \le \frac{|\eta|}{1-|\eta|}, \tag{10} $$

which follows from the mean-value theorem. Using

$$ \log_b t=\frac{\ln t}{\ln b}, $$

we obtain

$$ E = \frac{|\ln(1+\eta)|}{|\ln b|} \le \frac{|\eta|}{|\ln b|(1-|\eta|)} \le \frac{r\delta}{|\ln b|,(1-r\delta)}. \tag{11} $$

This is a rigorous error bound.

It remains to bound $r$. Every reduction multiplies the current value by a factor at most $1/2$, since

$$ 1-\frac1{2^k}\le\frac12 \qquad(k\ge1). $$

Thus after $r$ reductions,

$$ x_r^* \le \frac{x_0}{2^r} < \frac2{2^r}. $$

Since the process terminates only after the value has been reduced to a neighborhood of $1$, necessarily

$$ r\le \lg x_0 +1 <2. $$

More generally, if the computation is carried out to $t$ binary places, then no index larger than $t$ can occur, and therefore

$$ r\le t . \tag{12} $$

Substituting (12) into (11) yields the practical bound

$$ E \le \frac{t\delta}{|\ln b|,(1-t\delta)}, \qquad t\delta<1. \tag{13} $$

Hence the error in the logarithm is at most

$$ \boxed{\frac{t\delta}{|\ln b|,(1-t\delta)}}. $$

Verification

Equation (5) is obtained by telescoping logarithms:

$$ \sum_{j=1}^{r} \log_b\frac{2^{k_j}}{2^{k_j}-1} = -\sum_{j=1}^{r} \log_b\left(1-\frac1{2^{k_j}}\right) = -\log_b P. $$

Equation (6) then follows from

$$ x_r=x_0P. $$

The only discrepancy between the exact and machine computations is the residual factor $\log_b x_r^*$ left when the machine stops. Bounding the final multiplicative discrepancy by $|\eta|\le r\delta$ and converting it to logarithmic scale gives (11). Therefore the logarithmic error is controlled entirely by the final residual deviation from $1$.

Notes

A sharper bound can be obtained if the implementation specifies separate error bounds for shifting and subtraction, or if truncation rather than rounding is assumed. The argument above applies to any fixed precision model in which each arithmetic operation introduces an error of magnitude at most $\delta$.