Section 1.2.2: Numbers, Powers, and Logarithms

Exercise 28. [M30] (R. Feynman.) Develop a method for computing $b^x$ when $0 \le x < 1$, using only shifting, addition, and subtraction (similar to the algorithm in exercise 25), and analyze its accuracy.

Verified: no
Solve time: 27s

Setup

Let

$0 \le x < 1,$

and let $b>0$, $b\ne1$. We seek a procedure for computing

$b^x,$

using only shifting, addition, and subtraction, in a manner analogous to Exercise 25.

Write the binary expansion of $x$ as

$x=\sum_{k=1}^\infty \frac{a_k}{2^k}, \qquad a_k\in{0,1}. \eqno(1)$

Then

= b^{\sum a_k/2^k} = \prod_{k=1}^\infty b^{a_k/2^k} = \prod_{a_k=1} b^{1/2^k}, \eqno(2)$$ by the laws of exponents (5). Define $$c_k=b^{1/2^k}-1. \eqno(3)$$ Then $$b^{1/2^k}=1+c_k,$$ hence $$b^x=\prod_{a_k=1}(1+c_k). \eqno(4)$$ The constants $c_k$ may be precomputed once and stored. Since $$b^{1/2^k}\to1,$$ we have $$c_k\to0.$$ The problem is therefore reduced to multiplying selected factors $(1+c_k)$ using only shifts, additions, and subtractions. ## Solution Assume first that $b>1$. Let $$x=(0.a_1a_2a_3\ldots)2.$$ We use the following algorithm. ### Algorithm E E1. [Initialize.] Set $$y\leftarrow1,\qquad k\leftarrow1.$$ E2. [Test digit.] Determine the binary digit $a_k$ of $x$. E3. [Multiply if necessary.] If $a_k=1$, set $$y\leftarrow y+y,c_k.$$ E4. [Advance.] Set $$k\leftarrow k+1,$$ and repeat from E2. The operation in E3 requires only shifting, addition, and subtraction provided the constants $c_k$ are represented in binary form. For if $$c_k=\sum{j=1}^m \frac{\gamma_{k,j}}{2^j}, \qquad \gamma_{k,j}\in{-1,0,1}, \eqno(5)$$ then $$y,c_k =\sum_{j=1}^m \gamma_{k,j}\frac{y}{2^j}, \eqno(6)$$ which is obtained by shifts and additions or subtractions. We now prove correctness. Let $$x_n=\sum_{k=1}^n\frac{a_k}{2^k}. \eqno(7)$$ After step E3 has been executed for $k=1,2,\ldots,n$, the value of $y$ is $$y_n=\prod_{k=1}^n(1+c_k)^{a_k}. \eqno(8)$$ Since $1+c_k=b^{1/2^k}$, we obtain \begin{aligned} y_n &=\prod_{k=1}^n\left(b^{1/2^k}\right)^{a_k}
&=b^{\sum_{k=1}^n a_k/2^k}
&=b^{x_n}. \eqno(9) \end{aligned} Because $$0\le x-x_n<\frac1{2^n}, \eqno(10)$$ we have $$b^{x_n}\le b^x<b^{x_n+2^{-n}}. \eqno(11)$$ Therefore $$y_n\le b^x< y_n,b^{2^{-n}}, \eqno(12)$$ and hence $$0\le b^x-y_n< y_n\left(b^{2^{-n}}-1\right). \eqno(13)$$ This gives an explicit truncation bound. To analyze computational error, suppose each multiplication in step E3 is performed with relative error at most $\epsilon_k$, where $$|\epsilon_k|\le\epsilon. \eqno(14)$$ If the exact value after $n$ stages is $b^{x_n}$ and the computed value is $y_n'$, then

y_n'

b^{x_n} \prod_{k:,a_k=1}(1+\epsilon_k). \eqno(15) Since at most $n$ factors occur, (1-\epsilon)^n \le \prod_{k:,a_k=1}(1+\epsilon_k) \le (1+\epsilon)^n. \eqno(16) $$Hence$$ b^{x_n}(1-\epsilon)^n \le y_n' \le b^{x_n}(1+\epsilon)^n. \eqno(17) $$Combining (11) and (17), we obtain$$ b^x,b^{-2^{-n}}(1-\epsilon)^n < y_n' \le b^x(1+\epsilon)^n. \eqno(18) Taking logarithms to the base $b$, x-2^{-n}+n\log_b(1-\epsilon) < \log_b y_n' \le x+n\log_b(1+\epsilon). \eqno(19) Thus the truncation error decreases geometrically with $n$, while the accumulated arithmetic error grows at most linearly in the number of stages. When $0<b<1$, define $$b^x=(1/b)^{-x},$$ as in the text. Since $$-1< -x\le0,$$ the same procedure applies to the base $1/b>1$. ## Verification We verify formula (12) independently. From (10), $$0\le x-x_n<2^{-n}.$$ Since $b>1$, exponentiation preserves order; therefore $$1\le b^{x-x_n}<b^{2^{-n}}. \eqno(20)$$ Multiplying by $b^{x_n}$ gives $$b^{x_n}\le b^x<b^{x_n}b^{2^{-n}}. \eqno(21)$$ Using $y_n=b^{x_n}$ from (9), $$y_n\le b^x<y_n,b^{2^{-n}},$$ which is precisely (12). We also verify the inductive formula (8). Initially, $$y_0=1.$$ Suppose after stage $n-1$, $$y_{n-1}=\prod_{k=1}^{n-1}(1+c_k)^{a_k}.$$ If $a_n=0$, step E3 leaves $y$ unchanged, hence

y_n

\prod_{k=1}^{n}(1+c_k)^{a_k}. If $a_n=1$, step E3 multiplies by $(1+c_n)$, so \begin{aligned} y_n &=y_{n-1}(1+c_n)
&=\prod_{k=1}^{n}(1+c_k)^{a_k}. \end{aligned} Thus (8) holds for all $n$. This completes the proof. ∎ ## Notes A particularly convenient choice is $b=e$. Then $$c_k=e^{1/2^k}-1,$$ and for large $k$, $$c_k\approx \frac1{2^k}.$$Consequently the later stages require only very small corrections, obtainable almost entirely by shifts.

TAOCP 1.2.2 Exercise 28

Setup

y_n'

y_n