TAOCP 3.4.1 Exercise 3

Suppose we have a uniform random variable $U$ between 0 and 1, represented in a computer word with $m$ possible discrete values, $0, 1, \ldots, m-1$, as in Section 3.

Section 3.4.1: Numerical Distributions

Exercise 3. ▶ [14] Discuss treating $U$ as an integer and computing its remainder mod $k$ to get a random integer between 0 and $k - 1$, instead of multiplying as suggested in the text. Thus (1) would be changed to

ENTA 0;  LDX U;  DIV K,

with the result appearing in register X. Is this a good method?

Verified: yes
Solve time: 1m34s


Solution

Suppose we have a uniform random variable $U$ between 0 and 1, represented in a computer word with $m$ possible discrete values, $0, 1, \ldots, m-1$, as in Section 3.4.1. We wish to generate a random integer $X$ between $0$ and $k-1$. The text suggests the method

$X = \lfloor k U \rfloor, \eqno(1)$

which, on a MIX computer, can be implemented as

LDA U
MUL K
STX X

producing $X$ in register A.

The proposed alternative is to treat $U$ as an integer between $0$ and $m-1$ and take its remainder modulo $k$:

ENTA 0
LDX U
DIV K

with the remainder appearing in register X.

We examine the probability distribution of the remainder method. Let $U$ be uniformly distributed over $0,1,\dots,m-1$. Denote by $r = U \bmod k$ the remainder. Then $r$ takes values $0,1,\dots,k-1$. The exact probability that $r = j$ is

$$ \Pr{r = j} = \frac{#{i \in {0,1,\dots,m-1} \mid i \bmod k = j}}{m}. $$

If $m$ is not divisible by $k$, write $m = q k + s$, where $0 < s < k$. Then exactly $q+1$ values of $i$ yield remainders $0,1,\dots,s-1$, and exactly $q$ values yield remainders $s,\dots,k-1$. Therefore, the probability distribution of $r$ is

$$ \Pr{r = j} = \begin{cases} \dfrac{q+1}{m}, & 0 \le j < s,\[2mm] \dfrac{q}{m}, & s \le j < k. \end{cases} $$

Since $q+1 \neq q$, this distribution is not uniform unless $k$ divides $m$ exactly.

In contrast, the multiplication method $X = \lfloor k U / m \rfloor$ produces

$$ \Pr{X = j} = \frac{\lfloor (j+1)m/k \rfloor - \lfloor jm/k \rfloor}{m}, \quad 0 \le j < k, $$

which is closer to uniform; the discrepancy is at most one unit in $m$, and is negligible if $k \ll m$ (as noted in Section 3.4.1, Remark A).

Therefore, the remainder method introduces bias whenever $k \nmid m$, favoring smaller remainders if $s > 0$. Only when $k$ divides $m$ exactly do both methods produce exactly uniform integers. In practice, for common values of $k$ and computer word size $m$, $k$ rarely divides $m$, so the remainder method is generally inferior.

This completes the proof.