TAOCP 3.3.1 Exercise 3
Section 3.3.1: General Test Procedures for Studying Random Data
Exercise 3. ▶ [23] Some dice that were loaded as described in the previous exercise were rolled 144 times, and the following values were observed:
| value of $s =$ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| observed number, $Y_s =$ | 2 | 6 | 10 | 16 | 18 | 32 | 20 | 13 | 16 | 9 | 2 |
Apply the chi-square test to these values, using the probabilities in (1), pretending that the dice are not in fact known to be faulty. Does the chi-square test detect the bad dice? If not, explain why not.
Solution
To test whether the dice are fair, we must use the probability distribution for the sum of two ordinary dice. Thus the null hypothesis is
$$ p_s=\frac{6-|s-7|}{36}, \qquad s=2,3,\ldots,12, $$
so that $36p_s$ takes the values $1,2,3,4,5,6,5,4,3,2,1$.
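These probabilities can be checked by brute-force enumeration of the 36 equally likely outcomes of two fair dice. The following is a minimal Python sketch; the names `counts` and `p` are illustrative and do not come from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (first die, second die) pairs
# produce each sum s = 2, ..., 12.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Null-hypothesis probabilities p_s, assuming both dice are fair.
p = {s: Fraction(counts[s], 36) for s in range(2, 13)}

for s in range(2, 13):
    print(s, p[s])
```

Using `Fraction` keeps the probabilities exact, so they can be compared directly with the closed form $(6-|s-7|)/36$.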
The observed frequencies are
$$ \begin{array}{c|ccccccccccc} s &2&3&4&5&6&7&8&9&10&11&12\\ \hline Y_s&2&6&10&16&18&32&20&13&16&9&2 \end{array} $$
with total
$$ n=\sum_{s=2}^{12}Y_s=2+6+10+16+18+32+20+13+16+9+2=144. $$
The expected frequencies are therefore
$$ np_s = 144\,p_s, $$
namely
$$ \begin{array}{c|ccccccccccc} s &2&3&4&5&6&7&8&9&10&11&12\\ \hline np_s &4&8&12&16&20&24&20&16&12&8&4 \end{array} $$
The chi-square statistic is
$$ V=\sum_{s=2}^{12}\frac{(Y_s-np_s)^2}{np_s}. $$
We compute the contributions term by term:
$$ \begin{aligned}
s=2:\quad& \frac{(2-4)^2}{4} = 1, \\[2mm]
s=3:\quad& \frac{(6-8)^2}{8} = \frac{1}{2} = 0.500, \\[2mm]
s=4:\quad& \frac{(10-12)^2}{12} = \frac{1}{3} \approx 0.333, \\[2mm]
s=5:\quad& \frac{(16-16)^2}{16} = 0, \\[2mm]
s=6:\quad& \frac{(18-20)^2}{20} = \frac{1}{5} = 0.200, \\[2mm]
s=7:\quad& \frac{(32-24)^2}{24} = \frac{8}{3} \approx 2.667, \\[2mm]
s=8:\quad& \frac{(20-20)^2}{20} = 0, \\[2mm]
s=9:\quad& \frac{(13-16)^2}{16} = \frac{9}{16} \approx 0.563, \\[2mm]
s=10:\quad& \frac{(16-12)^2}{12} = \frac{4}{3} \approx 1.333, \\[2mm]
s=11:\quad& \frac{(9-8)^2}{8} = \frac{1}{8} = 0.125, \\[2mm]
s=12:\quad& \frac{(2-4)^2}{4} = 1.
\end{aligned} $$
Hence
$$ V = 1+\frac{1}{2}+\frac{1}{3}+0+\frac{1}{5}+\frac{8}{3}+0+\frac{9}{16}+\frac{4}{3}+\frac{1}{8}+1 = \frac{1853}{240} = 7\tfrac{173}{240} \approx 7.72. $$
There are $k=11$ classes, and no parameters have been estimated from the data, so the number of degrees of freedom is
$$ \nu=k-1=10. $$
From the chi-square table for $\nu=10$,
$$ \chi^2_{0.95}\approx18.31. $$
Since
$$ V\approx7.72<18.31, $$
the test does not reject the hypothesis that the dice are fair at the $5\%$ significance level. In fact $V$ lies between the $25\%$ and $50\%$ points of the distribution for $\nu=10$ (about $6.74$ and $9.34$), so the observed value is entirely unremarkable.
Thus the chi-square test does not detect that the dice are bad.
The reason is that only $144$ throws were made. The deviations from fairness are not large enough, relative to the expected random fluctuations, to produce a chi-square value exceeding the usual critical threshold. The largest contributions to $V$ come from the surplus of $7$'s and $10$'s, which together account for $4$ of the total of about $7.72$, but the overall discrepancy is well within the range that occurs by chance for a sample of this size.
$$ \boxed{V=7\tfrac{173}{240}\approx7.72,\qquad \nu=10,\qquad \text{do not reject fairness at the 5\% level}} $$
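The arithmetic can be re-run mechanically. The sketch below recomputes the statistic in exact rational arithmetic, taking $n$ to be the sum of the observed counts in the table, and evaluates the chi-square distribution function with the closed form that holds for even $\nu$, namely $P(\chi^2_\nu\le x)=1-e^{-x/2}\sum_{j=0}^{\nu/2-1}(x/2)^j/j!$. The helper names `chi_square_statistic` and `chi_square_cdf_even` are illustrative choices, not part of the text.

```python
from fractions import Fraction
from math import exp, factorial

# Observed counts Y_s for s = 2, ..., 12, copied from the table in the exercise.
observed = [2, 6, 10, 16, 18, 32, 20, 13, 16, 9, 2]
# Fair-dice probabilities p_s = (6 - |s - 7|) / 36.
probs = [Fraction(6 - abs(s - 7), 36) for s in range(2, 13)]


def chi_square_statistic(obs, p):
    """Return V = sum (Y_s - n p_s)^2 / (n p_s) exactly, with n = sum(obs)."""
    n = sum(obs)
    return sum((y - n * ps) ** 2 / (n * ps) for y, ps in zip(obs, p))


def chi_square_cdf_even(x, nu):
    """P(chi^2_nu <= x) for positive even nu, via the finite Poisson-type sum."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive even nu")
    half = x / 2.0
    tail = exp(-half) * sum(half ** j / factorial(j) for j in range(nu // 2))
    return 1.0 - tail


V = chi_square_statistic(observed, probs)
nu = len(observed) - 1  # k - 1 degrees of freedom, no estimated parameters

print("n =", sum(observed))
print("V =", V, "which is about", round(float(V), 4))
print("P(chi^2 <= V) =", round(chi_square_cdf_even(float(V), nu), 4))
print("P(chi^2 <= 18.31) =", round(chi_square_cdf_even(18.31, nu), 4))
```

The closed form is valid only for even $\nu$, which is why the helper rejects odd values; for general $\nu$ one would need an incomplete gamma function routine instead.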