5.2 Random Sampling


twrandomsample ()
illustrates random sampling

This quantlet illustrates that ``arbitrary human choice'' is quite different from proper random sampling. To start the quantlet, the user must type the following:

  twrandomsample()
After this, the user should see the following window:

[Figure: the Read Value input window, showing the default values]

This corresponds to a classroom setting (real or virtual), where the students are asked to write down a ``randomly chosen'' number among 1, 2, 3, and 4. The numbers above are default values, chosen because most people pick 3, and most of the rest pick 2. The $ \alpha$ level (denoted by alpha) sets the significance level of the hypothesis test that the numbers are randomly distributed. The default $ \alpha$ level is 0.05.

After entering the values, or using default values, clicking on the OK button will produce the following display (this one for the default values):


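[Figure: output display, with a bar graph in the top half and the hypothesis test results in the bottom half]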

The top half of the display provides a bar graph of the data entered in the Read Value window. The bottom half gives information about the test of the hypothesis that the entered data are a random sample. Here, ``random sample'' means that ``all values are equally likely'', or equivalently, that the data come from a discrete uniform distribution on 1, 2, 3, and 4. The test statistic used here, $ \widehat{p}$ (phat), is one of many possible test statistics for this hypothesis. This $ \widehat{p}$ is the empirical (i.e. observed) proportion of 2s and 3s, computed from the data entered by the user. If the data really are uniformly distributed, each of the four values has probability 1/4, so the probability of getting a 2 or a 3 is 2/4 = 0.5, and we would expect $ \widehat{p}$ to be close to 0.5. Conversely, if $ \widehat{p}$ is ``far'' from 0.5, the data are most likely not uniformly distributed. This is the idea behind this hypothesis test.
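The computation of $ \widehat{p}$ described above can be sketched in a few lines of Python. This is only an illustration, not the quantlet's own code: the function name `phat_two_or_three` and the tallies in `example_counts` are hypothetical and are not the default values shown in the Read Value window.

```python
def phat_two_or_three(counts):
    """Return the observed share of answers equal to 2 or 3.

    counts holds the tallies for the values 1, 2, 3, 4, in that order.
    """
    if len(counts) != 4:
        raise ValueError("expected four tallies, one per value 1..4")
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    # counts[1] is the tally for the value 2, counts[2] for the value 3.
    return (counts[1] + counts[2]) / n


# Hypothetical tallies for 1, 2, 3, 4 (NOT the quantlet's defaults).
example_counts = [2, 9, 18, 3]
phat = phat_two_or_three(example_counts)
print(f"n = {sum(example_counts)}, phat = {phat:.3f}")
```

With perfectly balanced tallies such as `[5, 5, 5, 5]` the function returns exactly 0.5, the value expected under the uniform distribution.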

We want to test the hypothesis that this is a random (i.e. uniformly distributed) sample. Writing $ p$ for the true probability of choosing a 2 or a 3, we have the following null and alternative hypotheses:

$\displaystyle H_0: p = 0.5\,, \quad\quad H_1: p \neq 0.5\,.$

On the computer screen, the alternative hypothesis is represented as p <> 0.5, but it means the same -- that $ p$ could be less than or greater than 0.5. To test this hypothesis, we check whether the test statistic ( $ \widehat{p}$) lies within the confidence interval listed in the third line of the display. Strictly speaking, this interval is centered at the hypothesized value 0.5, so it is the range in which $ \widehat{p}$ is expected to fall with probability $ 1 - \alpha$ when $ H_0$ is true. For the example above, the interval is the following:

$\displaystyle 0.327 < \widehat{p} < 0.673\,.$

Since $ \widehat{p}$ in the example is equal to 0.844, it lies outside this interval, so we reject $ H_0$. In other words, at the $ \alpha$ = 0.05 level of significance, the data provide evidence that the numbers were not chosen uniformly at random.
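The decision in this example can be reproduced with a short Python sketch. It assumes a sample size of n = 32, which is not stated in the text but is the value for which the normal-approximation interval around 0.5 comes out as 0.327 to 0.673; the function name `two_sided_test` is hypothetical. The sketch also reports the standardized statistic z, since rejecting when $ \widehat{p}$ falls outside the interval is the same as rejecting when |z| exceeds 1.96.

```python
import math

P0 = 0.5       # value of p under H0
Z_CRIT = 1.96  # two-sided normal critical value for alpha = 0.05


def two_sided_test(phat, n, p0=P0, z_crit=Z_CRIT):
    """Return (lower, upper, z, reject) for the test of H0: p = p0."""
    se = math.sqrt(p0 * (1.0 - p0) / n)  # std. deviation of phat under H0
    lower = p0 - z_crit * se
    upper = p0 + z_crit * se
    z = (phat - p0) / se                 # standardized test statistic
    reject = not (lower <= phat <= upper)
    return lower, upper, z, reject


n = 32  # assumption: inferred from the interval 0.327 .. 0.673
lower, upper, z, reject = two_sided_test(0.844, n)
print(f"interval: ({lower:.3f}, {upper:.3f})")
print(f"z = {z:.2f}, reject H0: {reject}")
```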

Here, the user can see how the confidence interval and $ \widehat{p}$ change for various values in the Read Value window.

The confidence interval is computed with the following formula, shown here for $ \alpha = 0.05$, for which the two-sided normal critical value is 1.96:

$\displaystyle 0.5 \pm 1.96 \left(\frac{0.5}{\sqrt{n}}\right)\,.$

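Here $ 0.5/\sqrt{n}$ is the standard deviation of $ \widehat{p}$ when $ H_0$ is true. The formula uses the normal distribution as an approximation to the binomial distribution (a common rule of thumb is that this is adequate if $ n > 30$).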
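As a rough check on the normal approximation, the following Python sketch computes the interval for an arbitrary $ \alpha$ and then evaluates, with the exact binomial distribution, how often $ \widehat{p}$ would fall outside it when $ H_0$ is true. The function names and the choice of sample sizes are assumptions made for illustration; the source does not say how the quantlet handles $ \alpha$ values other than 0.05.

```python
import math
from statistics import NormalDist


def acceptance_interval(n, alpha=0.05, p0=0.5):
    """Normal-approximation interval for phat under H0: p = p0."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p0 * (1.0 - p0) / n)
    return p0 - half_width, p0 + half_width


def exact_rejection_probability(n, alpha=0.05, p0=0.5):
    """Exact binomial probability that phat = k/n lands outside the interval."""
    lower, upper = acceptance_interval(n, alpha, p0)
    prob = 0.0
    for k in range(n + 1):
        if not (lower <= k / n <= upper):
            prob += math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k)
    return prob


# Illustrative sample sizes; n = 32 matches the interval 0.327 .. 0.673.
for n in (10, 32, 100):
    lo, hi = acceptance_interval(n)
    size = exact_rejection_probability(n)
    print(f"n = {n:3d}: interval ({lo:.3f}, {hi:.3f}), exact size {size:.4f}")
```

If the approximation is good, the exact rejection probability should be close to the nominal $ \alpha$. Because $ \widehat{p}$ can only take the values k/n, the exact value will generally not equal $ \alpha$ precisely.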