Consider the well-known definition of the Student statistic with $n - 1$ degrees of freedom:

$$ t_{n-1} = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \tag{3.1} $$

where $\bar{x}$ and $s$ denote the mean and standard deviation of the sample $x_1, \ldots, x_n$, and the classic assumption is that these observations are normally, independently distributed (NID):

$$ x_i \sim \mathrm{NID}(\mu, \sigma^2) \quad (i = 1, \ldots, n). \tag{3.2} $$
Nearly a century ago, Gosset used a kind of Monte Carlo experiment (without computers, since they had not yet been invented) before he analytically derived the density function of this statistic (and published his results under the pseudonym Student). He sampled n values (from an urn) satisfying (3.2), and computed the corresponding value of the statistic defined by (3.1). He repeated this experiment (say) M times, so that he could compute the empirical distribution function (EDF), also called the empirical cumulative distribution function (ECDF), of the statistic. (Inspired by these empirical results, he performed his famous analysis.)
Let us imitate his experiment in the following simulation experiment (this procedure is certainly not the most efficient computer program).
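A minimal sketch of such a simulation, assuming n = 5 normal observations per sample and 1,000 macro-replicates (both values are illustrative, not taken from the text):

```python
import random
import statistics

def t_statistic(sample, mu=0.0):
    """Student's t statistic (3.1): (xbar - mu) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    return (xbar - mu) / (s / n ** 0.5)

def gosset_experiment(n=5, macro_reps=1000, seed=123):
    """Draw n NID(0, 1) values per (3.2), compute t per (3.1); repeat."""
    rng = random.Random(seed)
    return sorted(t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
                  for _ in range(macro_reps))

t_values = gosset_experiment()
# EDF at a point x: the fraction of the macro-replicates with t <= x.
edf_at_zero = sum(t <= 0.0 for t in t_values) / len(t_values)
```

Sorting the macro-replicates once makes any EDF value (or quantile) a matter of counting or indexing.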
We may drop the classic assumption formulated in (3.2) and experiment with non-normal distributions. It is easy to sample from such distributions (see again Chap. II.2). However, we are now confronted with several so-called strategic choices (also see step 1 above): Which type of distribution should be selected (lognormal, exponential, etc.)? Which parameter values for that distribution type (mean and variance for the lognormal, etc.)? Which sample size n (for asymptotically 'large' n, the standard normal distribution is known to be a good approximation for our EDF)?
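As an illustration of these strategic choices, the following sketch picks one combination: lognormal inputs with parameters 0 and 1, and a small sample size n = 5 (all concrete values are hypothetical, chosen only for the example):

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """Student's t statistic (3.1) for a given true mean mu."""
    n = len(sample)
    return ((statistics.mean(sample) - mu)
            / (statistics.stdev(sample) / n ** 0.5))

rng = random.Random(321)
n, m, s = 5, 0.0, 1.0                           # lognormal parameters m, s
true_mean = math.exp(m + s ** 2 / 2.0)          # mean of the lognormal

t_values = [t_statistic([rng.lognormvariate(m, s) for _ in range(n)],
                        true_mean)
            for _ in range(1000)]               # 1,000 macro-replicates

# The skewed input shows up as an asymmetric EDF of the t statistic:
frac_below_zero = sum(t < 0.0 for t in t_values) / len(t_values)
```

Because the lognormal is right-skewed, the sample mean falls below the true mean more than half the time for small n, so the EDF of t is no longer symmetric around zero.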
Besides these strategic choices, we must face some tactical issues: Which number of macro-replicates gives a good EDF? Can we use special variance reduction techniques (VRTs), such as common random numbers and importance sampling, to reduce the variability of the EDF? We briefly explain these techniques, as follows.
Common random numbers (CRN) mean that the analysts use the same (pseudo)random numbers (PRN), denoted by r, when estimating the effects of different strategic choices. For example, CRN are used when comparing the estimated quantiles for various distribution types. Obviously, CRN reduce the variance of estimated differences, provided CRN create positive correlation between the estimators being compared.
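A minimal CRN sketch, assuming we compare estimated 90% quantiles of the sample mean under two input distributions, exponential and standard normal; the quantile level, sample size, and number of macro-replicates are all illustrative:

```python
import math
import random
from statistics import NormalDist

def quantile_of_mean(prn, transform, p=0.9):
    """Estimate the p-quantile of the sample mean, pushing the same PRN r
    through the chosen inverse CDF (each distribution sees identical r)."""
    means = sorted(sum(transform(r) for r in rep) / len(rep) for rep in prn)
    return means[int(p * len(means))]

rng = random.Random(42)
# One common pool of PRN r, reused for every distribution type compared:
common_r = [[rng.random() for _ in range(5)] for _ in range(1000)]

q_exp = quantile_of_mean(common_r, lambda r: -math.log(1.0 - r))  # Expo(1)
q_norm = quantile_of_mean(common_r, NormalDist().inv_cdf)         # N(0, 1)
difference = q_exp - q_norm   # CRN reduces the variance of this difference
```

Both estimators are driven by the identical uniforms, so a macro-replicate with unusually large r pushes both quantile estimates up together; that positive correlation is what shrinks the variance of the difference.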
Antithetic variates (AV) mean that the analysts use the complements of the PRN (namely, 1 - r) in two 'companion' macro-replicates. Obviously, AV reduce the variance of the estimator averaged over these two replicates, provided AV create negative correlation between the two estimators resulting from the two replicates.
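A minimal AV sketch for a simple estimand, the mean of an exponential variable (the estimand and all constants are illustrative, not taken from the text):

```python
import math
import random
import statistics

def estimate_exponential_mean(uniforms):
    """Crude Monte Carlo estimate of E[X] for X ~ Expo(1), via -ln(1 - r)."""
    return statistics.fmean(-math.log(1.0 - r) for r in uniforms)

rng = random.Random(7)
r = [rng.random() for _ in range(1000)]          # one macro-replicate's PRN
antithetic_r = [1.0 - u for u in r]              # the complements 1 - r

est_plain = estimate_exponential_mean(r)
est_anti = estimate_exponential_mean(antithetic_r)
est_av = (est_plain + est_anti) / 2.0   # average of the two companion reps
# Where r is large, 1 - r is small, so the two estimators are negatively
# correlated and their average varies less than either one alone.
```

Note that AV costs no extra PRN: the second replicate reuses the first replicate's uniforms in complemented form.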
Importance sampling (IS) is used when the analysts wish to estimate a rare event, such as the probability of the Student statistic exceeding a high quantile. IS increases that probability (for example, by sampling from a distribution with a fatter tail), and later on IS corrects for this distortion of the input distribution (through the likelihood ratio). IS is not as simple as CRN and AV, but without IS too much computer time may be needed. See Glasserman et al. (2000).
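A minimal IS sketch for a simpler rare event than the Student statistic, namely the tail probability of a standard normal variable; shifting the sampling mean (rather than fattening the tail) is one common variant, and the threshold and sample budget are illustrative:

```python
import math
import random

# Rare event: p = P(X > c) for X ~ N(0, 1) with c = 4, roughly 3.2e-5.
# IS idea: sample X from N(c, 1) instead, so the event becomes common,
# and reweight each hit by the likelihood ratio phi(x) / phi(x - c).
rng = random.Random(2024)
c = 4.0
draws = 10_000

total = 0.0
for _ in range(draws):
    x = rng.gauss(c, 1.0)                        # shifted sampling density
    if x > c:
        total += math.exp(-c * x + c * c / 2.0)  # likelihood ratio correction
p_is = total / draws
# Crude Monte Carlo with 10,000 draws would typically score 0 or 1 hits;
# the reweighted IS average is a low-variance estimate of 1 - Phi(c).
```

The likelihood ratio exp(-cx + c^2/2) is exactly the density ratio of N(0, 1) to N(c, 1), which is what makes the reweighted estimator unbiased.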
There are many more VRTs. Both CRN and AV are intuitively attractive and easy to implement, but the most popular one is CRN. The most useful VRT may be IS. In practice, the other VRTs often do not reduce the variance drastically, so many users prefer to spend more computer time instead of applying VRTs. (VRTs are a great topic for doctoral research!) For more details on VRTs, I refer to Kleijnen and Rubinstein (2001).
Finally, the density function of the sample data may be a practical rather than a merely academic problem: suppose a very limited set of historical data is given, and we must analyze these data while we know that they do not satisfy the classic assumption formulated in (3.2). Then bootstrapping may help, as follows (also remember the six steps above).
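A minimal bootstrap sketch: resample the historical data with replacement and recompute the statistic (3.1) per resample, so the data themselves replace the unknown input distribution. The data values and the number of bootstrap replicates below are made up for illustration:

```python
import random
import statistics

def bootstrap_t(data, boot_reps=1000, seed=99):
    """Resample the historical data with replacement; recompute the t
    statistic (3.1) per resample, with the original sample mean as mu."""
    rng = random.Random(seed)
    n = len(data)
    mu_hat = statistics.mean(data)       # plug-in estimate of the true mean
    t_stars = []
    for _ in range(boot_reps):
        resample = [rng.choice(data) for _ in range(n)]
        s = statistics.stdev(resample)
        if s > 0.0:                      # skip degenerate all-equal resamples
            t_stars.append((statistics.mean(resample) - mu_hat)
                           / (s / n ** 0.5))
    return sorted(t_stars)

historical = [1.2, 0.4, 3.1, 0.9, 2.2, 0.7, 5.0, 1.1]   # hypothetical data
t_star = bootstrap_t(historical)
lo, hi = t_star[int(0.05 * len(t_star))], t_star[int(0.95 * len(t_star))]
```

The sorted bootstrap values give an EDF of the statistic under the observed (non-normal) data, and the pair (lo, hi) brackets its central 90%.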