As an ``appetizer,'' we give a few simple examples of the use of transformations in statistics: the Fisher and Box-Cox transformations, as well as the empirical Fourier-Stieltjes transform.
Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a sample from a bivariate normal distribution with correlation coefficient $\rho$, $-1 < \rho < 1$.
The Pearson coefficient of linear correlation

(a) $\displaystyle r = \frac{\sum_{i=1}^{n}(X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar X)^2 \, \sum_{i=1}^{n}(Y_i - \bar Y)^2}}$

is, after Fisher's $z$-transformation

(b) $\displaystyle z = \frac{1}{2} \log \frac{1+r}{1-r} = \operatorname{arctanh}(r),$

approximately normally distributed with mean $\frac{1}{2}\log\frac{1+\rho}{1-\rho}$ and variance $1/(n-3)$.
To exemplify the above, we generated pairs of normally distributed random samples with a prescribed theoretical correlation $\rho$. This was done by generating two i.i.d. normal samples $X_i$ and $X_i'$ of common length $n$ and taking the transformation $Y_i = \rho X_i + \sqrt{1-\rho^2}\, X_i'$, $i = 1, \dots, n$. The sample correlation coefficient $r$ was then computed. This was repeated a large number of times. The histogram of the sample correlation coefficients is shown in Fig. 7.1a. The histogram of the $z$-transformed $r$'s is shown in Fig. 7.1b, with the normal approximation superimposed.
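The simulation above can be sketched in a few lines of NumPy. The particular values of $\rho$, $n$, and the number of replicates below are illustrative assumptions, not the ones used for Fig. 7.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values; the rho, sample size, and number of replicates
# used in the text's figure are assumptions here.
rho, n, n_rep = 0.5, 30, 10_000

zs = np.empty(n_rep)
for k in range(n_rep):
    x = rng.standard_normal(n)
    xp = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * xp   # corr(x, y) = rho in theory
    r = np.corrcoef(x, y)[0, 1]              # sample correlation coefficient
    zs[k] = np.arctanh(r)                    # Fisher z: 0.5*log((1+r)/(1-r))

# The z's should be approximately N(arctanh(rho), 1/(n-3)).
print(zs.mean(), np.arctanh(rho))
print(zs.var(), 1 / (n - 3))
```

Histogramming `zs` reproduces the qualitative picture of Fig. 7.1b: an approximately normal shape centered near $\operatorname{arctanh}(\rho)$ with variance close to $1/(n-3)$.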
(i) For example, a $(1-\alpha)100\%$ confidence interval for $\rho$ is
$$\left(\tanh\!\left(z - \frac{z_{\alpha/2}}{\sqrt{n-3}}\right),\ \tanh\!\left(z + \frac{z_{\alpha/2}}{\sqrt{n-3}}\right)\right),$$
where $z = \operatorname{arctanh}(r)$, $\tanh(x) = (e^x - e^{-x})/(e^x + e^{-x})$, and $z_{\alpha/2}$ is the standard normal quantile. Given the observed $r$ and the sample size $n$, one forms the normal-theory interval for $\operatorname{arctanh}(\rho)$ and maps its endpoints back through $\tanh$ to obtain the confidence interval for $\rho$.
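The interval can be computed with the standard library alone. The values $r = 0.6$ and $n = 30$ below are hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf=0.95):
    """Confidence interval for rho via Fisher's z-transform:
    z = arctanh(r) is approximately N(arctanh(rho), 1/(n-3))."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(1 - (1 - conf) / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

# Hypothetical inputs; the text's observed r and n are not reproduced here.
lo, hi = fisher_ci(r=0.6, n=30)
print(round(lo, 3), round(hi, 3))
```

For $r = 0.6$ and $n = 30$ this gives an interval of roughly $(0.31,\, 0.79)$; note the asymmetry around $r$, inherited from the nonlinearity of $\tanh$.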
(ii) Assume that two samples of sizes $n_1$ and $n_2$, respectively, are obtained from two different bivariate normal populations. We are interested in testing $H_0: \rho_1 = \rho_2$ against the two-sided alternative. After observing $r_1$ and $r_2$ and transforming them to $z_1$ and $z_2$, we use the statistic
$$\frac{z_1 - z_2}{\sqrt{1/(n_1-3) + 1/(n_2-3)}},$$
which is approximately standard normal under $H_0$, to obtain the two-sided $p$-value of the test.
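A minimal implementation of this two-sample test follows; the inputs $r_1 = 0.7$, $n_1 = 40$, $r_2 = 0.5$, $n_2 = 50$ are hypothetical, not the values from the text.

```python
import math
from statistics import NormalDist

def corr_equality_pvalue(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 = rho2 via Fisher z-transforms.
    Under H0, (z1 - z2)/sqrt(1/(n1-3) + 1/(n2-3)) ~ N(0, 1)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    stat = (z1 - z2) / se
    return 2 * (1 - NormalDist().cdf(abs(stat)))

# Hypothetical inputs; the text's r1, r2, n1, n2 are not reproduced here.
p = corr_equality_pvalue(0.7, 40, 0.5, 50)
print(round(p, 4))
```

With these illustrative values the $p$-value is about $0.15$, so $H_0$ would not be rejected at the 5% level.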
As an illustration, we apply the Box-Cox transformation to the apparently skewed CEO salary data.
Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than five and less than million. Firms were ranked by five-year average return on investment. One of the variables extracted is the annual salary of the chief executive officer of each of the top-ranked firms (since one datum is missing, the sample size is reduced by one). Figure 7.2a shows the histogram of the raw data (salaries). The data show moderate skewness to the right. Figure 7.2b gives the values of the likelihood in (7.2) for different values of $\lambda$. Note that (7.2) is maximized at a value $\hat\lambda$ of the transformation parameter. Figure 7.2c shows the data transformed by the Box-Cox transformation with $\lambda = \hat\lambda$. The histogram of the transformed salaries is notably symmetrized.
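The procedure behind Fig. 7.2b,c can be sketched as follows: evaluate the Box-Cox profile log-likelihood on a grid of $\lambda$ values and transform with the maximizer. The salary data are not available here, so the sketch uses simulated right-skewed (lognormal) data as a stand-in.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: (y^lam - 1)/lam for lam != 0, log(y) for lam = 0."""
    if abs(lam) < 1e-8:
        return np.log(y)
    return (y**lam - 1) / lam

def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda, assuming the transformed
    data are normal (this is the form referred to as (7.2))."""
    n = len(y)
    t = boxcox(y, lam)
    return -n / 2 * np.log(t.var()) + (lam - 1) * np.log(y).sum()

# Stand-in right-skewed data (NOT the CEO salaries from the text):
rng = np.random.default_rng(1)
y = rng.lognormal(mean=3.0, sigma=0.6, size=200)

grid = np.linspace(-1, 1, 201)
ll = [boxcox_loglik(y, lam) for lam in grid]
lam_hat = grid[int(np.argmax(ll))]
print(lam_hat)   # for lognormal data, the maximizer should be near 0
```

Plotting `ll` against `grid` gives the analogue of Fig. 7.2b, and histogramming `boxcox(y, lam_hat)` the analogue of Fig. 7.2c.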
The characteristic function of a probability distribution $F$ is defined as its Fourier-Stieltjes transform,
$$\varphi_X(t) = \mathbb{E}\, e^{itX} = \int_{-\infty}^{\infty} e^{itx}\, dF(x). \qquad (7.3)$$
For a sample $X_1, \dots, X_n$ one defines the empirical characteristic function as
$$\varphi_n(t) = \frac{1}{n} \sum_{j=1}^{n} e^{itX_j}.$$
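The empirical characteristic function is a one-liner in NumPy. As a sanity check, the sketch below compares it with the known characteristic function of the standard normal, $\varphi(t) = e^{-t^2/2}$.

```python
import numpy as np

def ecf(t, x):
    """Empirical characteristic function (1/n) * sum_j exp(i t x_j),
    evaluated at each point of the array t."""
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

# Compare with the standard normal cf, phi(t) = exp(-t^2/2):
rng = np.random.default_rng(2)
x = rng.standard_normal(5000)
t = np.array([0.0, 0.5, 1.0])
err = np.abs(ecf(t, x) - np.exp(-t**2 / 2))
print(err)   # small for a sample of this size
```

By construction $\varphi_n(0) = 1$ exactly, and $|\varphi_n(t)| \le 1$ for all $t$, mirroring the properties of (7.3).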
Following Murata (2001) [20], we describe how the empirical characteristic function can be used in testing for the independence of the two components of a bivariate distribution.
Given the bivariate sample $(X_1, Y_1), \dots, (X_n, Y_n)$, we are interested in testing for independence of the components $X$ and $Y$. Since independence is equivalent to the factorization of the joint characteristic function, the test can be based on the following bivariate process,
$$Z_n(t, s) = \varphi_n^{X,Y}(t, s) - \varphi_n^{X}(t)\, \varphi_n^{Y}(s),$$
where $\varphi_n^{X,Y}$ is the empirical characteristic function of the pairs and $\varphi_n^{X}$, $\varphi_n^{Y}$ are those of the two components.
Murata (2001) [20] shows that $\sqrt{n}\, Z_n(t, s)$ has a Gaussian weak limit and derives from it a test statistic with a known asymptotic null distribution.
We generated two independent components of a given size from a beta distribution and computed the test statistic and the corresponding $p$-value; this was repeated a large number of times. Figure 7.3a,b depicts the histograms of the statistics and the $p$-values based on these simulations. Since the generated components $X$ and $Y$ are independent, the histogram of the statistic agrees with its asymptotic distribution and, of course, the $p$-values are uniform on $[0, 1]$. In Fig. 7.3c we show the $p$-values when the components $X$ and $Y$ are not independent. Starting from two independent beta components, the second component was constructed as a function of both, inducing dependence. Notice that for the majority of simulation runs the independence hypothesis is rejected, i.e., the $p$-values cluster around 0.
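The idea behind the test can be illustrated directly through the discrepancy $Z_n(t,s)$, without reproducing Murata's exact statistic. Everything below is an assumption chosen for illustration: the Beta(2, 2) parameters, the frequencies $t = s = 2$, the sample size, and the dependent pair $Y = X$ (a stand-in for the construction used in the text).

```python
import numpy as np

rng = np.random.default_rng(3)

def ecf_discrepancy(x, y, t, s):
    """Z_n(t, s) = phi_n^{X,Y}(t, s) - phi_n^X(t) * phi_n^Y(s).
    Under independence the joint cf factorizes, so this should be near 0."""
    joint = np.exp(1j * (t * x + s * y)).mean()
    return joint - np.exp(1j * t * x).mean() * np.exp(1j * s * y).mean()

n, t, s = 5000, 2.0, 2.0
x = rng.beta(2, 2, n)     # beta parameters are illustrative assumptions
y = rng.beta(2, 2, n)

d_indep = abs(ecf_discrepancy(x, y, t, s))   # X, Y independent: near 0
d_dep = abs(ecf_discrepancy(x, x, t, s))     # Y = X: clearly dependent

print(d_indep, d_dep)   # the dependent pair shows a larger discrepancy
```

A formal test aggregates such discrepancies over $(t, s)$ and calibrates them via the Gaussian weak limit; the sketch only shows why the discrepancy separates the independent and dependent cases.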