

7.1 Introduction

As an ''appetizer'' we give three simple examples of the use of transformations in statistics: the Fisher $ z$ and Box-Cox transformations, as well as the empirical Fourier-Stieltjes transform.

Example 1   Assume that we are looking for a variance-stabilizing transformation $ Y=\vartheta(X)$ in the case where $ \mathrm{Var}\, X=\sigma^2_X(\mu)$ is a function of the mean $ \mu=\mathrm{E}\, X$. The first-order Taylor expansion of $ \vartheta(X)$ about the mean $ \mu$ is

$\displaystyle \vartheta(X) = \vartheta(\mu) + (X-\mu) \vartheta'(\mu) + O\left[ (X-\mu)^2 \right]\,.$    

Ignoring quadratic and higher order terms we see that

$\displaystyle \mathrm{E}\,\vartheta(X) \approx \vartheta(\mu){}, \quad \mathrm{Var}\, \vartheta(X) \approx \mathrm{E}\left[ (X-\mu)^2 \left(\vartheta'(\mu)\right)^2\right] = \left[\vartheta'(\mu)\right]^2 \sigma^2_X(\mu){}.$

If $ \mathrm{Var}\,(\vartheta(X))$ is to be $ c^2$, we obtain

$\displaystyle \left[\vartheta'(\mu)\right]^2 \sigma^2_X(\mu) = c^2$

resulting in

$\displaystyle \vartheta(x) = c \int \frac{{\mathrm{d}} x}{\sigma_X(x)}{}.$

This is a theoretical basis for the so-called Fisher $ z$-transformation.

Let $ (X_{11},X_{21}), \ldots, (X_{1n},X_{2n})$ be a sample from a bivariate normal distribution $ N_2(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, and let $ \bar{X}_i = 1/n \sum_{j=1}^n X_{ij}$, $ i = 1,2$.

The Pearson coefficient of linear correlation

$\displaystyle r = \frac{\sum_{i=1}^n \left(X_{1i} - \bar {X_1}\right)\left(X_{2i}-\bar{X_2}\right)} {\left[ \sum_{i=1}^n \left(X_{1i}- \bar{X_1}\right)^2 \cdot \sum_{i=1}^n \left(X_{2i}- \bar{X_2}\right)^2\right]^{1/2} }$

has a complicated distribution involving special functions; see, e.g., Anderson (1984, p. 113)[1]. However, it is well known that the asymptotic distribution of $ r$ is normal $ N(\rho,
\frac{(1-\rho^2)^2}{n})$. Since the variance is a function of the mean,

$\displaystyle \vartheta(\rho)$ $\displaystyle = \int \frac{c \sqrt{n} }{1 - \rho^2} \mathrm{d} \rho$    
  $\displaystyle = \frac{c \sqrt{n}}{2} \int \left( \frac{1}{1-\rho} + \frac{1}{1+\rho} \right) \mathrm{d} \rho$    
  $\displaystyle = \frac{c \sqrt{n}}{2} \log \left( \frac{1+\rho}{1-\rho} \right) + k$    

is known as the Fisher $ z$-transformation for the correlation coefficient (usually for $ c=1/\sqrt{n}$ and $ k = 0$). Assume that $ r$ and $ \rho$ are mapped to $ z$ and $ \zeta$ as

$\displaystyle z = \frac{1}{2} \log \left( \frac{1+r}{1-r} \right) = \mathrm{arctanh}\,\, r{}, \quad \zeta = \frac{1}{2} \log \left( \frac{1+\rho}{1-\rho} \right) = \mathrm{arctanh}\,\, \rho{}.$
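A quick numerical sanity check of this derivation (a sketch in Python/NumPy; the choice of language is ours, not the chapter's) confirms that $ \frac{1}{2} \log ((1+r)/(1-r))$ coincides with $ \mathrm{arctanh}\, r$ and that its derivative matches the integrand $ 1/(1-\rho^2)$ above.

\begin{verbatim}
# Sanity check of the Fisher z-transformation (illustrative sketch).
import numpy as np

r = np.linspace(-0.95, 0.95, 9)
z_log = 0.5 * np.log((1 + r) / (1 - r))
print(np.allclose(z_log, np.arctanh(r)))          # True: z = arctanh(r)

h = 1e-6                                          # crude numerical derivative
deriv = (np.arctanh(r + h) - np.arctanh(r - h)) / (2 * h)
print(np.allclose(deriv, 1 / (1 - r**2)))         # True: d/dr arctanh(r) = 1/(1 - r^2)
\end{verbatim}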

The distribution of $ z$ is approximately normal $ N(\zeta,
1/(n-3))$, and this approximation is quite accurate when $ \rho^2/n^2$ is small and even when $ n$ is as low as $ 20$. The use of the Fisher $ z$-transformation is illustrated below in finding confidence intervals for $ \rho$ and in testing hypotheses about $ \rho$.

Figure 7.1: (a) Simulational run of $ 10$,$ 000$ $ r$'s from the bivariate population having theoretical $ \rho =\sqrt {2}/2$; (b) The same $ r$'s transformed to $ z$'s with the normal approximation superimposed
\includegraphics[width=5.1cm]{text/2-7/figure1a.eps}(a) \includegraphics[width=5.1cm]{text/2-7/figure1b.eps}(b)

To exemplify the above, we generated $ n=30$ pairs of normally distributed random samples with theoretical correlation $ \sqrt{2}/2$. This was done by generating two i.i.d. normal samples $ a$ and $ b$ of length $ 30$ and taking the transformation $ x_1=a+b$, $ x_2=b$. The sample correlation coefficient $ r$ was then computed. This was repeated $ M=10{,}000$ times. The histogram of the $ 10{,}000$ sample correlation coefficients is shown in Fig. 7.1a. The histogram of the $ z$-transformed $ r$'s is shown in Fig. 7.1b with the superimposed normal approximation $ N(\mathrm{arctanh}\,(\sqrt{2}/2), 1/(30-3))$.
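A minimal sketch of this simulation in Python/NumPy (the original figures were not necessarily produced with this code) is:

\begin{verbatim}
# Simulation behind Fig. 7.1 (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n, M = 30, 10_000
r = np.empty(M)
for m in range(M):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    x1, x2 = a + b, b                    # theoretical corr(x1, x2) = sqrt(2)/2
    r[m] = np.corrcoef(x1, x2)[0, 1]     # sample correlation coefficient

z = np.arctanh(r)                        # Fisher z-transformation
print(z.mean(), np.arctanh(np.sqrt(2) / 2))   # both close to 0.88
print(z.var(), 1 / (n - 3))                   # both close to 0.037
\end{verbatim}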

(i) For example, a $ (1-\alpha)$ $ {100} \%$ confidence interval for $ \rho$ is:

$\displaystyle \left[ \tanh \left(z - \frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{n-3}}\right), \tanh \left(z + \frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{n-3}}\right) \right]{},$    

where $ z=\mathrm{arctanh}\,(r)$, $ \tanh x = (\mathrm{e}^x - \mathrm{e}^{-x})/(\mathrm{e}^x +
\mathrm{e}^{-x})$, and $ \Phi $ stands for the standard normal cumulative distribution function.

If $ r=-{0.5687}$ and $ n=28$, then $ z=-{0.6456}$, $ z_L=-{0.6456}
- {1.96}/{5}= -{1.0376}$, and $ z_U=-{0.6456} +
{1.96}/{5} = -{0.2536}$. In terms of $ \rho$, the $ {95}\%$ confidence interval is $ [-{0.7769}, -{0.2483}]$.
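The interval can be reproduced directly; a short sketch, assuming SciPy for the normal quantile $ \Phi^{-1}$:

\begin{verbatim}
# 95% confidence interval for rho via the Fisher z-transformation (sketch).
import numpy as np
from scipy.stats import norm

r, n, alpha = -0.5687, 28, 0.05
z = np.arctanh(r)                                # -0.6456
half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)  # 1.96 / 5
lo, hi = np.tanh(z - half), np.tanh(z + half)
print(lo, hi)                                    # approximately (-0.7769, -0.2483)
\end{verbatim}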

(ii) Assume that two samples of size $ n_1$ and $ n_2$, respectively, are obtained from two different bivariate normal populations. We are interested in testing $ H_0: \rho_1 = \rho_2$ against the two-sided alternative. After observing $ r_1$ and $ r_2$ and transforming them to $ z_1$ and $ z_2$, we conclude that the $ p$-value of the test is $ 2
\Phi(-\vert z_1 - z_2\vert/\sqrt{ 1/(n_1 -3) + 1/(n_2 -3)})$.
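A small sketch of this two-sample test (Python/SciPy assumed; the inputs in the example call are made up purely for illustration):

\begin{verbatim}
# Test of H0: rho1 = rho2 via the Fisher z-transformation (sketch).
import numpy as np
from scipy.stats import norm

def corr_equality_pvalue(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return 2 * norm.cdf(-abs(z1 - z2) / se)

print(corr_equality_pvalue(0.6, 50, 0.4, 60))    # hypothetical inputs
\end{verbatim}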

Example 2   Box and Cox (1964)[4] introduced a family of transformations, indexed by a real parameter $ \lambda $, applicable to positive data $ X_1, \ldots, X_n$,

$\displaystyle Y_i = \begin{cases}\frac{X_i^\lambda - 1}{\lambda}{},& \lambda \ne 0 \\ \log X_i{}, & \lambda = 0{}. \end{cases}$ (7.1)

This transformation is mostly applied to responses in linear models exhibiting non-normality and/or heteroscedasticity. For a properly selected $ \lambda $, the transformed data $ Y_1, \ldots, Y_n$ may look ''more normal'' and be amenable to standard modeling techniques. The parameter $ \lambda $ is selected by maximizing the log-likelihood,

$\displaystyle (\lambda - 1) \sum_{i=1}^n \log X_i - \frac{n}{2} \log\left[ \frac{1}{n} \sum_{i=1}^n \left(Y_i - \bar{Y}\right)^2 \right]{},$ (7.2)

where $ Y_i$ are given in (7.1) and $ \bar{Y} =
1/n\sum_{i=1}^n Y_i$.
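A sketch of selecting $ \lambda $ by evaluating (7.1) and (7.2) over a grid follows (Python/NumPy assumed; the vector x below is a synthetic placeholder for any positive data set, not the actual salary data):

\begin{verbatim}
# Box-Cox transformation and grid search for lambda (illustrative sketch).
import numpy as np

def boxcox(x, lam):
    if abs(lam) < 1e-8:                 # lambda = 0 corresponds to the log case
        return np.log(x)
    return (x**lam - 1) / lam           # transformation (7.1)

def loglik(x, lam):
    y = boxcox(x, lam)
    return (lam - 1) * np.log(x).sum() - len(x) / 2 * np.log(y.var())  # (7.2)

x = np.random.default_rng(1).lognormal(size=59)   # placeholder positive data
grid = np.linspace(-1, 2, 301)
lam_hat = grid[np.argmax([loglik(x, lam) for lam in grid])]
print(lam_hat)                          # maximizer of (7.2) over the grid
\end{verbatim}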

As an illustration, we apply the Box-Cox transformation to the apparently skewed data of CEO salaries.

Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than five and less than $ 350$ million dollars. Firms were ranked by five-year average return on investment. One of the variables extracted is the annual salary of the chief executive officer for the first $ 60$ ranked firms (since one datum is missing, the sample size is $ 59$). Figure 7.2a shows the histogram of the raw data (salaries). The data show moderate skewness to the right. Figure 7.2b gives the values of the log-likelihood in (7.2) for different values of $ \lambda $. Note that (7.2) is maximized for $ \lambda $ approximately equal to $ {0.45}$. Figure 7.2c shows the data transformed by the Box-Cox transformation with $ \lambda ={0.45}$. The histogram of the transformed salaries is notably symmetrized.

Figure 7.2: (a) Histogram of raw data (CEO salaries); (b) Log-likelihood is maximized at $ \lambda ={0.45}$; and (c) Histogram of Box-Cox-transformed data
\includegraphics[width=3.5cm]{text/2-7/figure2a.eps}(a) \includegraphics[width=3.65cm]{text/2-7/figure2b.eps}(b) \includegraphics[width=3.5cm]{text/2-7/figure2c.eps}(c)

Example 3   As an example of transforms utilized in statistics, we provide an application of the empirical Fourier-Stieltjes transform (empirical characteristic function) in testing for independence.

The characteristic function of a probability distribution $ F$ is defined as its Fourier-Stieltjes transform,

$\displaystyle \varphi_X(t) = \mathrm{E}\, \exp(\mathrm{i} t X){},$ (7.3)

where $ \mathrm{E}$ is expectation and the random variable $ X$ has distribution function $ F$. It is well known that the correspondence between characteristic functions and distribution functions is $ 1$-$ 1$, and that closeness in the domain of characteristic functions corresponds to closeness in the domain of distribution functions. In addition to uniqueness, characteristic functions are bounded. The same does not hold for moment generating functions, which are Laplace transforms of distribution functions.

For a sample $ X_1, X_2, \ldots, X_n$ one defines the empirical characteristic function $ \varphi^{\ast}(t)$ as

$\displaystyle \varphi^{\ast}_X(t) = \frac{1}{n} \sum_{j=1}^n \exp(\mathrm{i} t X_j){}.$
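A minimal sketch of computing $ \varphi^{\ast}_X(t)$ in Python/NumPy (an illustration, not code from the original text):

\begin{verbatim}
# Empirical characteristic function (illustrative sketch).
import numpy as np

def ecf(x, t):
    # phi*_X(t) = (1/n) sum_j exp(i t X_j), for an array of arguments t
    return np.exp(1j * np.multiply.outer(t, x)).mean(axis=-1)

x = np.random.default_rng(0).standard_normal(200)
print(ecf(x, np.array([0.0, 0.5, 1.0])))    # value at t = 0 is exactly 1
\end{verbatim}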

The result by Feuerverger and Mureika (1977)[9] establishes the large sample properties of the empirical characteristic function.

Theorem 1   For any $ T <\infty$

$\displaystyle P\left[ \lim_{n \rightarrow \infty} \sup_{\vert t\vert \leq T} \vert\varphi^{\ast}(t) - \varphi(t)\vert=0 \right]=1$    

holds. Moreover, when $ n\rightarrow \infty $, the stochastic process

$\displaystyle Y_n(t) = \sqrt{ n } \left(\varphi^{\ast}(t) - \varphi(t) \right){},\quad \vert t\vert \leq T{},$    

converges in distribution to a complex-valued Gaussian zero-mean process $ Y(t)$ satisfying $ Y(t) = \overline{Y(-t)}$ and

$\displaystyle \mathrm{E}\, \left(Y(t) \overline{Y(s)}\right) = \varphi(t+s)-\varphi(t) \varphi(s){},$    

where $ \overline{ Y(t) }$ denotes complex conjugate of $ Y(t)$.
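The uniform convergence in Theorem 1 can be illustrated numerically; a sketch for a standard normal sample, for which $ \varphi(t)=\exp(-t^2/2)$ (Python/NumPy assumed):

\begin{verbatim}
# sup_{|t|<=T} |phi*(t) - phi(t)| for growing n (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-3, 3, 601)                  # grid over |t| <= T = 3
phi = np.exp(-t**2 / 2)                      # characteristic function of N(0,1)
for n in (100, 1_000, 10_000):
    x = rng.standard_normal(n)
    phi_star = np.exp(1j * np.outer(t, x)).mean(axis=1)
    print(n, np.abs(phi_star - phi).max())   # sup error shrinks, roughly like 1/sqrt(n)
\end{verbatim}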

Following Murata (2001)[20] we describe how the empirical characteristic function can be used in testing for the independence of two components in bivariate distributions.

Given the bivariate sample $ (X_i, Y_i)$, $ i=1,\ldots,n$, we are interested in testing for independence of the components $ X$ and $ Y$. The test can be based on the following bivariate process,

$\displaystyle Z_n(t,s) = \sqrt{ n } \left( \varphi^{\ast}_{X,Y}(t,s) - \varphi^{\ast}_X(t) \varphi^{\ast}_Y(s) \right){},$

where $ \varphi^{\ast}_{X,Y}(t,s) = 1/n \sum_{j=1}^n \exp(\mathrm{i} t
X_j + \mathrm{i} s Y_j)$ is the joint empirical characteristic function.

Murata (2001)[20] shows that $ Z_n(t,s)$ has a Gaussian weak limit and that

$\displaystyle \mathrm{Var}\, Z_n(t,s) \approx \left[\varphi_X^{\ast}(2 t) - \left( \varphi_X^{\ast}(t) \right)^2 \right] \left[\varphi_Y^{\ast}(2 s) - \left( \varphi_Y^{\ast}(s) \right)^2 \right]{}, \quad\mathrm{and}$
$\displaystyle \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right) \approx \left( 1 - \vert \varphi_X^{\ast}(t)\vert^2 \right) \left( 1 - \vert \varphi_Y^{\ast}(s)\vert^2 \right){}.$

The statistic

$\displaystyle T(t, s) = \left(\Re Z_n(t,s) \quad \Im Z_n(t,s) \right) ~ \Sigma^{-1} ~\left(\Re Z_n(t,s) \quad \Im Z_n(t,s) \right)'$    

has approximately a $ \chi^2$ distribution with $ 2$ degrees of freedom for any finite $ t$ and $ s$. The symbols $ \Re$ and $ \Im$ stand for the real and imaginary parts of a complex number. The matrix $ \Sigma$ is a $ 2 \times 2$ matrix with entries

$\displaystyle \varsigma_{11} = \frac{1}{2} \left[\Re \mathrm{Var}\,\left(Z_n(t,s)\right) + \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right)\right]$    
$\displaystyle \varsigma_{12} = \varsigma_{21} = \frac{1}{2} \Im \mathrm{Var}\,(Z_n(t,s)){}, \quad\mathrm{and}$    
$\displaystyle \varsigma_{22} = \frac{1}{2} \left[ - \Re \mathrm{Var}\,\left(Z_n(t,s)\right) + \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right)\right]{}.$

Any fixed pair $ t,s$ gives a valid test, and in the numerical example we selected $ t=1$ and $ s=1$ for computational convenience.
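Putting the pieces together, a sketch of the test at $ (t,s)=(1,1)$ in Python/NumPy/SciPy (assembled from the formulas above, not Murata's original code; the Beta($ 1,2$) inputs mirror the simulation described next):

\begin{verbatim}
# Characteristic-function test of independence at (t, s) = (1, 1) (sketch).
import numpy as np
from scipy.stats import chi2

def indep_test(x, y, t=1.0, s=1.0):
    n = len(x)
    phx = lambda u: np.exp(1j * u * x).mean()        # phi*_X(u)
    phy = lambda u: np.exp(1j * u * y).mean()        # phi*_Y(u)
    phxy = np.exp(1j * (t * x + s * y)).mean()       # phi*_{X,Y}(t, s)

    z = np.sqrt(n) * (phxy - phx(t) * phy(s))        # Z_n(t, s)
    var = (phx(2*t) - phx(t)**2) * (phy(2*s) - phy(s)**2)
    cov = (1 - abs(phx(t))**2) * (1 - abs(phy(s))**2)
    sigma = 0.5 * np.array([[var.real + cov,  var.imag],
                            [var.imag,       -var.real + cov]])
    v = np.array([z.real, z.imag])
    T = v @ np.linalg.solve(sigma, v)                # quadratic form with Sigma^{-1}
    return T, chi2.sf(T, df=2)                       # statistic and p-value

rng = np.random.default_rng(2)
x, yprime = rng.beta(1, 2, size=(2, 2000))           # independent Beta(1,2) samples
print(indep_test(x, yprime))                          # independent: large p-value expected
print(indep_test(x, 0.03 * x + 0.97 * yprime))        # dependent: small p-value in most runs
\end{verbatim}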

Figure 7.3: (a) Histogram of observed $ T$ statistics with the theoretical $ \chi _2^2$ distribution; (b) $ p$-values of the test when the components are independent; and (c) $ p$-values of the test when the second component is a mixture of an independent sample and 3% of the first component
\includegraphics[width=3.57cm]{text/2-7/figurema.eps}(a) \includegraphics[width=3.5cm]{text/2-7/figuremb.eps}(b) \includegraphics[width=3.55cm]{text/2-7/figuremc.eps}(c)

We generated two independent components from the Beta($ 1,2$) distribution of size $ n=2000$ and found the $ T$ statistic and the corresponding $ p$-value $ M=2000$ times. Figure 7.3a,b depicts histograms of the $ T$ statistics and $ p$-values based on the $ {2000}$ simulations. Since the generated components $ X$ and $ Y$ are independent, the histogram for $ T$ agrees with the asymptotic $ \chi^2_2$ distribution, and of course, the $ p$-values are uniform on $ [0,1]$. In Fig. 7.3c we show the $ p$-values when the components $ X$ and $ Y$ are not independent. Using two independent Beta($ 1,2$) components $ X$ and $ Y'$, the second component $ Y$ is constructed as $ Y={0.03} X + {0.97} Y'$. Notice that for the majority of simulation runs the independence hypothesis is rejected, i.e., the $ p$-values cluster around $ 0$.

