

7.1 Introduction

As an ''appetizer'' we give three simple examples of the use of transformations in statistics: the Fisher $ z$ and Box-Cox transformations, as well as the empirical Fourier-Stieltjes transform.

Example 1   Assume that we are looking for a variance-stabilizing transformation $ Y=\vartheta(X)$ in the case where $ \mathrm{Var}\, X=\sigma^2_X(\mu)$ is a function of the mean $ \mu=\mathrm{E}\, X$. The first-order Taylor expansion of $ \vartheta(X)$ about the mean $ \mu$ is

$\displaystyle \vartheta(X) = \vartheta(\mu) + (X-\mu) \vartheta'(\mu) + O\left[ (X-\mu)^2 \right]\,.$    

Ignoring quadratic and higher order terms we see that

$\displaystyle \mathrm{E}\,\vartheta(X) \approx \vartheta(\mu){}, \quad \mathrm{Var}\, \vartheta(X) \approx \mathrm{E}\left[ (X-\mu)^2 \left(\vartheta'(\mu)\right)^2\right] = \left[\vartheta'(\mu)\right]^2 \sigma^2_X(\mu){}.$

If $ \mathrm{Var}\,(\vartheta(X))$ is to be $ c^2$, we obtain

$\displaystyle \left[\vartheta'(\mu)\right]^2 \sigma^2_X(\mu) = c^2$

resulting in

$\displaystyle \vartheta(x) = c \int \frac{{\mathrm{d}} x}{\sigma_X(x)}{}.$

This is a theoretical basis for the so-called Fisher $ z$-transformation.

Let $ (X_{11},X_{21}), \ldots, (X_{1n},X_{2n})$ be a sample from a bivariate normal distribution $ N_2(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$, and let $ \bar{X}_i = 1/n \sum_{j=1}^n X_{ij}$, $ i = 1,2$.

The Pearson coefficient of linear correlation

$\displaystyle r = \frac{\sum_{i=1}^n \left(X_{1i} - \bar {X_1}\right)\left(X_{2i}-\bar{X_2}\right)} {\left[ \sum_{i=1}^n \left(X_{1i}- \bar{X_1}\right)^2 \cdot \sum_{i=1}^n \left(X_{2i}- \bar{X_2}\right)^2\right]^{1/2} }$

has a complicated distribution involving special functions; see, e.g., Anderson (1984, p. 113)[1]. However, it is well known that the asymptotic distribution of $ r$ is normal $ N(\rho,
\frac{(1-\rho^2)^2}{n})$. Since the variance is a function of the mean,

$\displaystyle \vartheta(\rho)$ $\displaystyle = \int \frac{c \sqrt{n} }{1 - \rho^2} \mathrm{d} \rho$    
  $\displaystyle = \frac{c \sqrt{n}}{2} \int \left( \frac{1}{1-\rho} + \frac{1}{1+\rho} \right) \mathrm{d} \rho$    
  $\displaystyle = \frac{c \sqrt{n}}{2} \log \left( \frac{1+\rho}{1-\rho} \right) + k$    

is known as the Fisher $ z$-transformation for the correlation coefficient (usually for $ c=1/\sqrt{n}$ and $ k = 0$). Assume that $ r$ and $ \rho$ are mapped to $ z$ and $ \zeta$ as

$\displaystyle z = \frac{1}{2} \log \left( \frac{1+r}{1-r} \right) = \mathrm{arctanh}\,\, r{}, \quad \zeta = \frac{1}{2} \log \left( \frac{1+\rho}{1-\rho} \right) = \mathrm{arctanh}\,\, \rho{}.$
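A quick numerical sanity check of this derivation (a sketch in Python/NumPy; the choice of language is ours, not the chapter's) confirms that $ \frac{1}{2} \log ((1+r)/(1-r))$ coincides with $ \mathrm{arctanh}\, r$ and that its derivative matches the integrand $ 1/(1-\rho^2)$ above.

\begin{verbatim}
# Sanity check of the Fisher z-transformation (illustrative sketch).
import numpy as np

r = np.linspace(-0.95, 0.95, 9)
z_log = 0.5 * np.log((1 + r) / (1 - r))
print(np.allclose(z_log, np.arctanh(r)))          # True: z = arctanh(r)

h = 1e-6                                          # crude numerical derivative
deriv = (np.arctanh(r + h) - np.arctanh(r - h)) / (2 * h)
print(np.allclose(deriv, 1 / (1 - r**2)))         # True: d/dr arctanh(r) = 1/(1 - r^2)
\end{verbatim}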

The distribution of $ z$ is approximately normal $ N(\zeta,
1/(n-3))$, and this approximation is quite accurate when $ \rho^2/n^2$ is small and even when $ n$ is as low as $ 20$. The use of the Fisher $ z$-transformation is illustrated below in finding confidence intervals for $ \rho$ and in testing hypotheses about $ \rho$.

Figure 7.1: (a) Simulational run of $ 10$,$ 000$ $ r$'s from the bivariate population having theoretical $ \rho =\sqrt {2}/2$; (b) The same $ r$'s transformed to $ z$'s with the normal approximation superimposed
\includegraphics[width=5.1cm]{text/2-7/figure1a.eps}(a) \includegraphics[width=5.1cm]{text/2-7/figure1b.eps}(b)

To exemplify the above, we generated $ n=30$ pairs of normally distributed random samples with theoretical correlation $ \sqrt{2}/2$. This was done by generating two i.i.d. normal samples $ a$ and $ b$ of length $ 30$ and taking the transformation $ x_1=a+b$, $ x_2=b$. The sample correlation coefficient $ r$ was then computed. This was repeated $ M=10{,}000$ times. The histogram of the $ 10{,}000$ sample correlation coefficients is shown in Fig. 7.1a. The histogram of the $ z$-transformed $ r$'s is shown in Fig. 7.1b with the superimposed normal approximation $ N(\mathrm{arctanh}\,(\sqrt{2}/2), 1/(30-3))$.
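A minimal sketch of this simulation in Python/NumPy (the original figures were not necessarily produced with this code) is:

\begin{verbatim}
# Simulation behind Fig. 7.1 (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
n, M = 30, 10_000
r = np.empty(M)
for m in range(M):
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    x1, x2 = a + b, b                    # theoretical corr(x1, x2) = sqrt(2)/2
    r[m] = np.corrcoef(x1, x2)[0, 1]     # sample correlation coefficient

z = np.arctanh(r)                        # Fisher z-transformation
print(z.mean(), np.arctanh(np.sqrt(2) / 2))   # both close to 0.88
print(z.var(), 1 / (n - 3))                   # both close to 0.037
\end{verbatim}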

(i) For example, a $ (1-\alpha)$ $ {100} \%$ confidence interval for $ \rho$ is:

$\displaystyle \left[ \tanh \left(z - \frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{n-3}}\right), \tanh \left(z + \frac{\Phi^{-1}(1-\alpha/2)}{\sqrt{n-3}}\right) \right]{},$    

where $ z=\mathrm{arctanh}\,(r)$, $ \tanh x = (\mathrm{e}^x - \mathrm{e}^{-x})/(\mathrm{e}^x +
\mathrm{e}^{-x})$, and $ \Phi $ stands for the standard normal cumulative distribution function.

If $ r=-{0.5687}$ and $ n=28$, then $ z=-{0.6456}$, $ z_L=-{0.6456}
- {1.96}/{5}= -{1.0376}$, and $ z_U=-{0.6456} +
{1.96}/{5} = -{0.2536}$. In terms of $ \rho$, the $ {95}\%$ confidence interval is $ [-{0.7769}, -{0.2483}]$.
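The interval can be reproduced directly; a short sketch, assuming SciPy for the normal quantile $ \Phi^{-1}$:

\begin{verbatim}
# 95% confidence interval for rho via the Fisher z-transformation (sketch).
import numpy as np
from scipy.stats import norm

r, n, alpha = -0.5687, 28, 0.05
z = np.arctanh(r)                                # -0.6456
half = norm.ppf(1 - alpha / 2) / np.sqrt(n - 3)  # 1.96 / 5
lo, hi = np.tanh(z - half), np.tanh(z + half)
print(lo, hi)                                    # approximately (-0.7769, -0.2483)
\end{verbatim}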

(ii) Assume that two samples of size $ n_1$ and $ n_2$, respectively, are obtained from two different bivariate normal populations. We are interested in testing $ H_0: \rho_1 = \rho_2$ against the two-sided alternative. After observing $ r_1$ and $ r_2$ and transforming them to $ z_1$ and $ z_2$, we conclude that the $ p$-value of the test is $ 2
\Phi(-\vert z_1 - z_2\vert/\sqrt{ 1/(n_1 -3) + 1/(n_2 -3)})$.
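A small sketch of this two-sample test (Python/SciPy assumed; the inputs in the example call are made up purely for illustration):

\begin{verbatim}
# Test of H0: rho1 = rho2 via the Fisher z-transformation (sketch).
import numpy as np
from scipy.stats import norm

def corr_equality_pvalue(r1, n1, r2, n2):
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return 2 * norm.cdf(-abs(z1 - z2) / se)

print(corr_equality_pvalue(0.6, 50, 0.4, 60))    # hypothetical inputs
\end{verbatim}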

Example 2   Box and Cox (1964)[4] introduced a family of transformations, indexed by a real parameter $ \lambda $, applicable to positive data $ X_1, \ldots, X_n$,

$\displaystyle Y_i = \begin{cases}\frac{X_i^\lambda - 1}{\lambda}{},& \lambda \ne 0 \\ \log X_i{}, & \lambda = 0{}. \end{cases}$ (7.1)

This transformation is mostly applied to responses in linear models exhibiting non-normality and/or heteroscedasticity. For a properly selected $ \lambda $, the transformed data $ Y_1, \ldots, Y_n$ may look ''more normal'' and be amenable to standard modeling techniques. The parameter $ \lambda $ is selected by maximizing the log-likelihood,

$\displaystyle (\lambda - 1) \sum_{i=1}^n \log X_i - \frac{n}{2} \log\left[ \frac{1}{n} \sum_{i=1}^n \left(Y_i - \bar{Y}\right)^2 \right]{},$ (7.2)

where $ Y_i$ are given in (7.1) and $ \bar{Y} =
1/n\sum_{i=1}^n Y_i$.
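A sketch of selecting $ \lambda $ by evaluating (7.1) and (7.2) over a grid follows (Python/NumPy assumed; the vector x below is a synthetic placeholder for any positive data set, not the actual salary data):

\begin{verbatim}
# Box-Cox transformation and grid search for lambda (illustrative sketch).
import numpy as np

def boxcox(x, lam):
    if abs(lam) < 1e-8:                 # lambda = 0 corresponds to the log case
        return np.log(x)
    return (x**lam - 1) / lam           # transformation (7.1)

def loglik(x, lam):
    y = boxcox(x, lam)
    return (lam - 1) * np.log(x).sum() - len(x) / 2 * np.log(y.var())  # (7.2)

x = np.random.default_rng(1).lognormal(size=59)   # placeholder positive data
grid = np.linspace(-1, 2, 301)
lam_hat = grid[np.argmax([loglik(x, lam) for lam in grid])]
print(lam_hat)                          # maximizer of (7.2) over the grid
\end{verbatim}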

As an illustration, we apply the Box-Cox transformation to the apparently skewed data of CEO salaries.

Forbes magazine published data on the best small firms in 1993. These were firms with annual sales of more than five and less than $ 350$ million dollars. Firms were ranked by five-year average return on investment. One of the variables extracted is the annual salary of the chief executive officer for the first $ 60$ ranked firms (since one datum is missing, the sample size is $ 59$). Figure 7.2a shows the histogram of the raw data (salaries). The data show moderate skewness to the right. Figure 7.2b gives the values of the log-likelihood in (7.2) for different values of $ \lambda $. Note that (7.2) is maximized for $ \lambda $ approximately equal to $ {0.45}$. Figure 7.2c shows the data transformed by the Box-Cox transformation with $ \lambda ={0.45}$. The histogram of the transformed salaries is notably symmetrized.

Figure 7.2: (a) Histogram of raw data (CEO salaries); (b) Log-likelihood is maximized at $ \lambda ={0.45}$; and (c) Histogram of Box-Cox-transformed data
\includegraphics[width=3.5cm]{text/2-7/figure2a.eps}(a) \includegraphics[width=3.65cm]{text/2-7/figure2b.eps}(b) \includegraphics[width=3.5cm]{text/2-7/figure2c.eps}(c)

Example 3   As an example of transforms utilized in statistics, we provide an application of the empirical Fourier-Stieltjes transform (empirical characteristic function) in testing for independence.

The characteristic function of a probability distribution $ F$ is defined as its Fourier-Stieltjes transform,

$\displaystyle \varphi_X(t) = \mathrm{E}\, \exp(\mathrm{i} t X){},$ (7.3)

where $ \mathrm{E}$ is expectation and the random variable $ X$ has distribution function $ F$. It is well known that the correspondence between characteristic functions and distribution functions is $ 1$-$ 1$, and that closeness in the domain of characteristic functions corresponds to closeness in the domain of distribution functions. In addition to uniqueness, characteristic functions are bounded. The same does not hold for moment generating functions, which are Laplace transforms of distribution functions.

For a sample $ X_1, X_2, \ldots, X_n$ one defines the empirical characteristic function $ \varphi^{\ast}(t)$ as

$\displaystyle \varphi^{\ast}_X(t) = \frac{1}{n} \sum_{j=1}^n \exp(\mathrm{i} t X_j){}.$
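A minimal sketch of computing $ \varphi^{\ast}_X(t)$ in Python/NumPy (an illustration, not code from the original text):

\begin{verbatim}
# Empirical characteristic function (illustrative sketch).
import numpy as np

def ecf(x, t):
    # phi*_X(t) = (1/n) sum_j exp(i t X_j), for an array of arguments t
    return np.exp(1j * np.multiply.outer(t, x)).mean(axis=-1)

x = np.random.default_rng(0).standard_normal(200)
print(ecf(x, np.array([0.0, 0.5, 1.0])))    # value at t = 0 is exactly 1
\end{verbatim}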

The result by Feuerverger and Mureika (1977)[9] establishes the large sample properties of the empirical characteristic function.

Theorem 1   For any $ T <\infty$

$\displaystyle P\left[ \lim_{n \rightarrow \infty} \sup_{\vert t\vert \leq T} \vert\varphi^{\ast}(t) - \varphi(t)\vert=0 \right]=1$    

holds. Moreover, when $ n\rightarrow \infty $, the stochastic process

$\displaystyle Y_n(t) = \sqrt{ n } \left(\varphi^{\ast}(t) - \varphi(t) \right){},\quad \vert t\vert \leq T{},$    

converges in distribution to a complex-valued Gaussian zero-mean process $ Y(t)$ satisfying $ Y(t) = \overline{Y(-t)}$ and

$\displaystyle \mathrm{E}\, \left(Y(t) \overline{Y(s)}\right) = \varphi(t+s)-\varphi(t) \varphi(s){},$    

where $ \overline{ Y(t) }$ denotes complex conjugate of $ Y(t)$.
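The uniform convergence in Theorem 1 can be illustrated numerically; a sketch for a standard normal sample, for which $ \varphi(t)=\exp(-t^2/2)$ (Python/NumPy assumed):

\begin{verbatim}
# sup_{|t|<=T} |phi*(t) - phi(t)| for growing n (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(-3, 3, 601)                  # grid over |t| <= T = 3
phi = np.exp(-t**2 / 2)                      # characteristic function of N(0,1)
for n in (100, 1_000, 10_000):
    x = rng.standard_normal(n)
    phi_star = np.exp(1j * np.outer(t, x)).mean(axis=1)
    print(n, np.abs(phi_star - phi).max())   # sup error shrinks, roughly like 1/sqrt(n)
\end{verbatim}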

Following Murata (2001)[20] we describe how the empirical characteristic function can be used in testing for the independence of two components in bivariate distributions.

Given the bivariate sample $ (X_i, Y_i)$, $ i=1,\ldots,n$, we are interested in testing for independence of the components $ X$ and $ Y$. The test can be based on the following bivariate process,

$\displaystyle Z_n(t,s) = \sqrt{ n } \left( \varphi^{\ast}_{X,Y}(t,s) - \varphi^{\ast}_X(t) \varphi^{\ast}_Y(s) \right){},$

where $ \varphi^{\ast}_{X,Y}(t,s) = 1/n \sum_{j=1}^n \exp(\mathrm{i} t
X_j + \mathrm{i} s Y_j)$ is the joint empirical characteristic function.

Murata (2001)[20] shows that $ Z_n(t,s)$ has a Gaussian weak limit and that

$\displaystyle \mathrm{Var}\, Z_n(t,s) \approx \left[\varphi_X^{\ast}(2 t) - \left( \varphi_X^{\ast}(t) \right)^2 \right] \left[\varphi_Y^{\ast}(2 s) - \left( \varphi_Y^{\ast}(s) \right)^2 \right]{}, \quad\mathrm{and}$
$\displaystyle \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right) \approx \left( 1 - \vert \varphi_X^{\ast}(t)\vert^2 \right) \left( 1 - \vert \varphi_Y^{\ast}(s)\vert^2 \right){}.$

The statistic

$\displaystyle T(t, s) = \left(\Re Z_n(t,s) \quad \Im Z_n(t,s) \right) ~ \Sigma^{-1} ~\left(\Re Z_n(t,s) \quad \Im Z_n(t,s) \right)'$    

has approximately a $ \chi^2$ distribution with $ 2$ degrees of freedom for any finite $ t$ and $ s$. The symbols $ \Re$ and $ \Im$ stand for the real and imaginary parts of a complex number. The matrix $ \Sigma$ is a $ 2 \times 2$ matrix with entries

$\displaystyle \varsigma_{11} = \frac{1}{2} \left[\Re \mathrm{Var}\,\left(Z_n(t,s)\right) + \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right)\right]$    
$\displaystyle \varsigma_{12} = \varsigma_{21} = \frac{1}{2} \Im \mathrm{Var}\,(Z_n(t,s)){}, \quad\mathrm{and}$    
$\displaystyle \varsigma_{22} = \frac{1}{2} \left[ - \Re \mathrm{Var}\,\left(Z_n(t,s)\right) + \mathrm{Cov}\,\left(Z_n(t,s), \overline{ Z_n(t,s)} \right)\right]{}.$

Any fixed pair $ t,s$ gives a valid test, and in the numerical example we selected $ t=1$ and $ s=1$ for computational convenience.
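Putting the pieces together, a sketch of the test at $ (t,s)=(1,1)$ in Python/NumPy/SciPy (assembled from the formulas above, not Murata's original code; the Beta($ 1,2$) inputs mirror the simulation described next):

\begin{verbatim}
# Characteristic-function test of independence at (t, s) = (1, 1) (sketch).
import numpy as np
from scipy.stats import chi2

def indep_test(x, y, t=1.0, s=1.0):
    n = len(x)
    phx = lambda u: np.exp(1j * u * x).mean()        # phi*_X(u)
    phy = lambda u: np.exp(1j * u * y).mean()        # phi*_Y(u)
    phxy = np.exp(1j * (t * x + s * y)).mean()       # phi*_{X,Y}(t, s)

    z = np.sqrt(n) * (phxy - phx(t) * phy(s))        # Z_n(t, s)
    var = (phx(2*t) - phx(t)**2) * (phy(2*s) - phy(s)**2)
    cov = (1 - abs(phx(t))**2) * (1 - abs(phy(s))**2)
    sigma = 0.5 * np.array([[var.real + cov,  var.imag],
                            [var.imag,       -var.real + cov]])
    v = np.array([z.real, z.imag])
    T = v @ np.linalg.solve(sigma, v)                # quadratic form with Sigma^{-1}
    return T, chi2.sf(T, df=2)                       # statistic and p-value

rng = np.random.default_rng(2)
x, yprime = rng.beta(1, 2, size=(2, 2000))           # independent Beta(1,2) samples
print(indep_test(x, yprime))                          # independent: large p-value expected
print(indep_test(x, 0.03 * x + 0.97 * yprime))        # dependent: small p-value in most runs
\end{verbatim}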

Figure 7.3: (a) Histogram of observed $ T$ statistics with the theoretical $ \chi _2^2$ distribution; (b) $ p$-values of the test when the components are independent; and (c) $ p$-values of the test when the second component is a mixture of an independent sample and 3% of the first component
\includegraphics[width=3.57cm]{text/2-7/figurema.eps}(a) \includegraphics[width=3.5cm]{text/2-7/figuremb.eps}(b) \includegraphics[width=3.55cm]{text/2-7/figuremc.eps}(c)

We generated two independent components from the Beta($ 1,2$) distribution of size $ n=2000$ and found the $ T$ statistic and the corresponding $ p$-value $ M=2000$ times. Figure 7.3a,b depicts histograms of the $ T$ statistics and $ p$-values based on the $ {2000}$ simulations. Since the generated components $ X$ and $ Y$ are independent, the histogram for $ T$ agrees with the asymptotic $ \chi^2_2$ distribution, and of course, the $ p$-values are uniform on $ [0,1]$. In Fig. 7.3c we show the $ p$-values when the components $ X$ and $ Y$ are not independent. Using two independent Beta($ 1,2$) components $ X$ and $ Y'$, the second component $ Y$ is constructed as $ Y={0.03} X + {0.97} Y'$. Notice that for the majority of simulation runs the independence hypothesis is rejected, i.e., the $ p$-values cluster around $ 0$.

