3.2 Correlation

The correlation between two variables $X$ and $Y$ is defined from the covariance as follows:
\begin{displaymath}
\rho _{XY} = \frac{\mathop{\mathit{Cov}}(X,Y) }{\sqrt {\mathop{\mathit{Var}}(X)\mathop{\mathit{Var}}(Y)}}\cdotp
\end{displaymath} (3.7)

The advantage of the correlation is that it is independent of the scale, i.e., changing the variables' scale of measurement does not change the value of the correlation. Therefore, the correlation is more useful as a measure of association between two random variables than the covariance. The empirical version of $\rho _{XY}$ is as follows:
\begin{displaymath}
r_{XY} = \frac{s_{XY} }{\sqrt {s_{XX}s_{YY}} }\cdotp
\end{displaymath} (3.8)
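As a quick numerical illustration of (3.8), the following Python sketch computes $r_{XY}$ from the empirical covariances. The data vectors are made up purely for illustration; note that the divisor used for the empirical moments cancels in (3.8), so the choice of $1/n$ or $1/(n-1)$ does not affect the result.

\begin{verbatim}
import numpy as np

# hypothetical observations of X and Y (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# empirical covariances s_XY, s_XX, s_YY (divisor n; it cancels in (3.8))
s_xy = np.mean((x - x.mean()) * (y - y.mean()))
s_xx = np.mean((x - x.mean()) ** 2)
s_yy = np.mean((y - y.mean()) ** 2)

r_xy = s_xy / np.sqrt(s_xx * s_yy)      # formula (3.8)
print(r_xy, np.corrcoef(x, y)[0, 1])    # both give the same value
\end{verbatim}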

The correlation is in absolute value always less than or equal to 1. It is zero if and only if the covariance is zero. For $p$-dimensional vectors $(X_1,\ldots ,X_p)^{\top}$ we have the theoretical correlation matrix

\begin{displaymath}
{\data{P}} = \left (
{\begin{array}{ccc}
\rho_{X_1X_1} & \ldots & \rho_{X_1X_p}\\
\vdots & & \vdots\\
\rho_{X_pX_1} & \ldots & \rho_{X_{p}X_{p}}
\end{array}} \right ) ,
\end{displaymath}

and its empirical version, the empirical correlation matrix which can be calculated from the observations,

\begin{displaymath}
{\data{R}}
= \left (
{\begin{array}{ccc}
r_{X_1X_1} & \ldots & r_{X_1X_p}\\
\vdots & & \vdots\\
r_{X_pX_1} & \ldots & r_{X_{p}X_{p}}
\end{array}} \right ) .
\end{displaymath}
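The empirical correlation matrix can be obtained directly from a data matrix. The following Python sketch uses a small hypothetical $(n\times p)$ data matrix for illustration:

\begin{verbatim}
import numpy as np

# hypothetical (n x p) data matrix: n = 6 observations, p = 3 variables
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 0.7],
              [3.0, 3.2, 0.4],
              [4.0, 3.8, 0.9],
              [5.0, 5.1, 0.6],
              [6.0, 5.9, 1.0]])

R = np.corrcoef(X, rowvar=False)   # empirical correlation matrix (p x p)
print(np.round(R, 2))              # symmetric, with ones on the diagonal
\end{verbatim}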

EXAMPLE 3.3   We obtain the following correlation matrix for the genuine bank notes:
\begin{displaymath}
{\data{R}}_g=\left (
{\begin{array}{rrrrrr}
1.00 & 0.41 & 0.41 & \cdots & \cdots & 0.03\\
\vdots & & & & & \vdots\\
0.03 & -0.25 & -0.14 & -0.00 & -0.25 & 1.00
\end{array}} \right )\ ,
\end{displaymath} (3.9)

and for the counterfeit bank notes:
\begin{displaymath}
{\data{R}}_f=\left (
{\begin{array}{rrrrrr}
1.00 & 0.35 & 0.24 & \cdots & \cdots & 0.06\\
\vdots & & & & & \vdots\\
0.06 & -0.03 & 0.20 & 0.37 & -0.06 & 1.00
\end{array}} \right)\ .
\end{displaymath} (3.10)

As noted before for $\mathop{\mathit{Cov}}(X_4,X_5)$, the correlation between $X_4$ (distance of the frame to the lower border) and $X_5$ (distance of the frame to the upper border) is negative. This is natural, since the covariance and correlation always have the same sign (see also Exercise 3.17).

Why is the correlation an interesting statistic to study? It is related to independence of random variables, which we shall define more formally later on. For the moment we may think of independence as the fact that one variable has no influence on another.

THEOREM 3.1   If $X$ and $Y$ are independent, then $\rho(X,Y)=\Cov(X,Y)=0.$

In general, the converse is not true, as the following example shows.

EXAMPLE 3.4   Consider a standard normally-distributed random variable $X$ and a random variable $Y=X^2$, which is surely not independent of $X$. Here we have

\begin{displaymath}\Cov (X,Y) = E(XY) - E(X)E(Y) = E(X^3) = 0 \end{displaymath}

(because $E(X)=0$ and $E(X^2)=1$). Therefore $\rho (X,Y)=0$, as well. This example also shows that correlations and covariances measure only linear dependence. The quadratic dependence of $Y=X^2$ on $X$ is not reflected by these measures of dependence.
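A small Monte Carlo check of this example can be run with the following Python sketch (the sample size and seed are arbitrary):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # X ~ N(0,1)
y = x ** 2                         # Y = X^2, clearly dependent on X

# empirical correlation is close to zero although Y is a function of X
print(np.corrcoef(x, y)[0, 1])     # approximately 0
\end{verbatim}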

REMARK 3.1   For two normal random variables, the converse of Theorem 3.1 is true: zero covariance for two normally-distributed random variables implies independence. This will be shown later in Corollary 5.2.

Theorem 3.1 enables us to check for independence between the components of a bivariate normal random variable. That is, we can use the correlation and test whether it is zero. The distribution of $r_{XY}$ for an arbitrary $(X,Y)$ is unfortunately complicated. The distribution of $r_{XY}$ will be more accessible if $(X,Y)$ are jointly normal (see Chapter 5). If we transform the correlation by Fisher's $Z$-transformation,

\begin{displaymath}
W = \frac{1 }{2 }\log\left (\frac{1+r_{XY} }
{1-r_{XY} }\right ),
\end{displaymath} (3.11)

we obtain a variable that has a more accessible distribution. Under the hypothesis that $\rho = 0$, $W$ has an asymptotic normal distribution. Approximations of the expectation and variance of $W$ are given by the following:
\begin{displaymath}
\begin{array}{rcl}
E(W)&\approx& \frac{1}{2 }\log \left (\frac{1+\rho_{XY}}{1-\rho_{XY}}\right ),\\[2mm]
\mathop{\mathit{Var}}(W)&\approx& \frac{1}{(n-3)}\cdotp
\end{array} \end{displaymath} (3.12)

The distribution is given in Theorem 3.2.

THEOREM 3.2  
\begin{displaymath}
Z = \frac{W-E(W) }{\sqrt {\mathop{\mathit{Var}}(W)} } \stackrel{\cal L}{\longrightarrow}
N(0,1).
\end{displaymath} (3.13)

The symbol `` $\stackrel{\cal L}{\longrightarrow}$'' denotes convergence in distribution, which will be explained in more detail in Chapter 4.

Theorem 3.2 allows us to test hypotheses about the correlation. We can fix the level of significance $\alpha$ (the probability of rejecting a true hypothesis) and reject the hypothesis if the statistic $Z$, computed from the difference between the transformed hypothetical and observed correlations, exceeds the corresponding critical value of the normal distribution in absolute value. The following example illustrates the procedure.

EXAMPLE 3.5   Let us study the correlation between mileage ($X_2$) and weight ($X_8$) for the car data set (B.3) where $n=74$. We have $r_{X_2X_8}=-0.823$. Our conclusion from the boxplot in Figure 1.3 (``Japanese cars generally have better mileage than the others'') needs to be revised. From Figure 3.3 and $r_{X_{2}X_{8}}$, we can see that mileage is highly correlated with weight, and that the Japanese cars in the sample are in fact all lighter than the others!

If we want to know whether $\rho_{X_{2}X_{8}}$ is significantly different from $\rho _0=0$, we apply Fisher's $Z$-transform (3.11). This gives us

\begin{displaymath}w = \frac{1}{2} \log \left (\frac{1+r_{X_{2}X_{8}}}
{1-r_{X_{2}X_{8}}}\right ) = -1.166 \quad\textrm{ and } \quad
z = \frac{-1.166-0}{\sqrt {\frac{1 }{71 }} }= -9.825,\end{displaymath}

i.e., a highly significant value, so we reject the hypothesis that $\rho = 0$ (the 2.5% and 97.5% quantiles of the normal distribution are $-1.96$ and $1.96$, respectively). If we want to test the hypothesis that, say, $\rho _0=-0.75$, we obtain:

\begin{displaymath}z = \frac{-1.166-(-0.973)}{\sqrt {\frac{1}{71}} }= -1.627.\end{displaymath}

This value is not significant at the $\alpha = 0.05$ level, since $z$ lies between the critical values of the normal distribution at the 5% significance level (i.e., $-1.96 < z < 1.96$).

Figure 3.3: Mileage ($X_{2}$) vs. weight ($X_{8}$) of U.S. (star), European (plus signs) and Japanese (circle) cars.  MVAscacar.xpl
\includegraphics[width=1\defpicwidth]{scacar.ps}
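The calculations of Example 3.5 can be reproduced with a short Python sketch; fisher_z_test is a hypothetical helper (not taken from any particular library) implementing (3.11)--(3.13):

\begin{verbatim}
import numpy as np
from scipy.stats import norm

def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Test H0: rho = rho0 via Fisher's Z-transform (3.11)-(3.13)."""
    w  = 0.5 * np.log((1 + r) / (1 - r))          # observed W, (3.11)
    w0 = 0.5 * np.log((1 + rho0) / (1 - rho0))    # E(W) under H0, (3.12)
    z  = (w - w0) * np.sqrt(n - 3)                # Var(W) ~ 1/(n-3)
    crit = norm.ppf(1 - alpha / 2)
    return z, abs(z) >= crit

# car data of Example 3.5: r = -0.823, n = 74
print(fisher_z_test(-0.823, 74, rho0=0.0))    # z ~ -9.83, reject H0
print(fisher_z_test(-0.823, 74, rho0=-0.75))  # z ~ -1.63, do not reject
\end{verbatim}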

EXAMPLE 3.6   Let us consider again the pullovers data set from Example 3.2. Consider the correlation between the presence of the sales assistants ($X_{4}$) and the number of pullovers sold ($X_{1}$) (see Figure 3.4). Here we compute the correlation as

\begin{displaymath}r_{X_{1}X_{4}}=0.633.\end{displaymath}

Figure 3.4: Hours of sales assistants ($X_4$) vs. sales ($X_1$) of pullovers.  MVAscapull2.xpl
\includegraphics[width=1\defpicwidth]{scapull2.ps}

The $Z$-transform of this value is

\begin{displaymath}
w =\frac{1 }{2 }\log_e\left(
\frac{1+r_{X_{1}X_{4}}}{1-r_{X_{1}X_{4}}}\right)= 0.746.
\end{displaymath} (3.14)

The sample size is $n=10$, so for the hypothesis $\rho_{X_{1}X_{4}}=0$, the statistic to consider is:
\begin{displaymath}
z = \sqrt {7}(0.746-0) = 1.974
\end{displaymath} (3.15)

which is just statistically significant at the $5\%$ level (i.e., 1.974 is just a little larger than 1.96).

REMARK 3.2   The normalizing and variance stabilizing properties of $W$ are asymptotic. In addition, the use of $W$ in small samples (for $n\leq25$) is improved by Hotelling's transform (Hotelling, 1953):

\begin{displaymath}W^*=W-\frac{3W+\tanh(W)}{4(n-1)} \quad\textrm{ with } \quad
\mathop{\mathit{Var}}(W^*)=\frac{1}{n-1}.\end{displaymath}

The transformed variable $W^*$ is asymptotically normally distributed.

EXAMPLE 3.7   Applying the preceding remark to Example 3.6, we obtain $w^*=0.6663$ and $\sqrt{10-1}\, w^*=1.9989$. This value is significant at the $5\%$ level.
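A Python sketch of this small-sample correction, using the numbers of Examples 3.6 and 3.7 (hotelling_w is a hypothetical helper, not a library function):

\begin{verbatim}
import numpy as np

def hotelling_w(r, n):
    """Hotelling's small-sample correction of Fisher's Z-transform."""
    w = np.arctanh(r)                  # W = 0.5*log((1+r)/(1-r))
    w_star = w - (3 * w + np.tanh(w)) / (4 * (n - 1))
    z = np.sqrt(n - 1) * w_star        # Var(W*) = 1/(n-1)
    return w_star, z

# pullover data of Examples 3.6 and 3.7: r = 0.633, n = 10
print(hotelling_w(0.633, 10))   # w* ~ 0.666, z ~ 2.00 (cf. 0.6663, 1.9989)
\end{verbatim}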

REMARK 3.3   Note that Fisher's $Z$-transform is the inverse of the hyperbolic tangent function: $W=\tanh^{-1}(r_{XY})$; equivalently $r_{XY}= \tanh(W) = \frac{e^{2W}-1}{e^{2W}+1}$.

REMARK 3.4   Under the assumptions of normality of $X$ and $Y$, we may test their independence ($\rho_{XY}=0$) using the exact $t$-distribution of the statistic

\begin{displaymath}
T=r_{XY}\sqrt{\frac{n-2}{1-r^2_{XY}}}\stackrel{\rho_{XY}=0}{\sim} t_{n-2}.
\end{displaymath}

Setting the probability of a Type I error to $\alpha$, we reject the null hypothesis $\rho_{XY}=0$ if $\vert T\vert\geq t_{1-\alpha/2;n-2}$.
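A Python sketch of this exact test; corr_t_test is a hypothetical helper, applied here to the numbers of Example 3.6 ($r=0.633$, $n=10$) for illustration:

\begin{verbatim}
import numpy as np
from scipy.stats import t

def corr_t_test(r, n, alpha=0.05):
    """Exact t-test of H0: rho = 0 under bivariate normality."""
    T = r * np.sqrt((n - 2) / (1 - r ** 2))
    crit = t.ppf(1 - alpha / 2, df=n - 2)
    p_value = 2 * (1 - t.cdf(abs(T), df=n - 2))
    return T, p_value, abs(T) >= crit

# pullover data of Example 3.6: r = 0.633, n = 10
print(corr_t_test(0.633, 10))   # T ~ 2.31, p ~ 0.049, reject H0 at 5% level
\end{verbatim}

For the pullover data this exact test and the Fisher $Z$-test of Example 3.6 thus lead to the same borderline rejection at the $5\%$ level.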

Summary
$\ast$
The correlation is a standardized measure of dependence.
$\ast$
The absolute value of the correlation is always less than or equal to one.
$\ast$
Correlation measures only linear dependence.
$\ast$
There are nonlinear dependencies that have zero correlation.
$\ast$
Zero correlation does not imply independence.
$\ast$
Independence implies zero correlation.
$\ast$
Negative correlation corresponds to downward-sloping scatterplots.
$\ast$
Positive correlation corresponds to upward-sloping scatterplots.
$\ast$
Fisher's Z-transform helps us in testing hypotheses on correlation.
$\ast$
For small samples, Fisher's Z-transform can be improved by the transformation $W^*=W-\frac{3W+\tanh(W)}{4(n-1)}$.