14.1 Most Interesting Linear Combination

The associations between two sets of variables may be identified and quantified by canonical correlation analysis. The technique was originally developed by Hotelling (1935) who analyzed how arithmetic speed and arithmetic power are related to reading speed and reading power. Other examples are the relation between governmental policy variables and economic performance variables and the relation between job and company characteristics.

Suppose we are given two random variables $X\in \mathbb{R}^q$ and $Y\in \mathbb{R}^p$. The idea is to find an index describing a (possible) link between $X$ and $Y$. Canonical correlation analysis (CCA) is based on linear indices, i.e., linear combinations

\begin{displaymath}a^{\top}X\quad \textrm{and} \quad b^{\top}Y\end{displaymath}

of the random variables. Canonical correlation analysis searches for vectors $a$ and $b$ such that the relation of the two indices $a^{\top}X$ and $b^{\top}Y$ is quantified in some interpretable way. More precisely, one is looking for the ``most interesting'' projections $a$ and $b$ in the sense that they maximize the correlation
\begin{displaymath}
\rho (a,b)=\rho_{a^{\top}X\,b^{\top}Y}
\end{displaymath} (14.1)

between the two indices.

Let us consider the correlation $\rho (a,b)$ between the two projections in more detail. Suppose that

\begin{displaymath}{X \choose Y} \sim \left(\ {\mu \choose \nu}\ ,\
\left({\Sigma_{XX} \atop \Sigma_{YX}}{\Sigma_{XY} \atop
\Sigma_{YY}} \right)\ \right)\, \end{displaymath}

where the sub-matrices of this covariance structure are given by

\begin{eqnarray*}
\Var(X)&=&\Sigma _{XX}\quad (q\times q)\cr
\Var(Y)&=&\Sigma _{YY}\quad (p\times p)\cr
\mathop{\mathit{Cov}}(X,Y)&=&\mathop{\mathit{E}}(X-\mu )(Y-\nu )^{\top}=\Sigma _{XY}=\Sigma ^{\top}_{YX}\quad
(q\times p).
\end{eqnarray*}



Using (3.7) and (4.26),
\begin{displaymath}
\rho(a,b) =\frac{a^{\top}\Sigma _{XY}b }{(a^{\top}\Sigma
_{XX}a)^{1/2}\;(b^{\top}\Sigma _{YY}b)^{1/2} }\ \cdotp
\end{displaymath} (14.2)

Therefore, $\rho (ca,b) = \rho (a,b)$ for any $c \in \mathbb{R}^+$, and likewise for a positive rescaling of $b$. Given this scale invariance, we may rescale the projections $a$ and $b$ and equivalently solve

\begin{displaymath}\max\limits_{a,b}\; a^{\top}\Sigma _{XY}b\end{displaymath}

under the constraints

\begin{eqnarray*}
a^{\top}\Sigma _{XX}a &=& 1 \\
b^{\top}\Sigma _{YY}b &=& 1.
\end{eqnarray*}
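
The scale invariance and the equivalence with the constrained problem can be checked numerically. The following is a minimal sketch in Python with NumPy; the joint covariance matrix, the dimensions $q=2$, $p=3$, and the helper name `rho` are illustrative assumptions, not part of the text.

```python
import numpy as np

def rho(a, b, S_xx, S_xy, S_yy):
    """Correlation (14.2) between the indices a'X and b'Y."""
    num = a @ S_xy @ b
    den = np.sqrt(a @ S_xx @ a) * np.sqrt(b @ S_yy @ b)
    return num / den

# Hypothetical joint covariance of (X, Y) with q = 2 and p = 3 (assumption).
rng = np.random.default_rng(0)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)      # symmetric positive definite
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

a = rng.standard_normal(q)
b = rng.standard_normal(p)

r = rho(a, b, S_xx, S_xy, S_yy)
# Positive rescaling of a and b leaves the correlation unchanged.
r_scaled = rho(3.7 * a, 0.2 * b, S_xx, S_xy, S_yy)

# Rescale so that a' S_xx a = 1 and b' S_yy b = 1;
# the correlation then reduces to the numerator a' S_xy b.
a1 = a / np.sqrt(a @ S_xx @ a)
b1 = b / np.sqrt(b @ S_yy @ b)
r_constrained = a1 @ S_xy @ b1
```

All three quantities `r`, `r_scaled` and `r_constrained` agree up to floating-point error, which is why the constrained formulation loses nothing.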



For this problem, define
\begin{displaymath}
\data{K}=\Sigma ^{-1/2}_{XX}\Sigma _{XY}\Sigma ^{-1/2}_{YY}.
\end{displaymath} (14.3)

Recall the singular value decomposition of $\data{K}(q\times p)$ from Theorem 2.2. The matrix $\data{K}$ may be decomposed as

\begin{displaymath}\data{K}=\Gamma \Lambda \Delta^{\top}\end{displaymath}

with
\begin{eqnarray*}
\Gamma &=& (\gamma_{1},\ldots ,\gamma_{k}) \\
\Delta &=& (\delta_{1},\ldots ,\delta_{k}) \\
\Lambda &=& \mathop{\hbox{diag}}(\lambda ^{1/2}_1,\ldots ,\lambda _k^{1/2})
\end{eqnarray*} (14.4)

where by (14.3) and (2.15),

\begin{displaymath}k= \mathop{\rm {rank}}(\data{K}) = \mathop{\rm {rank}}(\Sigma _{XY})
= \textrm{rank}(\Sigma _{YX})\ ,\end{displaymath}

and $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_k$ are the nonzero eigenvalues of $\data{N}_1=\data{K}\data{K}^{\top}$ and $\data{N}_2=\data{K}^{\top}\data{K}$, while $\gamma_{i}$ and $\delta_{j}$ are the standardized eigenvectors of $\data{N}_1$ and $\data{N}_2$, respectively.

Define now for $i=1,\ldots ,k$ the vectors

\begin{displaymath}
a_i = \Sigma ^{-1/2}_{XX}\gamma_{i},
\end{displaymath} (14.5)
\begin{displaymath}
b_i = \Sigma ^{-1/2}_{YY}\delta_{i},
\end{displaymath} (14.6)

which are called the canonical correlation vectors. Using these canonical correlation vectors we define the canonical correlation variables
\begin{displaymath}
\eta_i = a_i^{\top}X
\end{displaymath} (14.7)
\begin{displaymath}
\varphi_i = b_i^{\top}Y.
\end{displaymath} (14.8)

The quantities $\rho_i=\lambda_i^{1/2}$ for $i=1,\dots,k$ are called the canonical correlation coefficients.
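
The construction in (14.3) to (14.6) translates directly into a few lines of linear algebra. The sketch below, in Python with NumPy, is one possible way to compute the canonical correlation vectors and coefficients from given covariance blocks. The helper names `inv_sqrt` and `cca`, the rank tolerance `tol`, and the randomly generated covariance matrix are assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cca(S_xx, S_xy, S_yy, tol=1e-10):
    """Canonical correlation vectors and coefficients from covariance blocks."""
    Sxx_ih, Syy_ih = inv_sqrt(S_xx), inv_sqrt(S_yy)
    K = Sxx_ih @ S_xy @ Syy_ih                        # (14.3), shape q x p
    G, s, Dt = np.linalg.svd(K, full_matrices=False)  # K = G diag(s) Dt
    k = int(np.sum(s > tol))                          # numerical rank of K
    A = Sxx_ih @ G[:, :k]                             # columns a_i, (14.5)
    B = Syy_ih @ Dt.T[:, :k]                          # columns b_i, (14.6)
    return A, B, s[:k]                                # s[i] = rho_i = sqrt(lambda_i)

# Hypothetical joint covariance of (X, Y) with q = 2 and p = 3 (assumption).
rng = np.random.default_rng(1)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)                 # symmetric positive definite
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

A, B, rho = cca(S_xx, S_xy, S_yy)
```

The singular values returned by `np.linalg.svd` are sorted in decreasing order, so `rho[0]` corresponds to $\rho_1$. The singular vectors are determined only up to sign, so $a_i$ and $b_i$ may both flip sign between implementations without changing $\rho_i$.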

From the properties of the singular value decomposition given in (14.4) we have

\begin{displaymath}
\mathop{\mathit{Cov}}(\eta _i,\eta _j)=a^{\top}_i\Sigma _{XX}a_j=\gamma^{\top}_i\gamma_j
=\left\{ \begin{array}{cc} 1 & \quad i=j,\\
0&\quad i\neq j. \end{array} \right .
\end{displaymath} (14.9)

The same is true for $\mathop{\mathit{Cov}}(\varphi _i,\varphi _j)$. The following theorem tells us that the canonical correlation vectors are the solution to the maximization problem of (14.1).

THEOREM 14.1   For any given $r$, $1\leq r\leq k$, the maximum
\begin{displaymath}
C(r)=\max_{a,b} a^{\top}\Sigma _{XY}b
\end{displaymath} (14.10)

subject to

\begin{displaymath}a^{\top}\Sigma _{XX}a=1,\quad b^{\top}\Sigma _{YY}b=1\end{displaymath}

and

\begin{displaymath}
a_i^{\top}\Sigma _{XX}a=0 \textrm{ for } i=1,\ldots ,r-1
\end{displaymath}

is given by

\begin{displaymath}
C(r)=\rho_r=\lambda_r^{1/2}
\end{displaymath}

and is attained when $a=a_r$ and $b=b_r$.

PROOF:
The proof is given in three steps.

(i) Fix $a$ and maximize over $b$, i.e., solve:

\begin{displaymath}\max_b \left(a^{\top} \Sigma_{XY} b\right)^2 =
\max_b \left(b^{\top} \Sigma_{YX} a\right)\left(a^{\top} \Sigma_{XY} b\right)
\end{displaymath}

subject to $b^{\top} \Sigma_{YY} b=1$. By Theorem 2.5 the maximum is given by the largest eigenvalue of the matrix

\begin{displaymath}
\Sigma_{YY}^{-1}\Sigma_{YX} a a^{\top} \Sigma_{XY}.
\end{displaymath}

By Corollary 2.2, the only nonzero eigenvalue equals
\begin{displaymath}
a^{\top}\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}a.
\end{displaymath} (14.11)

(ii) Maximize (14.11) over $a$ subject to the constraints of the Theorem. Put $\gamma=\Sigma_{XX}^{1/2}a$ and observe that (14.11) equals

\begin{displaymath}
\gamma^{\top}\Sigma_{XX}^{-1/2}
\Sigma_{XY}\Sigma_{YY}^{-1}\Sigma_{YX}\Sigma_{XX}^{-1/2}\gamma
=
\gamma^{\top} \data{K}\data{K}^{\top} \gamma
=
\gamma^{\top} \data{N}_1 \gamma.
\end{displaymath}

Thus, solve the equivalent problem
\begin{displaymath}
\max_\gamma \gamma^{\top}{\data{N}_1}\gamma
\end{displaymath} (14.12)

subject to $\gamma^{\top}\gamma=1$, $\gamma_i^{\top}\gamma=0$ for $i=1,\dots,r-1$.

Note that the $\gamma_i$'s are the eigenvectors of $\data{N}_1$ corresponding to its first $r-1$ largest eigenvalues. Thus, as in Theorem 9.3, the maximum in (14.12) is obtained by setting $\gamma$ equal to the eigenvector corresponding to the $r$-th largest eigenvalue, i.e., $\gamma=\gamma_r$ or equivalently $a=a_r$. This yields

\begin{displaymath}
C^2(r)
=\gamma_r^{\top} \data{N}_1 \gamma_r = \lambda_r \gamma_r^{\top} \gamma_r = \lambda_r.
\end{displaymath}

(iii) Show that the maximum is attained for $a=a_r$ and $b=b_r$. From the SVD of ${\data{K}}$ we conclude that ${\data{K}}\delta_r=\rho_r\gamma_r$ and hence

\begin{displaymath}
a_r^{\top}\Sigma_{XY}b_r=\gamma_r^{\top}{\data{K}}\delta_r=\rho_r\gamma_r^{\top}\gamma_r=\rho_r.
\end{displaymath}

${\Box}$
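
The steps of the proof can be illustrated numerically for $r=1$. The following Python/NumPy sketch checks step (i) for one fixed $a$ and then compares randomly drawn feasible pairs $(a,b)$ with $\rho_1$. The covariance matrix, the number of random trials, and the helper names `inv_sqrt` and `objective` are assumptions chosen for illustration; a random search only supports the theorem, it does not prove it.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Hypothetical joint covariance of (X, Y) with q = 2 and p = 3 (assumption).
rng = np.random.default_rng(2)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

Sxx_ih, Syy_ih = inv_sqrt(S_xx), inv_sqrt(S_yy)
K = Sxx_ih @ S_xy @ Syy_ih
G, s, Dt = np.linalg.svd(K, full_matrices=False)
a1, b1, rho1 = Sxx_ih @ G[:, 0], Syy_ih @ Dt[0], s[0]

# Step (i): for a fixed a, the maximizing b is proportional to S_yy^{-1} S_yx a,
# and the squared maximum equals the quadratic form (14.11).
a = rng.standard_normal(q)
b_star = np.linalg.solve(S_yy, S_xy.T @ a)
b_star = b_star / np.sqrt(b_star @ S_yy @ b_star)    # enforce b' S_yy b = 1
lhs = (a @ S_xy @ b_star) ** 2
rhs = a @ S_xy @ np.linalg.solve(S_yy, S_xy.T @ a)   # (14.11)

def objective(a, b):
    """a' S_xy b after rescaling to a' S_xx a = 1 and b' S_yy b = 1."""
    a = a / np.sqrt(a @ S_xx @ a)
    b = b / np.sqrt(b @ S_yy @ b)
    return a @ S_xy @ b

# Steps (ii) and (iii) for r = 1: no random feasible pair exceeds rho_1,
# and the pair (a_1, b_1) attains it.
trials = [objective(rng.standard_normal(q), rng.standard_normal(p))
          for _ in range(2000)]
best_random = max(trials)
attained = a1 @ S_xy @ b1
```

Here `lhs` and `rhs` coincide, `best_random` stays below `rho1`, and `attained` equals `rho1` up to rounding.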

Let

\begin{displaymath}
\left(\begin{array}{c}X\\ Y\end{array} \right) \sim
\left( \left(\begin{array}{c}\mu\\ \nu\end{array} \right),
\left(\begin{array}{cc} \Sigma_{XX} & \Sigma_{XY}\\
\Sigma_{YX} & \Sigma_{YY} \end{array} \right) \right).
\end{displaymath}

The canonical correlation vectors

\begin{displaymath}a_1 = \Sigma_{XX}^{-1/2} \gamma_1,\end{displaymath}


\begin{displaymath}b_1 = \Sigma_{YY}^{-1/2} \delta_1\end{displaymath}

maximize the correlation between the canonical variables

\begin{displaymath}\eta_1 = a_1^{\top} X,\end{displaymath}


\begin{displaymath}\varphi_1 = b_1^{\top} Y.\end{displaymath}

The covariance of the canonical variables $\eta $ and $\varphi$ is given in the next theorem.

THEOREM 14.2   Let $\eta_i$ and $\varphi_i$ be the $i$-th canonical correlation variables ($i=1,\dots,k$). Define $\eta=(\eta_1,\dots,\eta_k)$ and $\varphi=(\varphi_1,\dots,\varphi_k)$. Then

\begin{displaymath}\Var\left (\begin{array}{c} \eta \\ \varphi \end{array} \right )
=\left (\begin{array}{cc} \data{I}_k&\Lambda\\
\Lambda&\data{I}_k \end{array} \right )\ \end{displaymath}

with $\Lambda$ given in (14.4).

This theorem shows that the canonical correlation coefficients, $\rho_i=\lambda_i^{1/2}$, are the covariances between the canonical variables $\eta_{i}$ and $\varphi_{i}$ and that the indices $\eta_{1}=a_{1}^{\top}X$ and $\varphi_{1}=b_{1}^{\top}Y$ have the maximum covariance $\sqrt{\lambda_1}=\rho_1$.
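
The block structure of Theorem 14.2 can be verified at the population level, since $(\eta^{\top},\varphi^{\top})^{\top}$ is a linear transformation of $(X^{\top},Y^{\top})^{\top}$. The Python/NumPy sketch below stacks the canonical correlation vectors into a block-diagonal matrix `T` and computes the resulting covariance matrix. The covariance matrix and the names `inv_sqrt`, `T`, `V_canon` and `expected` are illustrative assumptions.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Hypothetical joint covariance of (X, Y) with q = 2 and p = 3 (assumption).
rng = np.random.default_rng(3)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

Sxx_ih, Syy_ih = inv_sqrt(S_xx), inv_sqrt(S_yy)
K = Sxx_ih @ S_xy @ Syy_ih
G, s, Dt = np.linalg.svd(K, full_matrices=False)
k = int(np.sum(s > 1e-10))                   # numerical rank of K
A = Sxx_ih @ G[:, :k]                        # columns a_1, ..., a_k
B = Syy_ih @ Dt.T[:, :k]                     # columns b_1, ..., b_k
Lam = np.diag(s[:k])                         # Lambda of (14.4)

# (eta, varphi) = T' (X, Y) with T = blockdiag(A, B), so Var = T' S T.
T = np.block([[A, np.zeros((q, k))],
              [np.zeros((p, k)), B]])
V_canon = T.T @ S @ T

expected = np.block([[np.eye(k), Lam],
                     [Lam, np.eye(k)]])
```

Up to rounding error, `V_canon` equals `expected`: the canonical variables are standardized and uncorrelated within each set, and only the pairs $(\eta_i,\varphi_i)$ are correlated across sets.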

The following theorem shows that canonical correlations are invariant w.r.t. linear transformations of the original variables.

THEOREM 14.3   Let $X^*= \data{U}^{\top}X+u$ and $Y^*=\data{V}^{\top}Y+v$ where $\data{U}$ and $\data{V}$ are nonsingular matrices. Then the canonical correlations between $X^*$ and $Y^*$ are the same as those between $X$ and $Y$. The canonical correlation vectors of $X^*$ and $Y^*$ are given by
\begin{eqnarray*}
a_i^* &=& \data{U}^{-1}a_i, \\
b_i^* &=& \data{V}^{-1}b_i.
\end{eqnarray*} (14.13)
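
The invariance stated in Theorem 14.3 can also be checked numerically. In the Python/NumPy sketch below, the covariance blocks of $X^*$ and $Y^*$ are obtained from those of $X$ and $Y$ (the shifts $u$ and $v$ do not affect covariances), and the canonical correlations are recomputed. The covariance matrix, the particular nonsingular matrices `U` and `Vm`, and the helper names `inv_sqrt` and `cca` are assumptions for illustration.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root S^{-1/2} of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def cca(S_xx, S_xy, S_yy, tol=1e-10):
    """Canonical correlation vectors (as columns) and coefficients."""
    Sxx_ih, Syy_ih = inv_sqrt(S_xx), inv_sqrt(S_yy)
    G, s, Dt = np.linalg.svd(Sxx_ih @ S_xy @ Syy_ih, full_matrices=False)
    k = int(np.sum(s > tol))
    return Sxx_ih @ G[:, :k], Syy_ih @ Dt.T[:, :k], s[:k]

# Hypothetical joint covariance of (X, Y) with q = 2 and p = 3 (assumption).
rng = np.random.default_rng(4)
q, p = 2, 3
M = rng.standard_normal((q + p, q + p))
S = M @ M.T + (q + p) * np.eye(q + p)
S_xx, S_xy, S_yy = S[:q, :q], S[:q, q:], S[q:, q:]

# Nonsingular transformations: upper triangular with nonzero diagonal.
U = 2.0 * np.eye(q) + np.triu(rng.standard_normal((q, q)), k=1)
Vm = 2.0 * np.eye(p) + np.triu(rng.standard_normal((p, p)), k=1)

# Covariance blocks of X* = U'X + u and Y* = V'Y + v.
Sx_star = U.T @ S_xx @ U
Sxy_star = U.T @ S_xy @ Vm
Sy_star = Vm.T @ S_yy @ Vm

A, B, rho = cca(S_xx, S_xy, S_yy)
_, _, rho_star = cca(Sx_star, Sxy_star, Sy_star)

# Transformed canonical correlation vectors according to (14.13).
A_star = np.linalg.solve(U, A)       # columns U^{-1} a_i
B_star = np.linalg.solve(Vm, B)      # columns V^{-1} b_i
```

The arrays `rho` and `rho_star` coincide, and the columns of `A_star` and `B_star` satisfy the normalization constraints for the transformed variables while reproducing the same canonical correlations. Because singular vectors are unique only up to sign, the check is made on these defining properties instead of comparing vectors directly.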

Summary
$\ast$
Canonical correlation analysis aims to identify possible links between two (sub-)sets of variables $X\in \mathbb{R}^q$ and $Y\in \mathbb{R}^p$. The idea is to find indices $a^{\top}X$ and $b^{\top}Y$ such that the correlation $\rho(a,b)=\rho_{a^{\top}X b^{\top}Y}$ is maximal.
$\ast$
The maximum correlation (under constraints) is attained by setting $a_{i}=\Sigma_{XX}^{-1/2}\gamma_{i}$ and $b_{i}=\Sigma_{YY}^{-1/2}\delta_{i}$, where $\gamma_{i}$ and $\delta_{i}$ denote the eigenvectors of $\data{K}\data{K}^{\top}$ and $\data{K}^{\top}\data{K}$, respectively, with $\data{K}=\Sigma_{XX}^{-1/2}\Sigma_{XY}\Sigma_{YY}^{-1/2}$.
$\ast$
The vectors $a_{i}$ and $b_{i}$ are called canonical correlation vectors.
$\ast$
The indices $\eta_{i}=a_{i}^{\top}X$ and $\varphi_{i}=b_{i}^{\top}Y$ are called canonical correlation variables.
$\ast$
The values $\rho_1=\sqrt{\lambda_{1}},\ldots,\rho_k=\sqrt{\lambda_{k}}$, which are the square roots of the nonzero eigenvalues of $\data{K}\data{K}^{\top}$ and $\data{K}^{\top}\data{K}$, are called the canonical correlation coefficients. The covariance between the canonical correlation variables is $\mathop{\mathit{Cov}}(\eta_{i},\varphi_{i})=\sqrt{\lambda_{i}}$, $i=1,\ldots ,k$.
$\ast$
The first canonical variables, $\eta_{1}=a_{1}^{\top}X$ and $\varphi_{1}=b_{1}^{\top}Y$, have the maximum covariance $\sqrt{\lambda_1}$.
$\ast$
Canonical correlations are invariant w.r.t. linear transformations of the original variables $X$ and $Y$.