13.2 $\chi^2$ Decomposition

An alternative way of measuring the association between the row and column categories is a decomposition of the value of the $\chi^2$-test statistic. The well-known $\chi^2$-test for independence in a two-dimensional contingency table consists of two steps. First, the expected value of each cell of the table is estimated under the hypothesis of independence. Second, the corresponding observed values are compared to the expected values using the statistic

\begin{displaymath}
t= \sum_{i=1}^n \sum_{j=1}^p (x_{ij} - E_{ij})^2/E_{ij},
\end{displaymath} (13.3)

where $x_{ij}$ is the observed frequency in cell $(i,j)$ and $E_{ij}$ is the corresponding estimated expected value under the assumption of independence, i.e.,
\begin{displaymath}
E_{ij} = \frac{x_{i \bullet}\, x_{\bullet j}}{x_{\bullet \bullet}}.
\end{displaymath} (13.4)

Here $x_{\bullet \bullet} = \sum_{i=1}^n x_{i \bullet}$. Under the hypothesis of independence, $t$ has asymptotically a $\chi^2_{(n-1)(p-1)}$ distribution. In the industrial location example introduced above, the value of $t=6.26$ is almost significant at the 5% level. It is therefore worth investigating the specific reasons for departure from independence.
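
To make these two steps concrete, the following is a minimal numerical sketch in Python (using numpy and scipy); the $3\times 3$ table is hypothetical and purely illustrative, not the industrial location data.

\begin{verbatim}
# Sketch of (13.3)-(13.4): chi^2 statistic for a hypothetical 3x3 table
import numpy as np
from scipy.stats import chi2

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # illustrative counts only

a = X.sum(axis=1)                    # row margins x_{i.}
b = X.sum(axis=0)                    # column margins x_{.j}
g = X.sum()                          # grand total x_{..}

E = np.outer(a, b) / g               # expected counts under independence (13.4)
t = ((X - E) ** 2 / E).sum()         # chi^2 statistic (13.3)

df = (X.shape[0] - 1) * (X.shape[1] - 1)
print(t, df, chi2.sf(t, df))         # statistic, degrees of freedom, p-value
\end{verbatim}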

The method of $\chi^2$ decomposition consists of finding the SVD of the matrix $\data{C} \; (n \times p)$ with elements

\begin{displaymath}
c_{ij} = (x_{ij} - E_{ij})/E_{ij}^{1/2}.
\end{displaymath} (13.5)

The elements $c_{ij}$ may be viewed as measuring the (weighted) departure between the observed $x_{ij}$ and the theoretical values $E_{ij}$ under independence. This leads to the factorial tools of Chapter 8 which describe the rows and the columns of $\data{C}$.
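
For illustration, the matrix $\data{C}$ of (13.5) can be computed directly from the table; the sketch below reuses the hypothetical counts from above.

\begin{verbatim}
# Sketch of (13.5): weighted departures from independence
import numpy as np

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table

E = np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()   # (13.4)
C = (X - E) / np.sqrt(E)             # c_ij = (x_ij - E_ij)/E_ij^{1/2}
print(C)
\end{verbatim}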

For simplification, define the matrices $\data{A}\, (n \times n)$ and $\data{B} \, (p \times p)$ as

\begin{displaymath}
\data{A} = \mathop{\hbox{diag}}(x_{i \bullet})\textrm{ and }\data{B} = \mathop{\hbox{diag}}(x_{\bullet j}).
\end{displaymath} (13.6)

These matrices provide the marginal row frequencies $a (n \times 1)$ and the marginal column frequencies $b (p \times 1)$:
\begin{displaymath}
a = \data{A}1_n\textrm{ and } b= \data{B}1_p.
\end{displaymath} (13.7)

It is easy to verify that
\begin{displaymath}
\data{C} \sqrt{b} = 0\textrm{ and } \data{C}^{\top} \sqrt{a} =0,
\end{displaymath} (13.8)

where the square root of the vector is taken element by element and $R=\mathop{\rm {rank}}(\data{C}) \le \min \{ (n-1),(p-1) \} $. From (8.14) of Chapter 8, the SVD of $\data{C}$ yields
\begin{displaymath}
\data{C} = \Gamma \Lambda \Delta^{\top},
\end{displaymath} (13.9)

where $\Gamma$ contains the eigenvectors of $\data{C}\data{C}^{\top}$, $\Delta$ the eigenvectors of $\data{C}^{\top}\data{C}$ and
$\Lambda = \mathop{\hbox{diag}}(\lambda_{1}^{1/2}, \ldots, \lambda_R^{1/2})$ with $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_R$ (the eigenvalues of $\data{C}\data{C}^{\top}$). Equation (13.9) implies that
\begin{displaymath}
c_{ij} = \sum_{k=1}^R \lambda_{k}^{1/2} \gamma_{ik} \delta_{jk}.
\end{displaymath} (13.10)

Note that (13.3) can be rewritten as
\begin{displaymath}
\mathop{\hbox{tr}}(\data{C}\data{C}^{\top}) = \sum_{k=1}^R \lambda_{k} = \sum_{i=1}^n
\sum_{j=1}^p c_{ij}^2 = t.
\end{displaymath} (13.11)

This relation shows that the SVD of $\data{C}$ decomposes the total $\chi^2$ value rather than, as in Chapter 8, the total variance.
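
The decomposition (13.11), and the marginal relations (13.8), can be checked numerically; the sketch below (same hypothetical table) computes the SVD of $\data{C}$ and verifies that the eigenvalues of $\data{C}\data{C}^{\top}$ sum to $t$.

\begin{verbatim}
# Sketch of (13.8)-(13.11): the SVD of C decomposes the chi^2 statistic t
import numpy as np

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table
a, b = X.sum(axis=1), X.sum(axis=0)

E = np.outer(a, b) / X.sum()
C = (X - E) / np.sqrt(E)
t = ((X - E) ** 2 / E).sum()

gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)  # C = Gamma Lambda Delta^T
lam = sing ** 2                                               # eigenvalues of C C^T

print(np.isclose(lam.sum(), t))                               # (13.11)
print(np.allclose(C @ np.sqrt(b), 0),                         # (13.8)
      np.allclose(C.T @ np.sqrt(a), 0))
\end{verbatim}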

The duality relations between the row and the column space (8.11) are now for $k=1, \ldots, R$ given by

\begin{displaymath}
\begin{array}{l}
\delta_{k} = \frac{1}{\sqrt{\lambda_{k}}} \data{C}^{\top} \gamma_{k}, \\
\gamma_{k} = \frac{1}{\sqrt{\lambda_{k}}} \data{C} \delta_{k}.
\end{array} \end{displaymath} (13.12)

The projections of the rows and the columns of $\data{C}$ are given by
\begin{displaymath}
\begin{array}{l}
\data{C} \delta_{k} = \sqrt{\lambda_{k}}\, \gamma_{k}, \\
\data{C}^{\top} \gamma_{k} = \sqrt{\lambda_{k}}\, \delta_{k}.
\end{array} \end{displaymath} (13.13)

Note that, as a consequence of (13.8) and (13.12), the eigenvectors satisfy
\begin{displaymath}
\delta^{\top}_k \sqrt{b} =0, \quad \gamma^{\top}_k \sqrt{a} =0.
\end{displaymath} (13.14)

From (13.10) we see that the eigenvectors $\delta_k$ and $\gamma_k$ are the objects of interest when analyzing the correspondence between the rows and the columns. Suppose that the first eigenvalue in (13.10) is dominant so that
\begin{displaymath}
c_{ij} \approx \lambda_{1}^{1/2} \gamma_{i1} \delta_{j1}.
\end{displaymath} (13.15)

In this case, when the coordinates $\gamma_{i1}$ and $\delta_{j1}$ are both large (with the same sign) relative to the other coordinates, $c_{ij}$ will be large as well, indicating a positive association between the $i$-th row and the $j$-th column category of the contingency table. If $\gamma_{i1}$ and $\delta_{j1}$ were both large with opposite signs, there would be a negative association between the $i$-th row and the $j$-th column.

In many applications, the first two eigenvalues, $\lambda_1$ and $\lambda_2$, dominate and the percentage of the total $\chi^2$ explained by the eigenvectors $\gamma_1$, $\gamma_2$ and $\delta_1$, $\delta_2$ is large. In this case (13.13) and $(\gamma_1, \gamma_2)$ can be used to obtain a graphical display of the $n$ rows of the table ($(\delta_1, \delta_2)$ play a similar role for the $p$ columns of the table). The proximity between row and column points is then interpreted as above with respect to (13.10).
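
A minimal plotting sketch of such a display is given below (Python with matplotlib, same hypothetical table); in applied work one usually plots the weighted coordinates $r_k$ and $s_k$ of (13.16) introduced next.

\begin{verbatim}
# Sketch: display rows via (gamma_1, gamma_2), columns via (delta_1, delta_2)
import numpy as np
import matplotlib.pyplot as plt

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table
E = np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()
C = (X - E) / np.sqrt(E)

gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)
delta, lam = delta_t.T, sing ** 2
share = lam[:2].sum() / lam.sum()    # part of the chi^2 carried by two factors

fig, ax = plt.subplots()
ax.scatter(gamma[:, 0], gamma[:, 1], marker='o', label='rows')
ax.scatter(delta[:, 0], delta[:, 1], marker='^', label='columns')
ax.axhline(0, lw=0.5)
ax.axvline(0, lw=0.5)
ax.legend()
ax.set_title('first two factors: %.1f%% of chi^2' % (100 * share))
plt.show()
\end{verbatim}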

In correspondence analysis, we use the projections of weighted rows of $\data{C}$ and the projections of weighted columns of $\data{C}$ for graphical displays. Let $r_k (n \times 1)$ be the projections of $\data{A}^{-1/2} \data{C}$ on $\delta_k$ and $s_k (p \times 1)$ be the projections of $\data{B}^{-1/2}
\data{C}^{\top}$ on $\gamma_k$ ($k=1,\dots,R$):

\begin{displaymath}
\begin{array}{l}
r_{k} = \data{A}^{-1/2} \data{C} \delta_{k} = \sqrt{\lambda_k}\, \data{A}^{-1/2}\gamma_k, \\
s_{k} = \data{B}^{-1/2} \data{C}^{\top} \gamma_{k} = \sqrt{\lambda_k}\, \data{B}^{-1/2}\delta_k.
\end{array} \end{displaymath} (13.16)

These vectors have the property that
\begin{displaymath}
\begin{array}{l} r_{k}^{\top} a = 0, \\
s_{k}^{\top} b = 0. \end{array} \end{displaymath} (13.17)

The obtained projections on each axis $k=1, \ldots, R$ are centered at zero with the natural weights given by $a$ (the marginal frequencies of the rows of $\data{X}$) for the row coordinates $r_k$ and by $b$ (the marginal frequencies of the columns of $\data{X}$) for the column coordinates $s_k$ (compare this to expression (13.14)). As a result, the origin is the center of gravity for all of the representations. We also know from (13.16) and the SVD of ${\data{C}}$ that
\begin{displaymath}
\begin{array}{l}
r_k^{\top} \data{A} r_k = \lambda_k, \\
s_k^{\top} \data{B} s_k = \lambda_k. \end{array}\end{displaymath} (13.18)
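
The coordinates $r_k$ and $s_k$ and the properties (13.17)-(13.18) can be verified numerically; the sketch below (same hypothetical table) does so for the first factor.

\begin{verbatim}
# Sketch of (13.16)-(13.18): weighted row/column coordinates and properties
import numpy as np

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table
a, b = X.sum(axis=1), X.sum(axis=0)

E = np.outer(a, b) / X.sum()
C = (X - E) / np.sqrt(E)
gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)
delta, lam = delta_t.T, sing ** 2

k = 0                                        # first factor
r_k = C @ delta[:, k] / np.sqrt(a)           # A^{-1/2} C delta_k      (13.16)
s_k = C.T @ gamma[:, k] / np.sqrt(b)         # B^{-1/2} C^T gamma_k    (13.16)

print(np.isclose(r_k @ a, 0), np.isclose(s_k @ b, 0))   # (13.17)
print(np.isclose(r_k @ (a * r_k), lam[k]),              # r_k^T A r_k = lambda_k
      np.isclose(s_k @ (b * s_k), lam[k]))              # s_k^T B s_k = lambda_k
\end{verbatim}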

From the duality relation between $\delta_k$ and $\gamma_k$ (see (13.12)) we obtain

\begin{displaymath}
\begin{array}{l}
r_{k} = \frac{1}{\sqrt{\lambda_{k}}}\, \data{A}^{-1/2} \data{C} \data{B}^{1/2} s_{k}, \\
s_{k} = \frac{1}{\sqrt{\lambda_{k}}}\, \data{B}^{-1/2} \data{C}^{\top} \data{A}^{1/2} r_{k},
\end{array} \end{displaymath} (13.19)

which can be simplified to
\begin{displaymath}
\begin{array}{l}
r_{k} = \sqrt{\frac{x_{\bullet \bullet}}{\lambda_{k}}}\, \data{A}^{-1} \data{X} s_{k}, \\
s_{k} = \sqrt{\frac{x_{\bullet \bullet}}{\lambda_{k}}}\, \data{B}^{-1} \data{X}^{\top} r_{k}.
\end{array} \end{displaymath} (13.20)

These vectors satisfy the relations (13.1) and (13.2) for each $k=1, \ldots, R$ simultaneously.
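
The simplification (13.20) can also be checked numerically: the sketch below (same hypothetical table) recovers $r_k$ from $\data{X}$ and $s_k$, and $s_k$ from $\data{X}$ and $r_k$, without forming $\data{C}$.

\begin{verbatim}
# Sketch of (13.20): transition formulas between row and column coordinates
import numpy as np

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table
a, b, g = X.sum(axis=1), X.sum(axis=0), X.sum()

E = np.outer(a, b) / g
C = (X - E) / np.sqrt(E)
gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)
delta, lam = delta_t.T, sing ** 2

k = 0
r_k = C @ delta[:, k] / np.sqrt(a)
s_k = C.T @ gamma[:, k] / np.sqrt(b)

r_from_s = np.sqrt(g / lam[k]) * (X @ s_k) / a     # A^{-1} X s_k, rescaled
s_from_r = np.sqrt(g / lam[k]) * (X.T @ r_k) / b   # B^{-1} X^T r_k, rescaled
print(np.allclose(r_from_s, r_k), np.allclose(s_from_r, s_k))
\end{verbatim}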

As in Chapter 8, the vectors $r_k$ and $s_k$ are referred to as factors (row factor and column factor respectively). They have the following means and variances:

\begin{displaymath}
\begin{array}{l}
\overline{r}_k = \frac{1}{x_{\bullet\bullet}}\, r_k^{\top} a = 0, \\
\overline{s}_k = \frac{1}{x_{\bullet\bullet}}\, s_k^{\top} b = 0,
\end{array}\end{displaymath} (13.21)

and
\begin{displaymath}
\begin{array}{l}
Var(r_k)=\frac{1}{x_{\bullet\bullet}}\sum^n_{i=1} x_{i\bullet} r_{ki}^2
= \frac{r_k^{\top} \data{A}\, r_k}{x_{\bullet\bullet}}
= \frac{\lambda_k}{x_{\bullet\bullet}}, \\
Var(s_k)=\frac{1}{x_{\bullet\bullet}}\sum^p_{j=1} x_{\bullet j} s_{kj}^2
= \frac{s_k^{\top} \data{B}\, s_k}{x_{\bullet\bullet}}
= \frac{\lambda_k}{x_{\bullet\bullet}}.
\end{array}\end{displaymath} (13.22)

Hence, $\lambda_k/\sum^R_{j=1} \lambda_j$, which is the part of the $k$-th factor in the decomposition of the $\chi^2$ statistic $t$, may also be interpreted as the proportion of the variance explained by factor $k$. The proportions
\begin{displaymath}
C_a(i,r_k) = \frac{x_{i\bullet}r_{ki}^2}{\lambda_k},\textrm{ for }
i=1, \ldots ,n, \ k=1,\dots,R
\end{displaymath} (13.23)

are called the absolute contributions of row $i$ to the variance of the factor $r_k$. They show which row categories are most important for the dispersion of the $k$-th row factor. Similarly, the proportions
\begin{displaymath}
C_a(j,s_k) = \frac{x_{\bullet j}s_{kj}^2}{\lambda_k},\textrm{ for }
j=1, \ldots ,p, \ k=1,\dots,R
\end{displaymath} (13.24)

are called the absolute contributions of column $j$ to the variance of the column factor $s_k$. These absolute contributions may help to interpret the graph obtained by correspondence analysis.
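
As a final illustration, the absolute contributions (13.23) and (13.24) are computed below for the first factor of the same hypothetical table; within each factor they sum to one over the rows and over the columns, respectively.

\begin{verbatim}
# Sketch of (13.23)-(13.24): absolute contributions to the variance of a factor
import numpy as np

X = np.array([[30., 20., 10.],
              [15., 25., 20.],
              [ 5., 10., 15.]])      # hypothetical table
a, b = X.sum(axis=1), X.sum(axis=0)

E = np.outer(a, b) / X.sum()
C = (X - E) / np.sqrt(E)
gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)
delta, lam = delta_t.T, sing ** 2

k = 0
r_k = C @ delta[:, k] / np.sqrt(a)
s_k = C.T @ gamma[:, k] / np.sqrt(b)

Ca_rows = a * r_k ** 2 / lam[k]      # C_a(i, r_k), i = 1, ..., n
Ca_cols = b * s_k ** 2 / lam[k]      # C_a(j, s_k), j = 1, ..., p
print(Ca_rows, Ca_rows.sum())        # contributions sum to 1
print(Ca_cols, Ca_cols.sum())
\end{verbatim}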