13.2 $\chi^2$ Decomposition
An alternative way of measuring the association between the row and column categories is a decomposition of the value of the $\chi^2$-test statistic. The well-known $\chi^2$-test for independence in a two-dimensional contingency table consists of two steps. First, the expected value of each cell of the table is estimated under the hypothesis of independence. Second, the corresponding observed values are compared to the expected values using the statistic

$$t = \sum_{i=1}^{n} \sum_{j=1}^{p} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}, \tag{13.3}$$

where $x_{ij}$ is the observed frequency in cell $(i,j)$ and $E_{ij}$ is the corresponding estimated expected value under the assumption of independence, i.e.,

$$E_{ij} = \frac{x_{i\bullet}\, x_{\bullet j}}{x_{\bullet\bullet}}. \tag{13.4}$$

Here $x_{i\bullet} = \sum_{j=1}^{p} x_{ij}$, $x_{\bullet j} = \sum_{i=1}^{n} x_{ij}$ and $x_{\bullet\bullet} = \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}$. Under the hypothesis of independence, $t$ has a $\chi^2_{(n-1)(p-1)}$ distribution. In the industrial location example introduced above, the value of the test statistic is almost significant at the 5% level. It is therefore worth investigating the special reasons for departure from independence.
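To make the two steps concrete, here is a minimal numerical sketch in Python (numpy only). The table `X` and all variable names are illustrative assumptions, not the industrial location data from the example:

```python
import numpy as np

# Illustrative contingency table X (n x p); any table of counts works here
X = np.array([[20., 30., 10.],
              [15., 25., 30.],
              [10., 20., 40.]])

row = X.sum(axis=1)    # marginal row frequencies x_{i.}
col = X.sum(axis=0)    # marginal column frequencies x_{.j}
total = X.sum()        # grand total x_{..}

# Expected cell frequencies under independence, (13.4)
E = np.outer(row, col) / total

# Chi-square statistic, (13.3)
t = ((X - E) ** 2 / E).sum()

n, p = X.shape
df = (n - 1) * (p - 1)   # degrees of freedom of the chi^2 distribution
print(t, df)
```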
The method of $\chi^2$ decomposition consists of finding the SVD of the matrix $\mathcal{C}$ $(n \times p)$ with elements

$$c_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}. \tag{13.5}$$

The elements $c_{ij}$ may be viewed as measuring the (weighted) departure between the observed $x_{ij}$ and the theoretical values $E_{ij}$ under independence. This leads to the factorial tools of Chapter 8 which describe the rows and the columns of $\mathcal{C}$.
For simplification define the matrices $\mathcal{A}$ $(n \times n)$ and $\mathcal{B}$ $(p \times p)$ as

$$\mathcal{A} = \operatorname{diag}(x_{i\bullet}) \quad \text{and} \quad \mathcal{B} = \operatorname{diag}(x_{\bullet j}). \tag{13.6}$$

These matrices provide the marginal row frequencies $a$ $(n \times 1)$ and the marginal column frequencies $b$ $(p \times 1)$:

$$a = \mathcal{A} 1_n \quad \text{and} \quad b = \mathcal{B} 1_p. \tag{13.7}$$

It is easy to verify that

$$\mathcal{C}\sqrt{b} = 0 \quad \text{and} \quad \mathcal{C}^{\top}\sqrt{a} = 0, \tag{13.8}$$

where the square root of the vector is taken element by element and $0 = (0, \ldots, 0)^{\top}$.
From (8.14) of Chapter 8, the SVD of $\mathcal{C}$ yields

$$\mathcal{C} = \Gamma \Lambda \Delta^{\top}, \tag{13.9}$$

where $\Gamma$ contains the eigenvectors of $\mathcal{C}\mathcal{C}^{\top}$, $\Delta$ the eigenvectors of $\mathcal{C}^{\top}\mathcal{C}$, and $\Lambda = \operatorname{diag}(\lambda_1^{1/2}, \ldots, \lambda_R^{1/2})$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_R$ (the eigenvalues of $\mathcal{C}\mathcal{C}^{\top}$) and $R = \operatorname{rank}(\mathcal{C})$. Equation (13.9) implies that

$$c_{ij} = \sum_{k=1}^{R} \lambda_k^{1/2}\, \gamma_{ik}\, \delta_{jk}. \tag{13.10}$$

Note that (13.3) can be rewritten as

$$t = \operatorname{tr}(\mathcal{C}\mathcal{C}^{\top}) = \sum_{k=1}^{R} \lambda_k = \sum_{i=1}^{n} \sum_{j=1}^{p} c_{ij}^2. \tag{13.11}$$

This relation shows that the SVD of $\mathcal{C}$ decomposes the total $\chi^2$ value rather than, as in Chapter 8, the total variance.
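The decomposition (13.11) can be checked numerically. A continuation of the sketch above, reusing the illustrative variables `X`, `E` and `t`:

```python
# Matrix of weighted departures between observed and expected counts, (13.5)
C = (X - E) / np.sqrt(E)

# SVD: C = Gamma diag(lambda_k^{1/2}) Delta^T, (13.9)
Gamma, sv, DeltaT = np.linalg.svd(C, full_matrices=False)
lam = sv ** 2            # eigenvalues of C C^T

# (13.11): the eigenvalues decompose the chi-square statistic t
assert np.isclose(lam.sum(), t)
```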
The duality relations between the row and the column space (8.11) are now, for $k = 1, \ldots, R$, given by

$$\delta_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{C}^{\top} \gamma_k, \qquad \gamma_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{C}\, \delta_k. \tag{13.12}$$

The projections of the rows and the columns of $\mathcal{C}$ are given by

$$\mathcal{C}\, \delta_k = \sqrt{\lambda_k}\, \gamma_k, \qquad \mathcal{C}^{\top} \gamma_k = \sqrt{\lambda_k}\, \delta_k. \tag{13.13}$$

Note that the eigenvectors satisfy

$$\delta_k^{\top} \sqrt{b} = 0, \qquad \gamma_k^{\top} \sqrt{a} = 0. \tag{13.14}$$

From (13.10) we see that the eigenvectors $\delta_k$ and $\gamma_k$ are the objects of interest when analyzing the correspondence between the rows and the columns.
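As a sanity check on (13.12), the following lines (still part of the running sketch, with `C`, `Gamma`, `sv`, `DeltaT` as defined above) verify the duality relations for the first factor:

```python
# Duality relations (13.12) for the first factor (any k with lam[k] > 0 works)
k = 0
assert np.allclose(DeltaT[k], C.T @ Gamma[:, k] / sv[k])
assert np.allclose(Gamma[:, k], C @ DeltaT[k] / sv[k])
```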
Suppose that the first eigenvalue in (13.10) is dominant, so that

$$c_{ij} \approx \lambda_1^{1/2}\, \gamma_{i1}\, \delta_{j1}. \tag{13.15}$$

In this case, when the coordinates $\gamma_{i1}$ and $\delta_{j1}$ are both large (with the same sign) relative to the other coordinates, then $c_{ij}$ will be large as well, indicating a positive association between the $i$-th row and the $j$-th column category of the contingency table. If $\gamma_{i1}$ and $\delta_{j1}$ were both large with opposite signs, then there would be a negative association between the $i$-th row and $j$-th column.
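One way to inspect such dominance numerically is to compare $\mathcal{C}$ with its rank-one reconstruction from the leading singular triple; a small continuation of the running sketch:

```python
# Rank-one reconstruction of C from the leading singular triple, (13.15)
C1 = sv[0] * np.outer(Gamma[:, 0], DeltaT[0])

# Share of the total chi-square value carried by the first eigenvalue
share = lam[0] / lam.sum()
print(share, np.abs(C - C1).max())
```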
In many applications, the first two eigenvalues, $\lambda_1$ and $\lambda_2$, dominate and the percentage of the total $\chi^2$ explained by the eigenvectors $\gamma_1$ and $\gamma_2$ and $\delta_1$ and $\delta_2$ is large. In this case (13.13) and $(\gamma_1, \gamma_2)$ can be used to obtain a graphical display of the $n$ rows of the table ($(\delta_1, \delta_2)$ play a similar role for the $p$ columns of the table). The proximity between row and column points is interpreted as above with respect to (13.10).
In correspondence analysis, we use the projections of weighted rows of $\mathcal{C}$ and the projections of weighted columns of $\mathcal{C}$ for graphical displays. Let $r_k$ $(n \times 1)$ be the projections of $\mathcal{A}^{-1/2}\mathcal{C}$ on $\delta_k$ and $s_k$ $(p \times 1)$ be the projections of $\mathcal{B}^{-1/2}\mathcal{C}^{\top}$ on $\gamma_k$ $(k = 1, \ldots, R)$:

$$r_k = \mathcal{A}^{-1/2}\, \mathcal{C}\, \delta_k = \sqrt{\lambda_k}\, \mathcal{A}^{-1/2} \gamma_k, \qquad s_k = \mathcal{B}^{-1/2}\, \mathcal{C}^{\top} \gamma_k = \sqrt{\lambda_k}\, \mathcal{B}^{-1/2} \delta_k. \tag{13.16}$$
These vectors have the property that

$$r_k^{\top} a = 0, \qquad s_k^{\top} b = 0. \tag{13.17}$$

The projections on each axis $k = 1, \ldots, R$ are thus centered at zero, with the natural weights given by $a$ (the marginal frequencies of the rows of $\mathcal{X}$) for the row coordinates $r_k$ and by $b$ (the marginal frequencies of the columns of $\mathcal{X}$) for the column coordinates $s_k$ (compare this to expression (13.14)). As a result, the origin is the center of gravity for all of the representations. We also know from (13.16) and the SVD of $\mathcal{C}$ that

$$r_k^{\top} \mathcal{A}\, r_k = \lambda_k, \qquad s_k^{\top} \mathcal{B}\, s_k = \lambda_k. \tag{13.18}$$
From the duality relation between $\delta_k$ and $\gamma_k$ (see (13.12)) we obtain

$$r_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{A}^{-1/2}\, \mathcal{C}\, \mathcal{B}^{1/2}\, s_k, \qquad s_k = \frac{1}{\sqrt{\lambda_k}}\, \mathcal{B}^{-1/2}\, \mathcal{C}^{\top}\, \mathcal{A}^{1/2}\, r_k, \tag{13.19}$$

which can be simplified to

$$r_k = \sqrt{\frac{x_{\bullet\bullet}}{\lambda_k}}\, \mathcal{A}^{-1} \mathcal{X}\, s_k, \qquad s_k = \sqrt{\frac{x_{\bullet\bullet}}{\lambda_k}}\, \mathcal{B}^{-1} \mathcal{X}^{\top} r_k. \tag{13.20}$$

These vectors satisfy the relations (13.1) and (13.2) for each $k = 1, \ldots, R$ simultaneously.
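Continuing the running sketch, the weighted projections (13.16) and their properties (13.17) and (13.18) can be verified directly (the names `R_fac` and `S_fac` are arbitrary):

```python
# Row and column factors (13.16); columns of R_fac are r_1, ..., r_R
A_inv_sqrt = np.diag(1.0 / np.sqrt(row))
B_inv_sqrt = np.diag(1.0 / np.sqrt(col))
R_fac = A_inv_sqrt @ Gamma * sv     # r_k = lambda_k^{1/2} A^{-1/2} gamma_k
S_fac = B_inv_sqrt @ DeltaT.T * sv  # s_k = lambda_k^{1/2} B^{-1/2} delta_k

# (13.17): the factors are centered with weights a and b
assert np.allclose(row @ R_fac, 0)
assert np.allclose(col @ S_fac, 0)

# (13.18): the weighted sums of squares recover the eigenvalues
assert np.allclose((R_fac ** 2).T @ row, lam)
assert np.allclose((S_fac ** 2).T @ col, lam)
```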
As in Chapter 8, the vectors $r_k$ and $s_k$ are referred to as factors (row factor and column factor, respectively). They have the following means and variances:

$$\bar{r}_k = \frac{1}{x_{\bullet\bullet}}\, r_k^{\top} a = 0, \qquad \bar{s}_k = \frac{1}{x_{\bullet\bullet}}\, s_k^{\top} b = 0, \tag{13.21}$$

and

$$\operatorname{Var}(r_k) = \frac{1}{x_{\bullet\bullet}} \sum_{i=1}^{n} x_{i\bullet} r_{ki}^2 = \frac{r_k^{\top} \mathcal{A}\, r_k}{x_{\bullet\bullet}} = \frac{\lambda_k}{x_{\bullet\bullet}}, \qquad \operatorname{Var}(s_k) = \frac{1}{x_{\bullet\bullet}} \sum_{j=1}^{p} x_{\bullet j} s_{kj}^2 = \frac{s_k^{\top} \mathcal{B}\, s_k}{x_{\bullet\bullet}} = \frac{\lambda_k}{x_{\bullet\bullet}}. \tag{13.22}$$

Hence $\lambda_k / \sum_{j=1}^{R} \lambda_j$, which is the part of the $k$-th factor in the decomposition of the $\chi^2$ statistic $t$, may also be interpreted as the proportion of the variance explained by the factor $k$.
The proportions

$$C_a(i, r_k) = \frac{x_{i\bullet}\, r_{ki}^2}{\lambda_k} \quad \text{for } i = 1, \ldots, n,\; k = 1, \ldots, R \tag{13.23}$$

are called the absolute contributions of row $i$ to the variance of the factor $r_k$. They show which row categories are most important in the dispersion of the $k$-th row factor. Similarly, the proportions

$$C_a(j, s_k) = \frac{x_{\bullet j}\, s_{kj}^2}{\lambda_k} \quad \text{for } j = 1, \ldots, p,\; k = 1, \ldots, R \tag{13.24}$$

are called the absolute contributions of column $j$ to the variance of the column factor $s_k$. These absolute contributions may help to interpret the graph obtained by correspondence analysis.
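To close the running sketch, the absolute contributions (13.23) and (13.24) take one line each; by (13.18) they sum to one within each factor, and large entries flag the categories that dominate that factor:

```python
# Keep only factors with positive eigenvalues (the smallest is zero here)
pos = lam > 1e-12 * lam.max()
Ca_row = row[:, None] * R_fac[:, pos] ** 2 / lam[pos]  # (13.23), one column per factor
Ca_col = col[:, None] * S_fac[:, pos] ** 2 / lam[pos]  # (13.24)

# Within each factor the absolute contributions sum to one, by (13.18)
assert np.allclose(Ca_row.sum(axis=0), 1.0)
assert np.allclose(Ca_col.sum(axis=0), 1.0)
```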