In a contingency table,
the data are classified according to
each of two characteristics.
The attributes of each characteristic are represented by the row
and the column categories.
We denote by $x_{ij}$ the number of individuals with the
$i$-th row and $j$-th column attributes. The contingency table
itself is the $(n \times p)$ matrix $\mathcal{X}$ containing the
elements $x_{ij}$.
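Total variation in the contingency table
is measured by departure from independence, i.e., more
precisely, by the $\chi^2$ statistic
$$\chi^2 = \sum_{i=1}^{n}\sum_{j=1}^{p} \frac{(x_{ij}-E_{ij})^2}{E_{ij}},$$
where $x_{ij}$ is the observed count in cell $(i,j)$ and
$E_{ij} = x_{i\bullet}x_{\bullet j}/x_{\bullet\bullet}$ is the count
expected under independence (row total times column total, divided
by the grand total).
The $\chi^2$ statistic, which measures the departure from
independence, can be rewritten as
$$\chi^2 = \sum_{i=1}^{n}\sum_{j=1}^{p} c_{ij}^2 = \operatorname{tr}(\mathcal{C}\mathcal{C}^\top) = \sum_{k=1}^{R}\lambda_k, \qquad c_{ij} = \frac{x_{ij}-E_{ij}}{\sqrt{E_{ij}}},$$
where $\lambda_1 \ge \dots \ge \lambda_R$ are the nonzero eigenvalues
of $\mathcal{C}\mathcal{C}^\top$.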
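The identity between the $\chi^2$ statistic and the sum of squared standardized departures can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual Pearson form of the statistic with expected counts given by row total times column total over the grand total. The function names and the small 3 x 2 table are hypothetical and chosen only for illustration.

```python
def expected_counts(table):
    """Counts expected under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((x - e) ** 2 / e
               for row_x, row_e in zip(table, expected)
               for x, e in zip(row_x, row_e))


def standardized_residuals(table):
    """Matrix C with entries (observed - expected) / sqrt(expected)."""
    expected = expected_counts(table)
    return [[(x - e) / e ** 0.5 for x, e in zip(row_x, row_e)]
            for row_x, row_e in zip(table, expected)]


# Hypothetical 3 x 2 contingency table (illustration only).
table = [[20, 10], [10, 20], [15, 15]]

chi2 = chi_square(table)
C = standardized_residuals(table)

# tr(C C^T) equals the sum of all squared entries of C.
trace_CCt = sum(c * c for row in C for c in row)

print(chi2, trace_CCt)  # the two values agree up to rounding error
```

Working with the matrix of standardized departures rather than the raw counts is what allows the total variation to be split into additive contributions later on.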
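The CA itself consists of finding the singular value
decomposition (SVD) of the matrix $\mathcal{C}$ of standardized
departures from independence. In this way,
we obtain approximations of the matrix $\mathcal{C}$ by
matrices of lower rank:
$$\mathcal{C} = \Gamma\Lambda\Delta^\top = \sum_{k=1}^{R} \sqrt{\lambda_k}\,\gamma_k\delta_k^\top,$$
where $R$ is the rank of $\mathcal{C}$,
$\sqrt{\lambda_1} \ge \dots \ge \sqrt{\lambda_R} > 0$ are its singular
values, and $\gamma_k$ and $\delta_k$ are the $k$-th columns of
$\Gamma$ and $\Delta$. Truncating the sum after the first few terms
gives the lower-rank approximation.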
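The vector
$r_k = \mathcal{A}^{-1/2}\mathcal{C}\delta_k$ defines the coordinates of the
rows corresponding to the $k$-th factor. Similarly, the
vector $s_k = \mathcal{B}^{-1/2}\mathcal{C}^\top\gamma_k$
defines the coordinates of columns
corresponding to the $k$-th factor.
Here $\gamma_k$ and $\delta_k$ are the $k$-th left and right singular
vectors of the matrix $\mathcal{C}$ of standardized departures from
independence, and $\mathcal{A}$ and $\mathcal{B}$ are the diagonal
matrices of row totals and column totals of the table.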
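The whole procedure can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: it assumes the row coordinates are obtained by projecting the standardized departures onto the right singular vectors and rescaling by the inverse square roots of the row totals (and symmetrically for the columns). The function name `correspondence_analysis`, the tolerance `1e-10`, and the example table are assumptions made for this illustration.

```python
import numpy as np


def correspondence_analysis(X, tol=1e-10):
    """Return (eigenvalues, row coordinates, column coordinates) of a table X."""
    X = np.asarray(X, dtype=float)
    row_totals = X.sum(axis=1)
    col_totals = X.sum(axis=0)

    # Expected counts under independence and standardized departures C.
    E = np.outer(row_totals, col_totals) / X.sum()
    C = (X - E) / np.sqrt(E)

    # SVD: C = gamma @ diag(sing) @ delta_t, singular values in decreasing order.
    gamma, sing, delta_t = np.linalg.svd(C, full_matrices=False)

    # Keep only the factors with a numerically nonzero singular value.
    keep = sing > tol
    gamma, sing, delta = gamma[:, keep], sing[keep], delta_t[keep].T

    # Column k of `rows` holds the row coordinates on the k-th factor,
    # column k of `cols` the column coordinates on the k-th factor.
    rows = (C @ delta) / np.sqrt(row_totals)[:, None]
    cols = (C.T @ gamma) / np.sqrt(col_totals)[:, None]
    return sing ** 2, rows, cols


# Hypothetical 3 x 2 contingency table (illustration only).
table = np.array([[20, 10], [10, 20], [15, 15]])
eigvals, rows, cols = correspondence_analysis(table)

print(eigvals.sum())  # equals the chi-square statistic of the table
print(rows)           # one column per retained factor
print(cols)
```

The signs of the coordinates are not determined by the SVD, so two implementations may return factors that differ by a factor of -1 without changing the interpretation.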
A set of $R$ coordinates for row (resp. column) items, where
$R$ is the rank of the decomposed matrix, is hierarchically
constructed via singular value
decomposition. Thus the construction is similar to PCA,
however with a different matrix norm in order to take
into account the specific frequency nature of the data.
For the sake of simplicity, the vector of first row coordinates is
called the first factor (as well as the vector of the first coordinates
for columns), and so on up to the $R$-th factor.