13.3 Correspondence Analysis in Practice
The graphical representations on the axes $k = 1, 2, \ldots, R$
of the $n$ rows and of the $p$ columns of $\mathcal{X}$
are provided by the elements of $r_k$ and $s_k$.
Two-dimensional displays are
often satisfactory if the cumulated percentage of variance explained
by the first two factors,
$\Psi_2 = (\lambda_1 + \lambda_2) / \sum_{k=1}^{R} \lambda_k$,
is sufficiently large.
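The computation behind such a display can be sketched as follows. This is a minimal illustration, not the book's quantlet: it assumes the usual construction in which the factors are obtained from a singular value decomposition of the standardized residuals between the table and its independence fit. The table `X` and the function name `correspondence_analysis` are hypothetical.

```python
import numpy as np

def correspondence_analysis(X):
    """Eigenvalues and row/column factors of a two-way table X (n x p)."""
    X = np.asarray(X, dtype=float)
    row = X.sum(axis=1)                  # row totals x_{i.}
    col = X.sum(axis=0)                  # column totals x_{.j}
    E = np.outer(row, col) / X.sum()     # expected counts under independence
    C = (X - E) / np.sqrt(E)             # standardized residuals
    G, sv, Dt = np.linalg.svd(C, full_matrices=False)
    lam = sv ** 2                        # eigenvalues lambda_k
    r = (sv * G) / np.sqrt(row)[:, None]     # column k holds the row factor r_k
    s = (sv * Dt.T) / np.sqrt(col)[:, None]  # column k holds the column factor s_k
    return lam, r, s

# Hypothetical 4 x 3 table of counts, for illustration only
X = np.array([[20, 10,  5],
              [10, 20, 10],
              [ 5, 10, 30],
              [15,  5, 10]])

lam, r, s = correspondence_analysis(X)
psi = np.cumsum(lam) / lam.sum()         # cumulated percentage of variance
print("eigenvalues          :", np.round(lam, 4))
print("cumulated percentage :", np.round(psi, 4))
print("row coordinates on the first two axes:")
print(np.round(r[:, :2], 4))
print("column coordinates on the first two axes:")
print(np.round(s[:, :2], 4))
```

Plotting the first two columns of `r` and `s` in the same scatterplot gives the kind of two-dimensional display discussed here; the second entry of `psi` is the quantity to inspect before trusting it.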
The interpretation of the graphs may be summarized as follows:
- The proximity of two rows (two columns) indicates a similar
profile in these two rows (two columns), where ``profile''
refers to the conditional frequency distribution of a row (column);
those two rows (columns) are almost proportional.
The opposite interpretation applies when
the two rows (two columns) are far apart.
- The proximity of a particular row to a particular column indicates
that this row (column) has a particularly important weight in this
column (row). In contrast to this, a row that is quite distant from
a particular column indicates that there are
almost no observations in this column for this row
(and vice versa). Of course, as mentioned above, these conclusions are
particularly true when the points are far away from $0$.
- The origin is the average of the factors $r_k$ and $s_k$. Hence, a
particular point (row or column) projected close to the origin
indicates an average profile.
- The absolute contributions are used to evaluate the weight of each
row (column) in the variances of the factors.
- All the interpretations outlined above must be carried out in view of the
quality of the graphical representation which is evaluated, as in
PCA, using the cumulated percentage of variance.
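As an illustration of the absolute contributions, the sketch below assumes the usual definition $C_a(i, r_k) = x_{i\bullet} r_{ki}^2 / \lambda_k$ for rows and $C_a(j, s_k) = x_{\bullet j} s_{kj}^2 / \lambda_k$ for columns; this definition is not restated in this section, and the table `X` is hypothetical. With this definition the contributions to each axis are non-negative and sum to one.

```python
import numpy as np

# Hypothetical 4 x 3 table of counts, for illustration only
X = np.array([[20, 10,  5],
              [10, 20, 10],
              [ 5, 10, 30],
              [15,  5, 10]], dtype=float)

row, col, total = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(row, col) / total           # expected counts under independence
C = (X - E) / np.sqrt(E)                 # standardized residuals
G, sv, Dt = np.linalg.svd(C, full_matrices=False)

keep = sv > 1e-10 * sv.max()             # drop numerically zero factors
lam = sv[keep] ** 2                      # eigenvalues lambda_k
r = (sv[keep] * G[:, keep]) / np.sqrt(row)[:, None]     # row factors r_k
s = (sv[keep] * Dt.T[:, keep]) / np.sqrt(col)[:, None]  # column factors s_k

# Assumed definition: weight times squared coordinate over the eigenvalue
Ca_rows = row[:, None] * r ** 2 / lam    # C_a(i, r_k)
Ca_cols = col[:, None] * s ** 2 / lam    # C_a(j, s_k)

print("row contributions per axis:")
print(np.round(Ca_rows, 4))
print("column sums:", Ca_rows.sum(axis=0))
print("column contributions per axis:")
print(np.round(Ca_cols, 4))
print("column sums:", Ca_cols.sum(axis=0))
```

A row with a large entry in column $k$ of `Ca_rows` is one that largely determines the orientation of axis $k$, which is how the contribution tables in the examples below are read.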
REMARK 13.1
Note that correspondence analysis can also be applied to more general
$(n \times p)$ tables $\mathcal{X}$ which in a ``strict sense''
are not contingency tables.
As long as a statistical (or natural) meaning
can be given to sums over rows and columns, the method remains applicable.
This implies, in particular,
that all of the variables are measured in the same units. In that case,
$x_{\bullet\bullet}$ constitutes the total frequency
of the observed phenomenon, and
is shared between individuals ($n$ rows) and between variables
($p$ columns).
Representations of the rows and columns of $\mathcal{X}$,
$r_k$ and $s_k$,
have the basic property (13.19) and show which
variables have important weights for each individual and vice versa.
This type of analysis is used as an alternative to PCA.
PCA is mainly concerned with covariances
and correlations, whereas correspondence analysis analyzes a more
general kind of association.
(See Exercises 13.3 and 13.11.)
EXAMPLE 13.3
A survey of Belgian citizens who regularly read a newspaper was
conducted in the 1980s. They were asked where they lived.
The possible answers were 10 regions: 7 provinces
(Antwerp, Western Flanders, Eastern Flanders, Hainaut, Liège,
Limbourg, Luxembourg) and 3 regions around Brussels
(Flemish-Brabant, Walloon-Brabant and the city of Brussels).
They were also asked what kind of newspapers they read on
a regular basis. There were 15 possible answers split up into 3 classes:
Flemish newspapers (label begins with the letter $v$), French newspapers
(label begins with $f$) and both languages together
(label begins with $b$). The data set is given in Table B.9.
The eigenvalues of the factorial correspondence analysis
are given in Table
13.1.
Table 13.1:
Eigenvalues and percentages of the variance (Example 13.3).
$\lambda_j$ | percentage of variance | cumulated percentage
183.40 | 0.653 | 0.653
43.75 | 0.156 | 0.809
25.21 | 0.090 | 0.898
11.74 | 0.042 | 0.940
8.04 | 0.029 | 0.969
4.68 | 0.017 | 0.985
2.13 | 0.008 | 0.993
1.20 | 0.004 | 0.997
0.82 | 0.003 | 1.000
0.00 | 0.000 | 1.000
Two-dimensional representations will be quite satisfactory
since the first two eigenvalues account for 81% of the variance.
Figure 13.1 shows the projections of the rows (the 15 newspapers)
and of the columns (the 10 regions).
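The percentages quoted for this example follow directly from the eigenvalues: each percentage is an eigenvalue divided by the sum of all eigenvalues, and the cumulated percentage is the running total. A short sketch using the eigenvalues listed in Table 13.1 (the variable names are illustrative):

```python
import numpy as np

# Eigenvalues as listed in Table 13.1
lam = np.array([183.40, 43.75, 25.21, 11.74, 8.04,
                4.68, 2.13, 1.20, 0.82, 0.00])

share = lam / lam.sum()      # percentage of variance per factor
cum = np.cumsum(share)       # cumulated percentage

for l, p, c in zip(lam, share, cum):
    print(f"{l:8.2f}  {p:.3f}  {c:.3f}")

print(f"first two factors: {100 * cum[1]:.0f}% of the variance")
```

Values recomputed this way may differ from the printed table in the last digit, since the table entries appear to be rounded.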
Figure 13.1:
Projection of rows (the 15 newspapers) and columns (the 10 regions)
MVAcorrjourn.xpl
As expected, there is a strong association between the regions and the type
of newspaper which is read. In particular,
$v_b$ (Gazet van Antwerpen) is
almost exclusively read in the province of Antwerp
(this is an extreme point in the graph).
The points on the left all belong to Flanders, whereas those on the right all
belong to Wallonia.
Notice that Walloon-Brabant and Flemish-Brabant are not far from
Brussels. Brussels is close to the center (average)
and also close to the bilingual newspapers. It is shifted a little to the right
of the origin due to the majority of French-speaking people in the area.
Table 13.2:
Absolute contributions of row factors $r_k$.
  | $C_a(i, r_1)$ | $C_a(i, r_2)$ | $C_a(i, r_3)$
$v_a$ | 0.0563 | 0.0008 | 0.0036
$v_b$ | 0.1555 | 0.5567 | 0.0067
$v_c$ | 0.0244 | 0.1179 | 0.0266
$v_d$ | 0.1352 | 0.0952 | 0.0164
$v_e$ | 0.0253 | 0.1193 | 0.0013
$f_f$ | 0.0314 | 0.0183 | 0.0597
$f_g$ | 0.0585 | 0.0162 | 0.0122
$f_h$ | 0.1086 | 0.0024 | 0.0656
$f_i$ | 0.1001 | 0.0024 | 0.6376
$b_j$ | 0.0029 | 0.0055 | 0.0187
$b_k$ | 0.0236 | 0.0278 | 0.0237
$b_l$ | 0.0006 | 0.0090 | 0.0064
$v_m$ | 0.1000 | 0.0038 | 0.0047
$f_n$ | 0.0966 | 0.0059 | 0.0269
$f_0$ | 0.0810 | 0.0188 | 0.0899
Total | 1.0000 | 1.0000 | 1.0000
The absolute contributions of the first 3 factors are listed in
Tables 13.2 and 13.3.
The row factors $r_k$ are in Table 13.2 and
the column factors $s_k$ are in Table 13.3.
Table 13.3:
Absolute contributions of column factors $s_k$.
  | $C_a(j, s_1)$ | $C_a(j, s_2)$ | $C_a(j, s_3)$
brw | 0.0887 | 0.0210 | 0.2860
bxl | 0.1259 | 0.0010 | 0.0960
anv | 0.2999 | 0.4349 | 0.0029
brf | 0.0064 | 0.2370 | 0.0090
foc | 0.0729 | 0.1409 | 0.0033
for | 0.0998 | 0.0023 | 0.0079
hai | 0.1046 | 0.0012 | 0.3141
lig | 0.1168 | 0.0355 | 0.1025
lim | 0.0562 | 0.1162 | 0.0027
lux | 0.0288 | 0.0101 | 0.1761
Total | 1.0000 | 1.0000 | 1.0000
These tables show, for instance, the important role of Antwerp and
the newspaper $v_b$
in determining the variance of both factors.
Clearly, the first axis expresses
linguistic differences between the 3 parts of Belgium.
The second axis shows a larger
dispersion among the Flemish regions than among the French-speaking regions.
Note also that the third axis shows an important role of the category
``$f_i$'' (other French newspapers), with Walloon-Brabant ``brw'' and
Hainaut ``hai'' showing the most important contributions. The coordinate
of ``$f_i$'' on this axis is negative (not shown here), and so are the coordinates
of ``brw'' and ``hai''. Apparently, these two regions also feature
a greater proportion of readers of more local newspapers.
Table 13.4:
Eigenvalues and percentages of explained variance (including Corsica).
eigenvalues $\lambda$ | percentage of variances | cumulated percentage
2436.2 | 0.5605 | 0.561
1052.4 | 0.2421 | 0.803
341.8 | 0.0786 | 0.881
229.5 | 0.0528 | 0.934
152.2 | 0.0350 | 0.969
109.1 | 0.0251 | 0.994
25.0 | 0.0058 | 1.000
0.0 | 0.0000 | 1.000
EXAMPLE 13.4
Applying correspondence analysis to the
French baccalauréat data (Table B.8)
leads to Figure 13.2.
Excluding Corsica we obtain Figure 13.3.
The different modalities are labeled A, ..., H
and the regions are labeled ILDF, ..., CORS.
The results of the correspondence analysis are given in Table 13.4
and Figure 13.2.
The first two factors explain 80% of the total variance. It is clear
from Figure 13.2 that Corsica (in the upper left) is an outlier.
The analysis is therefore redone without Corsica and the results are given in
Table 13.5 and Figure 13.3.
Since Corsica has such a small weight in the analysis, the results
have not changed much.
Figure 13.3:
Correspondence analysis excluding Corsica.
MVAcorrbac.xpl
Table 13.5:
Eigenvalues and percentages of explained variance (excluding Corsica).
eigenvalues $\lambda$ | percentage of variances | cumulated percentage
2408.6 | 0.5874 | 0.587
909.5 | 0.2218 | 0.809
318.5 | 0.0777 | 0.887
195.9 | 0.0478 | 0.935
149.3 | 0.0364 | 0.971
96.1 | 0.0234 | 0.994
22.8 | 0.0056 | 1.000
0.0 | 0.0000 | 1.000
The projections on the first three axes, along with their absolute
contribution to the variance of the axis, are summarized in Table 13.6
for the regions and in Table 13.7 for baccalauréats.
Table 13.6:
Coefficients and absolute contributions for regions, Example 13.4.
Region | $r_1$ | $r_2$ | $r_3$ | $C_a(i, r_1)$ | $C_a(i, r_2)$ | $C_a(i, r_3)$
ILDF | 0.1464 | 0.0677 | 0.0157 | 0.3839 | 0.2175 | 0.0333
CHAM | -0.0603 | -0.0410 | -0.0187 | 0.0064 | 0.0078 | 0.0047
PICA | 0.0323 | -0.0258 | -0.0318 | 0.0021 | 0.0036 | 0.0155
HNOR | -0.0692 | 0.0287 | 0.1156 | 0.0096 | 0.0044 | 0.2035
CENT | -0.0068 | -0.0205 | -0.0145 | 0.0001 | 0.0030 | 0.0043
BNOR | -0.0271 | -0.0762 | 0.0061 | 0.0014 | 0.0284 | 0.0005
BOUR | -0.1921 | 0.0188 | 0.0578 | 0.0920 | 0.0023 | 0.0630
NOPC | -0.1278 | 0.0863 | -0.0570 | 0.0871 | 0.1052 | 0.1311
LORR | -0.2084 | 0.0511 | 0.0467 | 0.1606 | 0.0256 | 0.0608
ALSA | -0.2331 | 0.0838 | 0.0655 | 0.1283 | 0.0439 | 0.0767
FRAC | -0.1304 | -0.0368 | -0.0444 | 0.0265 | 0.0056 | 0.0232
PAYL | -0.0743 | -0.0816 | -0.0341 | 0.0232 | 0.0743 | 0.0370
BRET | 0.0158 | 0.0249 | -0.0469 | 0.0011 | 0.0070 | 0.0708
PCHA | -0.0610 | -0.1391 | -0.0178 | 0.0085 | 0.1171 | 0.0054
AQUI | 0.0368 | -0.1183 | 0.0455 | 0.0055 | 0.1519 | 0.0643
MIDI | 0.0208 | -0.0567 | 0.0138 | 0.0018 | 0.0359 | 0.0061
LIMO | -0.0540 | 0.0221 | -0.0427 | 0.0033 | 0.0014 | 0.0154
RHOA | -0.0225 | 0.0273 | -0.0385 | 0.0042 | 0.0161 | 0.0918
AUVE | 0.0290 | -0.0139 | -0.0554 | 0.0017 | 0.0010 | 0.0469
LARO | 0.0290 | -0.0862 | -0.0177 | 0.0383 | 0.0595 | 0.0072
PROV | 0.0469 | -0.0717 | 0.0279 | 0.0142 | 0.0884 | 0.0383
Table 13.7:
Coefficients and absolute contributions for baccalauréats, Example 13.4.
Baccal | $s_1$ | $s_2$ | $s_3$ | $C_a(j, s_1)$ | $C_a(j, s_2)$ | $C_a(j, s_3)$
A | 0.0447 | -0.0679 | 0.0367 | 0.0376 | 0.2292 | 0.1916
B | 0.1389 | 0.0557 | 0.0011 | 0.1724 | 0.0735 | 0.0001
C | 0.0940 | 0.0995 | 0.0079 | 0.1198 | 0.3556 | 0.0064
D | 0.0227 | -0.0495 | -0.0530 | 0.0098 | 0.1237 | 0.4040
E | -0.1932 | 0.0492 | -0.1317 | 0.0825 | 0.0141 | 0.2900
F | -0.2156 | 0.0862 | 0.0188 | 0.3793 | 0.1608 | 0.0219
G | -0.1244 | -0.0353 | 0.0279 | 0.1969 | 0.0421 | 0.0749
H | -0.0945 | 0.0438 | -0.0888 | 0.0017 | 0.0010 | 0.0112
The interpretation of the results may be summarized as follows.
Table 13.7 shows that the baccalauréats B on one side
and F on the other side are most strongly responsible for the variation
on the first axis. The second axis mostly characterizes an opposition
between baccalauréats A and C.
Regarding the regions, Ile de France plays an important role on each axis.
On the first axis, it is opposed to Lorraine and Alsace, whereas on
the second axis, it is opposed to
Poitou-Charentes and Aquitaine. All of this is confirmed in
Figure 13.3.
On the right side are the more classical baccalauréats and on the left,
more technical ones. The regions on the left side thus have larger weights
in the technical baccalauréats. Note also that most of the southern
regions of France are concentrated in the lower part of the
graph near the baccalauréat A.
Finally, looking at the third axis, we see that it is dominated by the
baccalauréat E (negative sign) and to a lesser degree
by H (negative), as opposed to A (positive sign). The dominating regions
are HNOR (positive sign), opposed to NOPC and AUVE (negative sign). For
instance, HNOR is particularly poor in baccalauréat D.
EXAMPLE 13.5
The U.S. crime data set (Table B.10) gives the
number of crimes in the 50 states of the U.S. in 1985, classified into
the following seven categories:
murder, rape, robbery, assault, burglary, larceny and auto-theft.
The analysis of the contingency table, limited to the first two factors,
provides the following results (see Table 13.8).
Table 13.8:
Eigenvalues and explained proportion of variance, Example 13.5.
$\lambda_j$ | percentage of variance | cumulated percentage
4399.0 | 0.4914 | 0.4914
2213.6 | 0.2473 | 0.7387
1382.4 | 0.1544 | 0.8932
870.7 | 0.0973 | 0.9904
51.0 | 0.0057 | 0.9961
34.8 | 0.0039 | 1.0000
0.0 | 0.0000 | 1.0000
Figure 13.4:
Projection of rows (the 50 states) and columns (the 7 crime categories).
MVAcorrcrime.xpl
Looking at the absolute contributions (not reproduced
here, see Exercise 13.6), it appears that the
first axis contrasts robbery (+) with larceny (-) and
auto-theft (-), and that the second factor contrasts assault (-) with
auto-theft (+). The dominating states for the first axis are
the North-Eastern States MA (+) and NY (+), contrasting with the Western States
WY (-) and ID (-). For the second axis, the differences are seen between
the Northern States (MA (+) and RI (+)) and the Southern States
AL (-), MS (-) and AR (-). These results can be clearly seen in
Figure 13.4 where all the states and crimes are reported.
The figure also shows in which states the proportion of a particular
crime category is higher or lower than the national average
(the origin).
Biplots
The biplot is a low-dimensional display of a data matrix $\mathcal{X}$
where the rows and columns are represented
by points. The interpretation of a biplot is specifically directed
towards the scalar products of lower dimensional factorial variables
and is designed to approximately recover the individual elements of the
data matrix in these scalar products.
Suppose that we have a ($10 \times 5$) data matrix with elements $x_{ij}$.
The idea of the biplot is to find 10 row points
$q_i \in \mathbb{R}^k$ ($k < p$, $i = 1, \ldots, 10$) and 5 column points
$t_j \in \mathbb{R}^k$ ($j = 1, \ldots, 5$) such that the 50 scalar products
between the row and the
column vectors closely approximate the 50 corresponding elements of the
data matrix $\mathcal{X}$.
Usually we choose $k = 2$. For example, the scalar product
between $q_7$ and $t_4$ should approximate the data value $x_{74}$ in
the seventh row and the fourth column. In general, the
biplot models the data $x_{ij}$
as the sum of a scalar product in some
low-dimensional subspace and a residual ``error'' term:
$$ x_{ij} = q_i^\top t_j + e_{ij} = \sum_{k} q_{ik} t_{jk} + e_{ij}. \tag{13.25} $$
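The biplot model can be illustrated numerically. The sketch below builds row and column points from a truncated singular value decomposition, which is one common way to obtain a least squares fit of this form; it is an assumption here, not necessarily the construction used elsewhere in the text. The data matrix is randomly generated and purely hypothetical, and the choice to put the singular values entirely on the row points is one of several possible scalings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (10 x 5) data matrix that is close to rank 2
X = (rng.normal(size=(10, 2)) @ rng.normal(size=(2, 5))
     + 0.05 * rng.normal(size=(10, 5)))

k = 2                                   # dimension of the display
U, d, Vt = np.linalg.svd(X, full_matrices=False)
Q = U[:, :k] * d[:k]                    # 10 row points q_i in R^2
T = Vt[:k, :].T                         # 5 column points t_j in R^2
resid = X - Q @ T.T                     # residual "error" terms e_ij

# The scalar product of q_7 and t_4 approximates x_74 (0-based indices 6, 3)
print("x_74          :", X[6, 3])
print("q_7' t_4      :", Q[6] @ T[3])
print("largest |e_ij|:", np.abs(resid).max())
```

The 50 entries of `Q @ T.T` are the 50 scalar products; the squared residuals add up to the sum of the squared discarded singular values, so the display is accurate when those are small.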
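To understand the link between correspondence analysis and the biplot,
we need to introduce a formula which expresses $x_{ij}$ from the
original data matrix (see (13.3)) in terms of
row and column frequencies. One such formula, known as
the ``reconstitution formula'', is (13.10):
$$ x_{ij} = E_{ij} \left( 1 + \frac{\sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \delta_{jk}}{\sqrt{x_{i\bullet} x_{\bullet j} / x_{\bullet\bullet}}} \right) \tag{13.26} $$
Consider now the row profiles $x_{ij} / x_{i\bullet}$
(the conditional frequencies) and the average row profile
$x_{\bullet j} / x_{\bullet\bullet}$. From (13.26)
we obtain the difference between each row profile and this average:
$$ \frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} = \sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \sqrt{\frac{x_{\bullet j}}{x_{i\bullet} x_{\bullet\bullet}}} \, \delta_{jk}. \tag{13.27} $$
By the same argument we can also obtain the difference between each
column profile and the average column profile:
$$ \frac{x_{ij}}{x_{\bullet j}} - \frac{x_{i\bullet}}{x_{\bullet\bullet}} = \sum_{k=1}^{R} \lambda_k^{1/2} \gamma_{ik} \sqrt{\frac{x_{i\bullet}}{x_{\bullet j} x_{\bullet\bullet}}} \, \delta_{jk}. \tag{13.28} $$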
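Now, if $\lambda_1 \gg \lambda_2 \gg \lambda_3 \ldots$, we can
approximate these sums by a finite number of $K$ terms
(usually $K = 2$) using (13.16) to obtain
$$ \frac{x_{ij}}{x_{\bullet j}} - \frac{x_{i\bullet}}{x_{\bullet\bullet}} = \sum_{k=1}^{K} \left( \frac{x_{i\bullet}}{\sqrt{\lambda_k x_{\bullet\bullet}}} \, r_{ki} \right) s_{kj} + e_{ij}, \tag{13.29} $$
$$ \frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} = \sum_{k=1}^{K} \left( \frac{x_{\bullet j}}{\sqrt{\lambda_k x_{\bullet\bullet}}} \, s_{kj} \right) r_{ki} + e'_{ij}, \tag{13.30} $$
where $e_{ij}$ and $e'_{ij}$ are error terms.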
Equation (13.30) shows that if we consider displaying the differences between
the row profiles and the average profile, then the projection of the
row profile $r_k$
and a rescaled version of the projections of the
column profile $s_k$
constitute a biplot of these differences.
Equation (13.29) implies the same for the differences
between the column profiles and this average.
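The link between the reconstitution formula (13.26) and the biplot form (13.30) can be checked numerically. The sketch below is an illustration under assumptions: it takes $\gamma_k$ and $\delta_k$ to be the left and right singular vectors of the matrix of standardized residuals $(x_{ij} - E_{ij}) / \sqrt{E_{ij}}$, and $r_k = \sqrt{\lambda_k}\, \mathcal{A}^{-1/2} \gamma_k$, $s_k = \sqrt{\lambda_k}\, \mathcal{B}^{-1/2} \delta_k$ with $\mathcal{A}$ and $\mathcal{B}$ the diagonal matrices of row and column totals. The table `X` is hypothetical.

```python
import numpy as np

# Hypothetical 4 x 3 table of counts, for illustration only
X = np.array([[20, 10,  5],
              [10, 20, 10],
              [ 5, 10, 30],
              [15,  5, 10]], dtype=float)

row, col, total = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(row, col) / total               # E_ij
C = (X - E) / np.sqrt(E)                     # standardized residuals
G, sv, Dt = np.linalg.svd(C, full_matrices=False)
keep = sv > 1e-10 * sv.max()                 # keep the R non-zero factors
sv, G, D = sv[keep], G[:, keep], Dt.T[:, keep]
lam = sv ** 2

r = sv * G / np.sqrt(row)[:, None]           # r[i, k] = r_{ki}
s = sv * D / np.sqrt(col)[:, None]           # s[j, k] = s_{kj}

# Reconstitution formula with all R terms
X_rec = E * (1 + (G * sv) @ D.T / np.sqrt(E))

# Row profiles minus the average row profile
lhs = X / row[:, None] - col / total
# Rescaled column factors: x_{.j} s_{kj} / sqrt(lambda_k x_{..})
t = col[:, None] * s / np.sqrt(lam * total)
rhs = r @ t.T                                # scalar products with all R terms

print("reconstitution exact:", np.allclose(X_rec, X))
print("biplot form exact   :", np.allclose(lhs, rhs))

# Keeping only K = 1 term leaves an error term e'_ij
rhs_1 = r[:, :1] @ t[:, :1].T
print("largest |e'_ij| with K = 1:", np.abs(lhs - rhs_1).max())
```

With all $R$ terms both identities hold exactly; truncating to $K$ terms gives the approximation that the two-dimensional display relies on.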
Summary

- Correspondence analysis is a factorial decomposition of contingency
tables. The $p$-dimensional individuals and the $n$-dimensional
variables can be graphically represented by projecting onto spaces of
smaller dimension.

- The practical computation consists of first computing a spectral
decomposition of $\mathcal{A}^{-1} \mathcal{X} \mathcal{B}^{-1} \mathcal{X}^\top$
and $\mathcal{B}^{-1} \mathcal{X}^\top \mathcal{A}^{-1} \mathcal{X}$, which have the
same first $p$ eigenvalues. The graphical representation is obtained by
plotting $\sqrt{\lambda_1}\, r_1$ vs. $\sqrt{\lambda_2}\, r_2$
and $\sqrt{\lambda_1}\, s_1$ vs. $\sqrt{\lambda_2}\, s_2$.
Both plots may be displayed in the same graph taking into account
the appropriate orientation of the eigenvectors $r_i$, $s_j$.

- Correspondence analysis provides a graphical display of the
association measure $(x_{ij} - E_{ij})^2 / E_{ij}$.

- The biplot is a low-dimensional display of a data matrix where the
rows and columns are represented by points.