13.3 Correspondence Analysis in Practice

The graphical representations on the axes $k=1, 2, \ldots, R$ of the $n$ rows and of the $p$ columns of $\data{X}$ are provided by the elements of $r_k$ and $s_k$. Two-dimensional displays are typically satisfactory if the cumulated percentage of variance explained by the first two factors, $\Psi_2 = \frac{\lambda_1 + \lambda_2}{\sum_{k=1}^R \lambda_k}$, is sufficiently large.

The interpretation of the graphs may be summarized as follows:

- The proximity of two rows (two columns) indicates a similar profile in these two rows (two columns), where ``profile'' refers to the conditional frequency distribution of a row (column); such rows (columns) are almost proportional. The opposite interpretation applies when the two rows (two columns) are far apart.
- The proximity of a particular row to a particular column indicates that this row (column) has a particularly important weight in this column (row). Conversely, a row that is far from a particular column indicates that there are almost no observations in this column for this row (and vice versa). As mentioned above, these conclusions are most reliable when the points are far away from $0$.
- The origin is the average of the factors $r_k$ and $s_k$. Hence, a point (row or column) projected close to the origin indicates an average profile.
- The absolute contributions are used to evaluate the weight of each row (column) in the variances of the factors.
- All the interpretations outlined above must be carried out in view of the quality of the graphical representation, which is evaluated, as in PCA, using the cumulated percentage of variance.
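The quantities behind these interpretation rules can be computed with a few lines of linear algebra. The sketch below is a minimal NumPy version, assuming the construction used earlier in this chapter: $c_{ij} = (x_{ij}-E_{ij})/\sqrt{E_{ij}}$, a singular value decomposition $\data{C} = \Gamma\Lambda^{1/2}\Delta^{\top}$, coordinates $r_k = \sqrt{\lambda_k}\,\data{A}^{-1/2}\gamma_k$ and $s_k = \sqrt{\lambda_k}\,\data{B}^{-1/2}\delta_k$, and absolute contributions $C_a(i,r_k) = x_{i\bullet} r_{ki}^2/\lambda_k$. The helper name `ca` and the $3\times 3$ table are hypothetical. The last two printed lines check two of the rules above: the contributions to each axis sum to one, and the $x_{i\bullet}$-weighted mean of each $r_k$ is zero, so the origin is the average profile.

```python
import numpy as np

def ca(X):
    """Correspondence analysis of a two-way table X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    row, col, tot = X.sum(axis=1), X.sum(axis=0), X.sum()   # x_{i.}, x_{.j}, x_{..}
    E = np.outer(row, col) / tot                             # expected counts E_ij
    C = (X - E) / np.sqrt(E)                                 # c_ij
    U, sv, Vt = np.linalg.svd(C, full_matrices=False)        # C = Gamma Lambda^{1/2} Delta^T
    keep = sv > 1e-10 * sv.max()                             # drop trivial zero eigenvalue(s)
    U, sv, V = U[:, keep], sv[keep], Vt.T[:, keep]
    lam = sv ** 2                                            # eigenvalues lambda_k
    r = U * sv / np.sqrt(row)[:, None]                       # r_k = sqrt(lambda_k) A^{-1/2} gamma_k
    s = V * sv / np.sqrt(col)[:, None]                       # s_k = sqrt(lambda_k) B^{-1/2} delta_k
    ca_rows = row[:, None] * r ** 2 / lam                    # C_a(i, r_k)
    ca_cols = col[:, None] * s ** 2 / lam                    # C_a(j, s_k)
    return lam, r, s, ca_rows, ca_cols, row, col

X = np.array([[30, 10, 5], [10, 25, 15], [5, 10, 40]])       # hypothetical counts
lam, r, s, ca_rows, ca_cols, row, col = ca(X)
print(np.round(lam / lam.sum(), 3))       # share of each axis in the total
print(np.round(ca_rows.sum(axis=0), 6))   # contributions to each axis sum to 1
print(np.round(row @ r, 10))              # weighted mean of r_k: the origin is the average
```

Because $\sum_k \lambda_k = \sum_{i,j} c_{ij}^2$, the eigenvalues add up to the $\chi^2$ statistic of the table, which is why the percentages of variance are simply $\lambda_k/\sum_k\lambda_k$.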

REMARK 13.1   Note that correspondence analysis can also be applied to more general $(n\times p)$ tables $\data{X}$ which in a ``strict sense'' are not contingency tables.

As long as statistical (or natural) meaning can be given to sums over rows and columns, Remark 13.1 holds. This implies, in particular, that all of the variables are measured in the same units. In that case, $x_{\bullet \bullet}$ constitutes the total frequency of the observed phenomenon and is shared between individuals ($n$ rows) and between variables ($p$ columns). The representations of the rows and columns of $\data{X}$, $r_k$ and $s_k$, have the basic property (13.19) and show which variables have important weights for each individual and vice versa. This type of analysis can be used as an alternative to PCA: PCA is mainly concerned with covariances and correlations, whereas correspondence analysis analyzes a more general kind of association. (See Exercises 13.3 and 13.11.)
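To illustrate Remark 13.1, the sketch below applies the same computation to a small hypothetical table of amounts measured in one common unit (so that row and column sums are meaningful) and checks numerically the transition relation $r_k = \sqrt{x_{\bullet\bullet}/\lambda_k}\,\data{A}^{-1}\data{X}s_k$ and its column analogue. We derive this relation here from the singular value decomposition; identifying it with the basic property (13.19) is an assumption. It says that each row coordinate is a rescaled average of the column coordinates, weighted by that row's profile, which is what lets the display show which variables weigh most for each individual.

```python
import numpy as np

# Hypothetical table: rows = 4 households, columns = 3 spending categories,
# all entries in the same currency unit, so row/column sums are meaningful.
X = np.array([[120.0, 40.0, 30.0],
              [ 60.0, 80.0, 20.0],
              [ 30.0, 50.0, 90.0],
              [ 70.0, 30.0, 60.0]])

row, col, tot = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(row, col) / tot
U, sv, Vt = np.linalg.svd((X - E) / np.sqrt(E), full_matrices=False)
K = int(np.sum(sv > 1e-10 * sv.max()))          # number of non-trivial axes
lam = sv[:K] ** 2
r = U[:, :K] * sv[:K] / np.sqrt(row)[:, None]    # row coordinates r_k
s = Vt[:K].T * sv[:K] / np.sqrt(col)[:, None]    # column coordinates s_k

# Transition relation: row coordinates are rescaled profile-weighted averages
# of the column coordinates (weights x_ij / x_i.), and vice versa (x_ij / x_.j).
r_from_s = np.sqrt(tot / lam) * ((X / row[:, None]) @ s)
s_from_r = np.sqrt(tot / lam) * ((X / col[None, :]).T @ r)
print(np.allclose(r, r_from_s), np.allclose(s, s_from_r))
```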

EXAMPLE 13.3   A survey of Belgian citizens who regularly read a newspaper was conducted in the 1980s. They were asked where they lived. The possible answers were 10 regions: 7 provinces (Antwerp, Western Flanders, Eastern Flanders, Hainaut, Liège, Limbourg, Luxembourg) and 3 regions around Brussels (Flemish-Brabant, Wallon-Brabant and the city of Brussels). They were also asked what kind of newspapers they read on a regular basis. There were 15 possible answers, divided into 3 classes: Flemish newspapers (label begins with the letter $v$), French newspapers (label begins with $f$) and bilingual newspapers (label begins with $b$). The data set is given in Table B.9. The eigenvalues of the factorial correspondence analysis are given in Table 13.1.


Table 13.1: Eigenvalues and percentages of the variance (Example 13.3).

| $\lambda_j$ | percentage of variance | cumulated percentage |
|---:|---:|---:|
| 183.40 | 0.653 | 0.653 |
| 43.75 | 0.156 | 0.809 |
| 25.21 | 0.090 | 0.898 |
| 11.74 | 0.042 | 0.940 |
| 8.04 | 0.029 | 0.969 |
| 4.68 | 0.017 | 0.985 |
| 2.13 | 0.008 | 0.993 |
| 1.20 | 0.004 | 0.997 |
| 0.82 | 0.003 | 1.000 |
| 0.00 | 0.000 | 1.000 |


Two-dimensional representations will be quite satisfactory since the first two eigenvalues account for 81% of the variance. Figure 13.1 shows the projections of the rows (the 15 newspapers) and of the columns (the 10 regions).
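The percentages in Table 13.1 follow directly from the eigenvalues. The short NumPy sketch below recomputes them, together with the ratio $\Psi_2$ defined at the start of this section, from the tabulated $\lambda_j$ only.

```python
import numpy as np

# Eigenvalues copied from Table 13.1 (Belgian newspaper data, Example 13.3).
lam = np.array([183.40, 43.75, 25.21, 11.74, 8.04, 4.68, 2.13, 1.20, 0.82, 0.00])

share = lam / lam.sum()          # percentage of variance explained by each axis
cumulated = np.cumsum(share)     # cumulated percentage
psi2 = cumulated[1]              # Psi_2 = (lambda_1 + lambda_2) / sum_k lambda_k
print(np.round(share[:3], 3), round(float(psi2), 3))
```

Computed from the unrounded eigenvalues, $\Psi_2 \approx 0.808$; the value 0.809 in the table is the sum of the rounded shares 0.653 and 0.156.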

Figure 13.1: Projection of rows (the 15 newspapers) and columns (the 10 regions). MVAcorrjourn.xpl
\includegraphics[width=1\defpicwidth]{corrjou.ps}

As expected, there is a strong association between the regions and the type of newspaper that is read. In particular, $v_b$ (Gazet van Antwerpen) is almost exclusively read in the province of Antwerp (this is an extreme point in the graph). The points on the left all belong to Flanders, whereas those on the right all belong to Wallonia. Notice that Wallon-Brabant and Flemish-Brabant are not far from Brussels. Brussels is close to the center (average) and also close to the bilingual newspapers. It is shifted a little to the right of the origin due to the majority of French-speaking people in the area.


Table 13.2: Absolute contributions of row factors $r_k$.

| Newspaper | $C_a(i,r_1)$ | $C_a(i,r_2)$ | $C_a(i,r_3)$ |
|:---|---:|---:|---:|
| $v_a$ | 0.0563 | 0.0008 | 0.0036 |
| $v_b$ | 0.1555 | 0.5567 | 0.0067 |
| $v_c$ | 0.0244 | 0.1179 | 0.0266 |
| $v_d$ | 0.1352 | 0.0952 | 0.0164 |
| $v_e$ | 0.0253 | 0.1193 | 0.0013 |
| $f_f$ | 0.0314 | 0.0183 | 0.0597 |
| $f_g$ | 0.0585 | 0.0162 | 0.0122 |
| $f_h$ | 0.1086 | 0.0024 | 0.0656 |
| $f_i$ | 0.1001 | 0.0024 | 0.6376 |
| $b_j$ | 0.0029 | 0.0055 | 0.0187 |
| $b_k$ | 0.0236 | 0.0278 | 0.0237 |
| $b_l$ | 0.0006 | 0.0090 | 0.0064 |
| $v_m$ | 0.1000 | 0.0038 | 0.0047 |
| $f_n$ | 0.0966 | 0.0059 | 0.0269 |
| $f_o$ | 0.0810 | 0.0188 | 0.0899 |
| Total | 1.0000 | 1.0000 | 1.0000 |


The absolute contributions of the first 3 factors are listed in Tables 13.2 and 13.3. The row factors $r_k$ are in Table 13.2 and the column factors $s_k$ are in Table 13.3.


Table 13.3: Absolute contributions of column factors $s_k$.

| Region | $C_a(j,s_1)$ | $C_a(j,s_2)$ | $C_a(j,s_3)$ |
|:---|---:|---:|---:|
| brw | 0.0887 | 0.0210 | 0.2860 |
| bxl | 0.1259 | 0.0010 | 0.0960 |
| anv | 0.2999 | 0.4349 | 0.0029 |
| brf | 0.0064 | 0.2370 | 0.0090 |
| foc | 0.0729 | 0.1409 | 0.0033 |
| for | 0.0998 | 0.0023 | 0.0079 |
| hai | 0.1046 | 0.0012 | 0.3141 |
| lig | 0.1168 | 0.0355 | 0.1025 |
| lim | 0.0562 | 0.1162 | 0.0027 |
| lux | 0.0288 | 0.0101 | 0.1761 |
| Total | 1.0000 | 1.0000 | 1.0000 |


They show, for instance, the important role of Antwerp and the newspaper $v_b$ in determining the variance of both factors. Clearly, the first axis expresses linguistic differences between the 3 parts of Belgium. The second axis shows a larger dispersion among the Flemish regions than among the French-speaking regions. Note also that the third axis shows an important role of the category ``$f_i$'' (other French newspapers), with Wallon-Brabant ``brw'' and Hainaut ``hai'' showing the most important contributions. The coordinate of ``$f_i$'' on this axis is negative (not shown here), and so are the coordinates of ``brw'' and ``hai''. These two regions thus appear to have a greater proportion of readers of more local newspapers.
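Reading off which points dominate an axis amounts to sorting each column of Tables 13.2 and 13.3. The hypothetical helper `top` below does this for the tabulated values; it only ranks the published contributions and does not recompute them, since the raw counts of Table B.9 are not reproduced here.

```python
import numpy as np

# Absolute contributions C_a(i, r_k), k = 1, 2, 3, copied from Table 13.2.
rows = ["v_a", "v_b", "v_c", "v_d", "v_e", "f_f", "f_g", "f_h",
        "f_i", "b_j", "b_k", "b_l", "v_m", "f_n", "f_o"]
ca_r = np.array([
    [0.0563, 0.0008, 0.0036], [0.1555, 0.5567, 0.0067], [0.0244, 0.1179, 0.0266],
    [0.1352, 0.0952, 0.0164], [0.0253, 0.1193, 0.0013], [0.0314, 0.0183, 0.0597],
    [0.0585, 0.0162, 0.0122], [0.1086, 0.0024, 0.0656], [0.1001, 0.0024, 0.6376],
    [0.0029, 0.0055, 0.0187], [0.0236, 0.0278, 0.0237], [0.0006, 0.0090, 0.0064],
    [0.1000, 0.0038, 0.0047], [0.0966, 0.0059, 0.0269], [0.0810, 0.0188, 0.0899]])

# Absolute contributions C_a(j, s_k), copied from Table 13.3.
cols = ["brw", "bxl", "anv", "brf", "foc", "for", "hai", "lig", "lim", "lux"]
ca_s = np.array([
    [0.0887, 0.0210, 0.2860], [0.1259, 0.0010, 0.0960], [0.2999, 0.4349, 0.0029],
    [0.0064, 0.2370, 0.0090], [0.0729, 0.1409, 0.0033], [0.0998, 0.0023, 0.0079],
    [0.1046, 0.0012, 0.3141], [0.1168, 0.0355, 0.1025], [0.0562, 0.1162, 0.0027],
    [0.0288, 0.0101, 0.1761]])

def top(labels, contrib, k, n=2):
    """Labels of the n largest absolute contributions on axis k (0-based)."""
    order = np.argsort(contrib[:, k])[::-1][:n]
    return [labels[i] for i in order]

for k in range(3):
    print(f"axis {k + 1}: rows {top(rows, ca_r, k)}, columns {top(cols, ca_s, k)}")
```

On the third axis this returns $f_i$ for the newspapers and ``hai'' and ``brw'' for the regions, matching the discussion above.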

Figure 13.2: Correspondence analysis including Corsica. MVAcorrbac.xpl

\includegraphics[width=1.05\defpicwidth]{corr1.ps}


Table 13.4: Eigenvalues and percentages of explained variance (including Corsica).

| eigenvalues $\lambda$ | percentage of variance | cumulated percentage |
|---:|---:|---:|
| 2436.2 | 0.5605 | 0.561 |
| 1052.4 | 0.2421 | 0.803 |
| 341.8 | 0.0786 | 0.881 |
| 229.5 | 0.0528 | 0.934 |
| 152.2 | 0.0350 | 0.969 |
| 109.1 | 0.0251 | 0.994 |
| 25.0 | 0.0058 | 1.000 |
| 0.0 | 0.0000 | 1.000 |


EXAMPLE 13.4   Applying correspondence analysis to the French baccalauréat data (Table B.8) leads to Figure 13.2. Excluding Corsica, we obtain Figure 13.3. The different modalities are labeled A, ..., H and the regions are labeled ILDF, ..., CORS. The results of the correspondence analysis are given in Table 13.4 and Figure 13.2.

The first two factors explain 80% of the total variance. It is clear from Figure 13.2 that Corsica (in the upper left) is an outlier. The analysis is therefore redone without Corsica, and the results are given in Table 13.5 and Figure 13.3. Since Corsica has such a small weight in the analysis, the results have not changed much.

Figure 13.3: Correspondence analysis excluding Corsica. MVAcorrbac.xpl
\includegraphics[width=1.05\defpicwidth]{corr2.ps}


Table 13.5: Eigenvalues and percentages of explained variance (excluding Corsica).

| eigenvalues $\lambda$ | percentage of variance | cumulated percentage |
|---:|---:|---:|
| 2408.6 | 0.5874 | 0.587 |
| 909.5 | 0.2218 | 0.809 |
| 318.5 | 0.0777 | 0.887 |
| 195.9 | 0.0478 | 0.935 |
| 149.3 | 0.0364 | 0.971 |
| 96.1 | 0.0234 | 0.994 |
| 22.8 | 0.0056 | 1.000 |
| 0.0 | 0.0000 | 1.000 |


The projections on the first three axes, along with their absolute contribution to the variance of the axis, are summarized in Table 13.6 for the regions and in Table 13.7 for baccalauréats.


Table 13.6: Coefficients and absolute contributions for regions, Example 13.4.

| Region | $r_1$ | $r_2$ | $r_3$ | $C_a(i,r_1)$ | $C_a(i,r_2)$ | $C_a(i,r_3)$ |
|:---|---:|---:|---:|---:|---:|---:|
| ILDF | 0.1464 | 0.0677 | 0.0157 | 0.3839 | 0.2175 | 0.0333 |
| CHAM | -0.0603 | -0.0410 | -0.0187 | 0.0064 | 0.0078 | 0.0047 |
| PICA | 0.0323 | -0.0258 | -0.0318 | 0.0021 | 0.0036 | 0.0155 |
| HNOR | -0.0692 | 0.0287 | 0.1156 | 0.0096 | 0.0044 | 0.2035 |
| CENT | -0.0068 | -0.0205 | -0.0145 | 0.0001 | 0.0030 | 0.0043 |
| BNOR | -0.0271 | -0.0762 | 0.0061 | 0.0014 | 0.0284 | 0.0005 |
| BOUR | -0.1921 | 0.0188 | 0.0578 | 0.0920 | 0.0023 | 0.0630 |
| NOPC | -0.1278 | 0.0863 | -0.0570 | 0.0871 | 0.1052 | 0.1311 |
| LORR | -0.2084 | 0.0511 | 0.0467 | 0.1606 | 0.0256 | 0.0608 |
| ALSA | -0.2331 | 0.0838 | 0.0655 | 0.1283 | 0.0439 | 0.0767 |
| FRAC | -0.1304 | -0.0368 | -0.0444 | 0.0265 | 0.0056 | 0.0232 |
| PAYL | -0.0743 | -0.0816 | -0.0341 | 0.0232 | 0.0743 | 0.0370 |
| BRET | 0.0158 | 0.0249 | -0.0469 | 0.0011 | 0.0070 | 0.0708 |
| PCHA | -0.0610 | -0.1391 | -0.0178 | 0.0085 | 0.1171 | 0.0054 |
| AQUI | 0.0368 | -0.1183 | 0.0455 | 0.0055 | 0.1519 | 0.0643 |
| MIDI | 0.0208 | -0.0567 | 0.0138 | 0.0018 | 0.0359 | 0.0061 |
| LIMO | -0.0540 | 0.0221 | -0.0427 | 0.0033 | 0.0014 | 0.0154 |
| RHOA | -0.0225 | 0.0273 | -0.0385 | 0.0042 | 0.0161 | 0.0918 |
| AUVE | 0.0290 | -0.0139 | -0.0554 | 0.0017 | 0.0010 | 0.0469 |
| LARO | 0.0290 | -0.0862 | -0.0177 | 0.0383 | 0.0595 | 0.0072 |
| PROV | 0.0469 | -0.0717 | 0.0279 | 0.0142 | 0.0884 | 0.0383 |



Table 13.7: Coefficients and absolute contributions for baccalauréats, Example 13.4.

| Baccalauréat | $s_1$ | $s_2$ | $s_3$ | $C_a(j,s_1)$ | $C_a(j,s_2)$ | $C_a(j,s_3)$ |
|:---|---:|---:|---:|---:|---:|---:|
| A | 0.0447 | -0.0679 | 0.0367 | 0.0376 | 0.2292 | 0.1916 |
| B | 0.1389 | 0.0557 | 0.0011 | 0.1724 | 0.0735 | 0.0001 |
| C | 0.0940 | 0.0995 | 0.0079 | 0.1198 | 0.3556 | 0.0064 |
| D | 0.0227 | -0.0495 | -0.0530 | 0.0098 | 0.1237 | 0.4040 |
| E | -0.1932 | 0.0492 | -0.1317 | 0.0825 | 0.0141 | 0.2900 |
| F | -0.2156 | 0.0862 | 0.0188 | 0.3793 | 0.1608 | 0.0219 |
| G | -0.1244 | -0.0353 | 0.0279 | 0.1969 | 0.0421 | 0.0749 |
| H | -0.0945 | 0.0438 | -0.0888 | 0.0017 | 0.0010 | 0.0112 |


The interpretation of the results may be summarized as follows. Table 13.7 shows that baccalauréat B on one side and baccalauréat F on the other side are most strongly responsible for the variation on the first axis. The second axis mostly characterizes an opposition between baccalauréats A and C. Regarding the regions, Ile de France plays an important role on both of the first two axes. On the first axis, it is opposed to Lorraine and Alsace, whereas on the second axis, it is opposed to Poitou-Charentes and Aquitaine. All of this is confirmed in Figure 13.3.
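The oppositions described above can be read off mechanically: take the regions with the largest absolute contributions to an axis and split them by the sign of their coordinate. The sketch below does this for the first two axes using the values of Table 13.6; the helper `oppositions` and the cut-off of three regions are illustrative choices, not part of the original analysis.

```python
import numpy as np

# Columns: r_1, r_2, C_a(i, r_1), C_a(i, r_2), copied from Table 13.6 (Corsica excluded).
regions = ["ILDF", "CHAM", "PICA", "HNOR", "CENT", "BNOR", "BOUR", "NOPC", "LORR",
           "ALSA", "FRAC", "PAYL", "BRET", "PCHA", "AQUI", "MIDI", "LIMO", "RHOA",
           "AUVE", "LARO", "PROV"]
tab = np.array([
    [ 0.1464,  0.0677, 0.3839, 0.2175], [-0.0603, -0.0410, 0.0064, 0.0078],
    [ 0.0323, -0.0258, 0.0021, 0.0036], [-0.0692,  0.0287, 0.0096, 0.0044],
    [-0.0068, -0.0205, 0.0001, 0.0030], [-0.0271, -0.0762, 0.0014, 0.0284],
    [-0.1921,  0.0188, 0.0920, 0.0023], [-0.1278,  0.0863, 0.0871, 0.1052],
    [-0.2084,  0.0511, 0.1606, 0.0256], [-0.2331,  0.0838, 0.1283, 0.0439],
    [-0.1304, -0.0368, 0.0265, 0.0056], [-0.0743, -0.0816, 0.0232, 0.0743],
    [ 0.0158,  0.0249, 0.0011, 0.0070], [-0.0610, -0.1391, 0.0085, 0.1171],
    [ 0.0368, -0.1183, 0.0055, 0.1519], [ 0.0208, -0.0567, 0.0018, 0.0359],
    [-0.0540,  0.0221, 0.0033, 0.0014], [-0.0225,  0.0273, 0.0042, 0.0161],
    [ 0.0290, -0.0139, 0.0017, 0.0010], [ 0.0290, -0.0862, 0.0383, 0.0595],
    [ 0.0469, -0.0717, 0.0142, 0.0884]])
coord, contrib = tab[:, :2], tab[:, 2:]

def oppositions(k, n=3):
    """Split the n largest contributors to axis k (0-based) by coordinate sign."""
    top = np.argsort(contrib[:, k])[::-1][:n]
    pos = [regions[i] for i in top if coord[i, k] > 0]
    neg = [regions[i] for i in top if coord[i, k] < 0]
    return pos, neg

for k in range(2):
    print(f"axis {k + 1}:", oppositions(k))
```

For both axes this isolates Ile de France on the positive side, opposed to Lorraine and Alsace on the first axis and to Aquitaine and Poitou-Charentes on the second.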

On the right side are the more classical baccalauréats, and on the left the more technical ones. The regions on the left side thus have larger weights in the technical baccalauréats. Note also that most of the southern regions of France are concentrated in the lower part of the graph, near baccalauréat A.

Finally, looking at the third axis, we see that it is dominated by baccalauréat E (negative sign) and, to a lesser degree, by H (negative sign), as opposed to A (positive sign). The dominating regions are HNOR (positive sign), opposed to NOPC and AUVE (negative sign). For instance, HNOR is particularly poor in baccalauréat D.

EXAMPLE 13.5   The U.S. crime data set (Table B.10) gives the number of crimes recorded in 1985 in each of the 50 U.S. states, classified into the following seven categories: murder, rape, robbery, assault, burglary, larceny and auto-theft. The analysis of the contingency table, limited to the first two factors, provides the following results (see Table 13.8).


Table 13.8: Eigenvalues and explained proportion of variance, Example 13.5.

| $\lambda_j$ | percentage of variance | cumulated percentage |
|---:|---:|---:|
| 4399.0 | 0.4914 | 0.4914 |
| 2213.6 | 0.2473 | 0.7387 |
| 1382.4 | 0.1544 | 0.8932 |
| 870.7 | 0.0973 | 0.9904 |
| 51.0 | 0.0057 | 0.9961 |
| 34.8 | 0.0039 | 1.0000 |
| 0.0 | 0.0000 | 1.0000 |


Figure 13.4: Projection of rows (the 50 states) and columns (the 7 crime categories). MVAcorrcrime.xpl
\includegraphics[width=1\defpicwidth]{corrc.ps}

Looking at the absolute contributions (not reproduced here, see Exercise 13.6), it appears that the first axis contrasts robbery (+) with larceny (-) and auto-theft (-), and that the second factor contrasts assault (-) with auto-theft (+). The dominating states for the first axis are the North-Eastern States MA (+) and NY (+), contrasting with the Western States WY (-) and ID (-). For the second axis, the differences are seen between the Northern States (MA (+) and RI (+)) and the Southern States AL (-), MS (-) and AR (-). These results can be clearly seen in Figure 13.4, where all the states and crimes are reported. The figure also shows in which states the proportion of a particular crime category is higher or lower than the national average (the origin).


Biplots

The biplot is a low-dimensional display of a data matrix $\data X$ where the rows and columns are represented by points. The interpretation of a biplot is specifically directed towards the scalar products of lower-dimensional factorial variables, and the biplot is designed to approximately recover the individual elements of the data matrix from these scalar products. Suppose that we have a $(10 \times 5)$ data matrix with elements $x_{ij}$. The idea of the biplot is to find 10 row points $q_i \in \mathbb{R}^k$ ($k<p,\, i=1, \ldots, 10$) and 5 column points $t_j \in \mathbb{R}^k$ ($j=1,\ldots, 5$) such that the 50 scalar products between the row and the column vectors closely approximate the 50 corresponding elements of the data matrix $\data{X}$. Usually we choose $k=2$. For example, the scalar product between $q_7$ and $t_4$ should approximate the data value $x_{74}$ in the seventh row and the fourth column. In general, the biplot models the data $x_{ij}$ as the sum of a scalar product in some low-dimensional subspace and a residual ``error'' term:

\begin{displaymath}
x_{ij} = q_i^{\top} t_j + e_{ij} = \sum_k q_{ik}t_{jk} + e_{ij}.
\end{displaymath} (13.25)
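One common way to construct such points, not necessarily the scaling used for correspondence analysis, is a truncated singular value decomposition $\data{X} = \sum_m \sigma_m u_m v_m^{\top}$, with $q_i$ and $t_j$ taken from the first $k$ left and right singular vectors, each scaled by $\sqrt{\sigma_m}$. By the Eckart–Young theorem, this rank-$k$ choice minimizes $\sum_{i,j} e_{ij}^2$, and the minimum equals the sum of the discarded $\sigma_m^2$. The sketch below uses a hypothetical random $(10\times 5)$ matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))             # hypothetical (10 x 5) data matrix

U, sv, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
Q = U[:, :k] * np.sqrt(sv[:k])           # row points q_i (rows of Q)
T = Vt[:k].T * np.sqrt(sv[:k])           # column points t_j (rows of T)

approx = Q @ T.T                         # all 50 scalar products q_i^T t_j
resid = X - approx                       # the "error" terms e_ij
print(approx[6, 3], X[6, 3])             # q_7^T t_4 versus x_74 (0-based indices)
print(np.sum(resid ** 2), np.sum(sv[k:] ** 2))   # equal, by Eckart-Young
```

Splitting $\sigma_m$ symmetrically between rows and columns is a design choice; putting all of it on the rows or on the columns gives the same scalar products but a different picture.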

To understand the link between correspondence analysis and the biplot, we need to introduce a formula which expresses $x_{ij}$ from the original data matrix (see (13.3)) in terms of row and column frequencies. One such formula, known as the ``reconstitution formula'', is (13.10):
\begin{displaymath}
x_{ij}=E_{ij}\left(1+\frac{\sum_{k=1}^R \lambda_k^{\frac{1}{2}}\gamma_{ik}\delta_{jk}}{\sqrt{\frac{x_{i\bullet}x_{\bullet j}}{x_{\bullet\bullet}}}}\right)
\end{displaymath} (13.26)
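The reconstitution formula can be checked numerically. The sketch assumes, as in the construction of this chapter, that $\gamma_{ik}$, $\lambda_k^{1/2}$ and $\delta_{jk}$ come from the singular value decomposition of the matrix with elements $c_{ij} = (x_{ij}-E_{ij})/\sqrt{E_{ij}}$, and that $E_{ij} = x_{i\bullet}x_{\bullet j}/x_{\bullet\bullet}$, so the denominator in (13.26) is $\sqrt{E_{ij}}$. The table is hypothetical.

```python
import numpy as np

# Hypothetical 4 x 3 table of counts.
X = np.array([[20.0, 5.0, 10.0], [8.0, 30.0, 12.0], [4.0, 6.0, 25.0], [10.0, 10.0, 10.0]])
row, col, tot = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(row, col) / tot                        # E_ij = x_i. x_.j / x_..
C = (X - E) / np.sqrt(E)                            # c_ij
G, sv, Dt = np.linalg.svd(C, full_matrices=False)   # gamma_ik = G[i, k], delta_jk = Dt[k, j]

# Numerator of (13.26): sum_k lambda_k^{1/2} gamma_ik delta_jk, for all (i, j) at once.
S = (G * sv) @ Dt
X_rec = E * (1 + S / np.sqrt(np.outer(row, col) / tot))   # reconstitution formula
print(np.allclose(X_rec, X))
```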

Consider now the row profiles ${x_{ij}}/{x_{i\bullet}}$ (the conditional frequencies) and the average row profile ${x_{\bullet j}}/{x_{\bullet\bullet}}$. From (13.26) we obtain the difference between each row profile and this average:
\begin{displaymath}
\left( \frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} \right) = \sum_{k=1}^R \lambda_k^{\frac{1}{2}} \gamma_{ik} \left( \sqrt{\frac{x_{\bullet j}}{x_{i\bullet}x_{\bullet\bullet}}} \right) \delta_{jk}.
\end{displaymath} (13.27)

By the same argument we can also obtain the difference between each column profile and the average column profile:
\begin{displaymath}
\left( \frac{x_{ij}}{x_{\bullet j}} - \frac{x_{i\bullet}}{x_{\bullet\bullet}} \right) = \sum_{k=1}^R \lambda_k^{\frac{1}{2}} \gamma_{ik} \left( \sqrt{\frac{x_{i\bullet}}{x_{\bullet j}x_{\bullet\bullet}}} \right) \delta_{jk}.
\end{displaymath} (13.28)

Now, if $\lambda_1 \gg \lambda_2 \gg \lambda_3 \gg \ldots$, we can approximate these sums by their first $K$ terms (usually $K=2$) and use (13.16) to obtain
\begin{displaymath}
\left( \frac{x_{ij}}{x_{\bullet j}} - \frac{x_{i\bullet}}{x_{\bullet\bullet}} \right) = \sum_{k=1}^K \left( \frac{x_{i\bullet}}{\sqrt{\lambda_k x_{\bullet\bullet}}} r_{ki} \right) s_{kj} + e_{ij},
\end{displaymath} (13.29)
\begin{displaymath}
\left( \frac{x_{ij}}{x_{i\bullet}} - \frac{x_{\bullet j}}{x_{\bullet\bullet}} \right) = \sum_{k=1}^K \left( \frac{x_{\bullet j}}{\sqrt{\lambda_k x_{\bullet\bullet}}} s_{kj} \right) r_{ki} + e'_{ij},
\end{displaymath} (13.30)

where $e_{ij}$ and $e'_{ij}$ are error terms. Equation (13.30) shows that, if we display the differences between the row profiles and the average row profile, the row projections $r_k$ and a rescaled version of the column projections $s_k$ constitute a biplot of these differences. Equation (13.29) implies the same for the differences between the column profiles and the average column profile.
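A numerical check of (13.30): the sketch builds the row points $r_{ki}$ and the rescaled column points $x_{\bullet j}s_{kj}/\sqrt{\lambda_k x_{\bullet\bullet}}$ for a hypothetical random contingency table and measures $\sum_{i,j} e'^2_{ij}$ when only the first $K$ axes are kept. It assumes the coordinates $r_k = \sqrt{\lambda_k}\,\data{A}^{-1/2}\gamma_k$ and $s_k = \sqrt{\lambda_k}\,\data{B}^{-1/2}\delta_k$. With all $R$ non-trivial axes the error vanishes; with $K=2$ it is the part of the profile deviations that the two-dimensional display leaves unexplained. The helper `biplot_error` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(5, 50, size=(6, 4)).astype(float)    # hypothetical contingency table
row, col, tot = X.sum(axis=1), X.sum(axis=0), X.sum()
E = np.outer(row, col) / tot
U, sv, Vt = np.linalg.svd((X - E) / np.sqrt(E), full_matrices=False)
R = int(np.sum(sv > 1e-10 * sv.max()))                 # non-trivial axes
lam = sv[:R] ** 2
r = U[:, :R] * sv[:R] / np.sqrt(row)[:, None]          # row coordinates r_k
s = Vt[:R].T * sv[:R] / np.sqrt(col)[:, None]          # column coordinates s_k

# Biplot of row-profile deviations, (13.30): row points r_ki, rescaled column points.
s_rescaled = s * col[:, None] / np.sqrt(lam * tot)    # x_.j s_kj / sqrt(lambda_k x_..)
target = X / row[:, None] - col / tot                 # x_ij / x_i. - x_.j / x_..

def biplot_error(K):
    """Sum of squared e'_ij when only the first K axes are kept."""
    fit = r[:, :K] @ s_rescaled[:, :K].T
    return np.sum((target - fit) ** 2)

print(biplot_error(R), biplot_error(2))
```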

Summary
$\ast$ Correspondence analysis is a factorial decomposition of contingency tables. The $p$-dimensional individuals and the $n$-dimensional variables can be graphically represented by projecting them onto spaces of smaller dimension.
$\ast$ The practical computation consists of first computing a spectral decomposition of $\data{A}^{-1}\data{X}\data{B}^{-1}\data{X}^{\top}$ and $\data{B}^{-1}\data{X}^{\top}\data{A}^{-1}\data{X}$, which have the same first $p$ eigenvalues. The graphical representation is obtained by plotting $\sqrt{\lambda_{1}} r_{1}$ vs. $\sqrt{\lambda_{2}} r_{2}$ and $\sqrt{\lambda_{1}} s_{1}$ vs. $\sqrt{\lambda_{2}} s_{2}$. Both plots may be displayed in the same graph, taking into account the appropriate orientation of the eigenvectors $r_{i}, s_{j}$.
$\ast$ Correspondence analysis provides a graphical display of the association measure $c_{ij} = (x_{ij} - E_{ij})^2/E_{ij}$.
$\ast$ The biplot is a low-dimensional display of a data matrix in which the rows and columns are represented by points.