13. Correspondence Analysis

Correspondence analysis provides tools for analyzing the associations between the rows and columns of contingency tables. A contingency table is a two-way frequency table that reports the joint frequencies of two qualitative variables. For instance, a $(2\times 2)$ table could be formed by recording two qualitative variables for each of $n$ sampled individuals: the individual's sex and whether or not the individual smokes. The table then reports the observed joint frequencies of the four combinations. In general, $(n\times p)$ tables may be considered.
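To make the construction concrete, the following minimal Python sketch tallies individual observations into a two-way table. The sample of (sex, smoker) pairs is invented purely for illustration, and the helper `contingency_table` is a hypothetical name, not part of any library.

```python
from collections import Counter

# Hypothetical sample: one (sex, smoker) pair per individual.
observations = [
    ("female", "yes"), ("female", "no"), ("male", "yes"), ("male", "yes"),
    ("female", "no"), ("male", "no"), ("female", "no"), ("male", "yes"),
]

def contingency_table(pairs):
    """Cross-tabulate pairs of categorical values into a two-way frequency table.

    Returns the sorted row categories, the sorted column categories, and a
    list of lists whose (i, j) entry counts the pairs with row i and column j.
    """
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = contingency_table(observations)
print(rows, cols)  # ['female', 'male'] ['no', 'yes']
print(table)       # [[3, 1], [1, 3]]
```

Each entry of `table` is a joint frequency, and the entries sum to the sample size $n = 8$.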

The main idea of correspondence analysis is to develop simple indices that show the relations between the row and column categories. These indices tell us simultaneously which column categories have more weight in a row category and vice versa. Correspondence analysis is also related to the issue of reducing the dimension of the table, similar to principal component analysis in Chapter 9, and to the issue of decomposing the table into its factors, as discussed in Chapter 8. The idea is to extract the indices in decreasing order of importance so that the main information in the table can be summarized in spaces of lower dimension. For instance, if only two factors (indices) are used, the results can be shown in two-dimensional graphs that display the relationship between the rows and the columns of the table.
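The extraction of factors in decreasing order of importance can be sketched with a singular value decomposition. The block below follows the common textbook formulation, which may differ in notation from the one developed in Section 13.2: divide the table by its total $n$ to get proportions $p_{ij}$ with row masses $r_i$ and column masses $c_j$, form the standardized residuals $(p_{ij} - r_i c_j)/\sqrt{r_i c_j}$, and take their SVD. The singular values come out sorted in decreasing order, and the first two columns of the scaled singular vectors give row and column coordinates for a two-dimensional graph. The function name `correspondence_analysis` and the $3\times 3$ table are hypothetical, and the sketch assumes NumPy is available and every row and column total is positive.

```python
import numpy as np

def correspondence_analysis(X, k=2):
    """Sketch of correspondence analysis via the SVD of standardized residuals.

    Returns all singular values (largest first) and the row and column
    principal coordinates on the first k axes. Assumes positive margins.
    """
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                     # proportions p_ij
    r = P.sum(axis=1)                   # row masses r_i
    c = P.sum(axis=0)                   # column masses c_j
    expected = np.outer(r, c)           # r_i * c_j under independence
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)  # sv sorted, largest first
    F = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]        # row coordinates
    G = Vt.T[:, :k] * sv[:k] / np.sqrt(c)[:, None]     # column coordinates
    return sv, F, G

# Hypothetical 3x3 table of counts.
X = [[20, 10, 5],
     [10, 20, 10],
     [5, 10, 20]]
sv, F, G = correspondence_analysis(X)
print("singular values:", sv)
print("share of inertia per axis:", sv**2 / (sv**2).sum())
print("row coordinates:\n", F)
print("column coordinates:\n", G)
```

The squares of the singular values sum to $\chi^2/n$ (the total inertia), so `sv**2 / (sv**2).sum()` gives the share of the association captured by each axis. Because the standardized residuals always satisfy one linear constraint, the last singular value is zero up to rounding; a $3\times 3$ table therefore has at most two nontrivial dimensions, and its two-dimensional display loses nothing.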

Section 13.1 defines the basic notation and motivates the approach, and Section 13.2 gives the basic theory. The indices will be used to describe the $\chi^2$ statistic that measures the association in the table. Several examples in Section 13.3 show how to construct and interpret, in practice, the two-dimensional graphs displaying the relationship between the rows and the columns of a contingency table.
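For reference, the $\chi^2$ statistic mentioned here is Pearson's statistic $\chi^2 = \sum_i\sum_j (x_{ij} - E_{ij})^2/E_{ij}$, where $x_{i\bullet}$ and $x_{\bullet j}$ are the row and column totals and $E_{ij} = x_{i\bullet}\,x_{\bullet j}/n$ is the frequency expected in cell $(i,j)$ if rows and columns were independent. The stdlib-only sketch below computes it for a hypothetical $2\times 2$ sex-by-smoking table; the function name `chi_square` is an illustrative assumption, and all row and column totals are assumed positive.

```python
def chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts.

    Assumes every row total and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]        # x_{i.}
    col_tot = [sum(col) for col in zip(*table)]  # x_{.j}
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # E_ij under independence
            stat += (x - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: rows = female, male; columns = non-smoker, smoker.
table = [[3, 1],
         [1, 3]]
print(chi_square(table))  # 2.0
```

Here every expected frequency is $2$, each cell contributes $(3-2)^2/2 = 0.5$ or $(1-2)^2/2 = 0.5$, and the statistic is $2.0$. A table whose rows are exactly proportional to each other gives $\chi^2 = 0$, i.e., no association to display.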