The practical implementation of the techniques introduced
begins with the computation of the eigenvalues
$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$
and the corresponding eigenvectors $u_1, \dots, u_p$
of $\mathcal{X}^{\top}\mathcal{X}$.
(Since $p$ is usually less than $n$, this is
numerically less involved than computing $v_k$
directly for $k = 1, \dots, p$.)
The representation of the $n$
individuals on a plane is then
obtained by plotting $z_1 = \mathcal{X}u_1$
versus $z_2 = \mathcal{X}u_2$
($z_3 = \mathcal{X}u_3$ may be added if a third dimension
is helpful). Using the Duality Relation (8.13), representations
for the $p$ variables can easily be obtained.
These representations can be visualized in a scatterplot of
$w_1 = \sqrt{\lambda_1}\,u_1$ against $w_2 = \sqrt{\lambda_2}\,u_2$
(and, if needed, against $w_3 = \sqrt{\lambda_3}\,u_3$).
Higher dimensional factorial resolutions can be obtained
(by computing $z_k$ and $w_k$ for $k > 3$) but, of course, cannot
be plotted.
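As a minimal sketch of this computation, the following NumPy code diagonalizes the $p \times p$ matrix $\mathcal{X}^{\top}\mathcal{X}$ and derives the coordinates of the individuals and of the variables from it. The function name `factorial_representations`, the random toy data, and the choice of `numpy.linalg.eigh` are illustrative assumptions and not part of the original text; the formulas $z_k = \mathcal{X}u_k$ and $w_k = \sqrt{\lambda_k}\,u_k$ are assumed to be the ones intended above.

```python
import numpy as np

def factorial_representations(X, q=2):
    """Return all eigenvalues of X^T X (descending) together with
    Z (n x q, columns z_k = X u_k) and W (p x q, columns w_k = sqrt(lambda_k) u_k)."""
    X = np.asarray(X, dtype=float)
    # X^T X is symmetric, so eigh is the appropriate solver
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    # eigh returns eigenvalues in ascending order; reverse to lambda_1 >= lambda_2 >= ...
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    eigvecs = eigvecs[:, order]
    U = eigvecs[:, :q]
    Z = X @ U                       # coordinates of the n individuals
    W = U * np.sqrt(eigvals[:q])    # coordinates of the p variables
    return eigvals, Z, W

# Hypothetical toy data (12 individuals, 7 variables), for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 7))
eigvals, Z, W = factorial_representations(X, q=2)
```

Plotting the first column of `Z` against the second gives the plane of the individuals, and the columns of `W` give the corresponding plane of the variables. Only the small $p \times p$ eigenproblem is solved here, which is the reason given above for preferring it to the $n \times n$ problem.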
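A standard way of evaluating the quality of the factorial
representations in a subspace of dimension $q$ is
given by the ratio
$$
\tau_q = \frac{\lambda_1 + \lambda_2 + \dots + \lambda_q}{\lambda_1 + \lambda_2 + \dots + \lambda_p}.
$$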
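A small sketch of how such a quality ratio can be evaluated, assuming it is the share of the $q$ largest eigenvalues in the sum of all $p$ eigenvalues. The helper name `explained_ratios` and the eigenvalues used are hypothetical and chosen only for illustration.

```python
from itertools import accumulate

def explained_ratios(eigenvalues):
    """Return [tau_1, ..., tau_p] with
    tau_q = (lambda_1 + ... + lambda_q) / (lambda_1 + ... + lambda_p)."""
    lam = sorted(eigenvalues, reverse=True)   # lambda_1 >= lambda_2 >= ...
    total = sum(lam)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [partial / total for partial in accumulate(lam)]

# Hypothetical eigenvalues, not taken from any data set in the text
tau = explained_ratios([4.0, 2.0, 1.0, 1.0])
print(tau)  # [0.5, 0.75, 0.875, 1.0]
```

In this made-up case a two-dimensional plot would capture 75% of the total, so the value of $\tau_2$ indicates how much is lost by looking only at the first factorial plane.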
The correlation matrix corresponding to the data is
We observe a rather high correlation between meat and poultry, whereas the correlation between the expenditures for milk and wine is rather small. Are there household types that prefer, say, meat over bread?
We shall now represent food expenditures and households
simultaneously using two factors.
First, note that in this particular problem the origin has no specific
meaning (it represents a ``zero'' consumer).
So it makes sense to compare
the consumption of any family to that of an ``average family''
rather than to the origin.
Therefore, the data is first centered (the origin is translated
to the center of gravity, $\bar{x}$). Furthermore, since the
dispersions of the 7 variables are quite different,
each variable is standardized so that each
has the same weight in the analysis (mean 0 and variance 1).
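Finally, for convenience, we divide each element in the matrix by
$\sqrt{n}$. (This will only change the scaling of the plots in the
graphical representation.)
The data matrix to be analyzed is
$$
\mathcal{X}_* = \frac{1}{\sqrt{n}}\, \mathcal{H} \mathcal{X} \mathcal{D}^{-1/2},
$$
where $\mathcal{H}$ is the centering matrix and $\mathcal{D}$ is the
diagonal matrix of the variances of the variables. With this scaling,
$\mathcal{X}_*^{\top}\mathcal{X}_*$ equals the correlation matrix of the
original data.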
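The preprocessing just described (centering, standardizing, and rescaling) can be sketched as follows. This is an illustration under stated assumptions: the rescaling constant is taken to be $\sqrt{n}$, the variances are computed with divisor $n$, and the function name `standardize_for_analysis` and the random data are hypothetical stand-ins for the actual expenditure data, which are not reproduced here.

```python
import numpy as np

def standardize_for_analysis(X):
    """Center each column, scale it to unit variance, and divide by sqrt(n)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)   # move the origin to the center of gravity
    std = centered.std(axis=0)      # assumption: variance with divisor n (ddof=0)
    return centered / std / np.sqrt(n)

# Hypothetical data with very different dispersions per column
rng = np.random.default_rng(1)
X = rng.normal(loc=[300.0, 1000.0, 50.0], scale=[50.0, 300.0, 5.0], size=(12, 3))
X_star = standardize_for_analysis(X)

# With these choices, X_star^T X_star reproduces the correlation matrix of X
R = np.corrcoef(X, rowvar=False)
print(np.allclose(X_star.T @ X_star, R))
```

Because the cross-product of the transformed matrix is the correlation matrix, its eigenvalues sum to the number of variables, which makes the quality ratios easy to read.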
The coordinates of the projected data points are given in the two
lower windows of Figure 8.6.
Let us first examine the food expenditure window.
In this window we see the representation of the variables given
by the first two factors. The plot shows the factorial
variables $w_1$ and $w_2$
in the same fashion as Figure 8.4.
We see that the points for meat, poultry, vegetables and fruits are
close to each other in the lower left of the graph.
The expenditures for bread and milk can be found
in the upper left whereas wine stands alone in the upper right.
The first factor, $w_1$, may be interpreted as the meat/fruit factor
of consumption, the second factor, $w_2$, as the bread/wine component.
In the lower window on the right-hand side,
we show the factorial variables $z_1$ and $z_2$
from the fit of the $n$
household types. Note that by the
Duality Relations of Theorem 8.4, the factorial variables
$z_j$ are
linear combinations of the factors $w_k$
from the left window.
The points displayed in the consumer window (graph on the right) are plotted
relative to an average consumer represented by the origin.
The manager families are located in the lower left corner of the graph
whereas the manual workers and employees tend to be in the upper right. The
factorial variables for CA5 (managers with five children) lie close
to the meat/fruit factor. Relative to the average consumer this
household type is a large consumer of meat/poultry and fruits/vegetables.
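The link between the two windows can be checked numerically. The sketch below assumes the duality relations take the form $z_k = \frac{1}{\sqrt{\lambda_k}}\mathcal{X}w_k$ and $w_k = \frac{1}{\sqrt{\lambda_k}}\mathcal{X}^{\top}z_k$; the random matrix is a hypothetical stand-in for the standardized data matrix, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 7))        # hypothetical stand-in for the data matrix

# Eigen-decomposition of the small p x p matrix, sorted in descending order
lam, U = np.linalg.eigh(X.T @ X)
lam, U = lam[::-1], U[:, ::-1]

k = 2
sqrt_lam = np.sqrt(lam[:k])
W = U[:, :k] * sqrt_lam             # variable coordinates (left window)
Z = X @ U[:, :k]                    # individual coordinates (right window)

# Each representation can be recovered from the other
Z_from_W = (X @ W) / sqrt_lam
W_from_Z = (X.T @ Z) / sqrt_lam
print(np.allclose(Z, Z_from_W), np.allclose(W, W_from_Z))

# The n x n matrix X X^T has the same nonzero eigenvalues as X^T X
mu = np.linalg.eigvalsh(X @ X.T)[::-1]
print(np.allclose(mu[:7], lam))
```

This is why a household point lying in the direction of a group of variables in the other window can be read as above-average consumption of those goods.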
In Chapter 9, we will return to these plots interpreting them
in a much deeper way. At this stage, it suffices to notice that
the plots provide a graphical representation in $\mathbb{R}^2$
of the
information contained in the original, high-dimensional
($n \times 7$) data matrix.