1.5 Chernoff-Flury Faces

If we are given data in numerical form, we tend to display it numerically as well. This was done in the preceding sections: an observation $x_1=(1,2)$ was plotted as the point $(1,2)$ in a two-dimensional coordinate system. In multivariate analysis we want to understand data in low dimensions (e.g., on a 2D computer screen) although the structures are hidden in high dimensions. Displaying data structures through coordinates therefore breaks down once the dimension exceeds three.

If we are interested in condensing a structure into 2D elements, we have to consider alternative graphical techniques. The Chernoff-Flury faces, for example, condense high-dimensional information into a simple ``face''. The sizes of face elements such as pupils, eyes, upper and lower hair lines, etc., are assigned to certain variables. The idea of using faces goes back to Chernoff (1973) and has been further developed by Bernhard Flury. We follow the design described in Flury and Riedwyl (1988), which uses the following characteristics.

1 right eye size
2 right pupil size
3 position of right pupil
4 right eye slant
5 horizontal position of right eye
6 vertical position of right eye
7 curvature of right eyebrow
8 density of right eyebrow
9 horizontal position of right eyebrow
10 vertical position of right eyebrow
11 right upper hair line
12 right lower hair line
13 right face line
14 darkness of right hair
15 right hair slant
16 right nose line
17 right size of mouth
18 right curvature of mouth
19-36 like 1-18, only for the left side.
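
The numbering above can be sketched as a small lookup: indices 1 to 18 refer to the right side of the face and indices 19 to 36 repeat the same elements for the left side. The names `RIGHT_SIDE` and `characteristic` are hypothetical helpers introduced only for this illustration; they are not part of any face-plotting library.

```python
# Names of the 18 characteristics, in the order of the list above
# (the side of the face is handled separately).
RIGHT_SIDE = [
    "eye size", "pupil size", "position of pupil", "eye slant",
    "horizontal position of eye", "vertical position of eye",
    "curvature of eyebrow", "density of eyebrow",
    "horizontal position of eyebrow", "vertical position of eyebrow",
    "upper hair line", "lower hair line", "face line",
    "darkness of hair", "hair slant", "nose line",
    "size of mouth", "curvature of mouth",
]


def characteristic(index):
    """Return (side, name) for a characteristic index between 1 and 36."""
    if not 1 <= index <= 36:
        raise ValueError("index must be between 1 and 36")
    side = "right" if index <= 18 else "left"
    # Index k + 18 names the same element as index k, on the left side.
    return side, RIGHT_SIDE[(index - 1) % 18]


print(characteristic(1))   # ('right', 'eye size')
print(characteristic(19))  # ('left', 'eye size')
```

A variable that should change both sides of the face symmetrically is therefore assigned to a pair of indices $k$ and $k+18$.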

Figure 1.15: Chernoff-Flury faces for observations 91 to 110 of the bank notes. MVAfacebank10.xpl
\includegraphics[width=1\defpicwidth]{face10.ps}

First, every variable that is to be coded into a face element is transformed to a $[0,1]$ scale, i.e., the minimum of the variable corresponds to $0$ and the maximum to $1$. The extreme positions of a face element then correspond to, for example, a certain ``grin'' or ``happy'' expression. Dark hair might be coded as $1$ and blond hair as $0$, and so on.
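
The rescaling step can be sketched as the usual min-max transformation $x \mapsto (x-\min)/(\max-\min)$. The function name `min_max_scale`, the handling of a constant variable, and the sample values are assumptions made for this illustration; the values are not taken from the bank data.

```python
def min_max_scale(values):
    """Map values linearly so that the minimum becomes 0.0 and the maximum 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information,
        # so every observation is placed in the middle of the scale.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements, for illustration only.
example = [2.0, 4.0, 3.0, 6.0]
print(min_max_scale(example))  # [0.0, 0.5, 0.25, 1.0]
```

Because the scaling uses the minimum and maximum of the observations actually plotted, the same observation can produce a different face when it is drawn together with a different subset of the data.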

As an example, consider the observations 91 to 110 of the bank data. Recall that the bank data set consists of 200 observations of dimension 6 where, for example, $X_{6}$ is the diagonal of the note. If we assign the six variables to the following face elements

\begin{eqnarray*}
X_1 &=& \textrm{1, 19 (eye sizes)}\\
X_2 &=& \textrm{2, 20 (p\ldots)}\\
\ldots &=& \textrm{13, 14, 31, 32 (face lines and darkness of hair),}
\end{eqnarray*}



we obtain Figure 1.15. Also recall that observations 1-100 correspond to the genuine notes and observations 101-200 to the counterfeit notes. The counterfeit bank notes therefore correspond to the lower half of Figure 1.15. The faces for these observations indeed look more grim and less happy. The variable $X_{6}$ (diagonal) already worked well in the boxplot in Figure 1.4 in distinguishing between the counterfeit and genuine notes. Here, this variable is assigned to the face line and the darkness of the hair, which is why we see a clear separation within these 20 observations.
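
As a quick check of how the 20 plotted observations split into the two groups, the following sketch labels an observation by its number alone, using the ordering of the data set recalled above. The helper `note_type` is a hypothetical name introduced for this illustration.

```python
def note_type(observation):
    """Label a bank-note observation number (1 to 200) by its position in the data set."""
    if not 1 <= observation <= 200:
        raise ValueError("observation must be between 1 and 200")
    # Observations 1-100 are genuine, observations 101-200 are counterfeit.
    return "genuine" if observation <= 100 else "counterfeit"


subset = range(91, 111)  # observations 91 to 110, inclusive
genuine = [i for i in subset if note_type(i) == "genuine"]
counterfeit = [i for i in subset if note_type(i) == "counterfeit"]
print(len(genuine), len(counterfeit))  # 10 10
```

The subset thus contains ten notes of each type, which is consistent with the counterfeit notes filling the lower half of the figure.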

Figure 1.16: Chernoff-Flury faces for observations 1 to 50 of the bank notes. MVAfacebank50.xpl
\includegraphics[width=1\defpicwidth]{face501.ps}

Figure 1.17: Chernoff-Flury faces for observations 51 to 100 of the bank notes. MVAfacebank50.xpl
\includegraphics[width=1\defpicwidth]{face502.ps}

Figure 1.18: Chernoff-Flury faces for observations 101 to 150 of the bank notes. MVAfacebank50.xpl
\includegraphics[width=1\defpicwidth]{face503.ps}

Figure 1.19: Chernoff-Flury faces for observations 151 to 200 of the bank notes. MVAfacebank50.xpl
\includegraphics[width=1\defpicwidth]{face504.ps}

What happens if we include all 100 genuine and all 100 counterfeit bank notes in the Chernoff-Flury face technique? Figures 1.16 and 1.17 show the faces of the genuine bank notes with the same assignments as before, and Figures 1.18 and 1.19 show the faces of the counterfeit bank notes. Comparing Figure 1.16 with Figure 1.18, one clearly sees that the diagonal (face line) is longer for genuine bank notes. The hair darkness codes the same variable (diagonal) and is lighter for the counterfeit bank notes, whose diagonal is shorter. The faces of the genuine bank notes thus have a much darker appearance and broader face lines. The faces in Figures 1.16-1.17 are clearly different from those in Figures 1.18-1.19.

Summary
$\ast$
Faces can be used to detect subgroups in multivariate data.
$\ast$
Subgroups are characterized by similar looking faces.
$\ast$
Outliers are identified by extreme faces, e.g., dark hair, smile or a happy face.
$\ast$
If one element of $X$ is unusual, the corresponding face element significantly changes in shape.