1.4 Scatterplots

**Figure 1.12:** 2D scatterplot for $X_{5}$ vs. $X_{6}$ of the bank notes. Genuine notes are circles, counterfeit notes are stars. `MVAscabank56.xpl`

Figure 1.12 plots the 5th column (upper inner frame) of the bank data against the 6th column (diagonal). The scatter is downward-sloping. As we already know from the previous section on marginal comparison (e.g., Figure 1.9), a good separation between genuine and counterfeit bank notes is visible for the diagonal variable. The sub-cloud in the upper half (circles) of Figure 1.12 corresponds to the genuine bank notes. As noted before, this separation is not clear-cut, since the two groups overlap somewhat.

This can be verified in an interactive computing environment by showing the index and coordinates of certain points in this scatterplot. In Figure 1.12, the 70th observation in the merged data set is given as a thick circle, and it is from a genuine bank note. This observation lies well embedded in the cloud of counterfeit bank notes. One straightforward approach that could be used to tell the counterfeit from the genuine bank notes is to draw a straight line and define notes above this line as genuine. We would of course misclassify the 70th observation, but can we do better?
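The straight-line rule can be written down in a few lines. The following Python sketch is only an illustration: the function names, the threshold of 140.5 and the six diagonal values are made up for the example and are not taken from the bank data set. It labels a note as genuine when its diagonal lies above the line and then lists the misclassified observations.

```python
def classify_by_threshold(diagonals, threshold):
    """Label a note 'genuine' if its diagonal lies above the line, else 'counterfeit'."""
    return ["genuine" if d > threshold else "counterfeit" for d in diagonals]


def misclassified(labels, predictions):
    """Return the indices where the predicted label differs from the true one."""
    return [i for i, (t, p) in enumerate(zip(labels, predictions)) if t != p]


# Made-up toy values (NOT the bank note data): diagonal lengths and true labels.
diagonals = [141.6, 141.2, 139.6, 139.4, 139.8, 140.1]
labels = ["genuine", "genuine", "genuine",
          "counterfeit", "counterfeit", "counterfeit"]
threshold = 140.5  # assumed height of the horizontal separation line

predictions = classify_by_threshold(diagonals, threshold)
errors = misclassified(labels, predictions)
print(errors)
```

In this toy example the third note plays the role of the 70th observation: it is genuine but lies below the line, so it is the only misclassified point.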

**Figure 1.13:** 3D Scatterplot of the bank notes for $(X_{4}, X_{5}, X_{6})$. Genuine notes are circles, counterfeit are stars. `MVAscabank456.xpl`

If we extend the two-dimensional scatterplot by adding a third variable, e.g., $X_{4}$ (lower distance to inner frame), we obtain the scatterplot in three dimensions as shown in Figure 1.13. It becomes apparent from the location of the point clouds that a better separation is obtained. We have rotated the three-dimensional data until this satisfactory 3D view was obtained. Later, we will see that rotation is the same as bundling a high-dimensional observation into one or more linear combinations of the elements of the observation vector. In other words, the "separation line" parallel to the horizontal coordinate axis in Figure 1.12 becomes a plane in Figure 1.13 that is no longer parallel to one of the axes. The formula for such a separation plane is a linear combination of the elements of the observation vector:

$\begin{displaymath} a_{1}x_{1} + a_{2}x_{2} + \ldots + a_{6}x_{6} = \textrm{const}. \end{displaymath}$

(1.12)
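To make (1.12) concrete, the following Python sketch evaluates such a linear combination for one observation and reports on which side of the plane it falls. The weights, the constants and the observation are arbitrary illustrative values, not estimated from the bank data, and the function names are hypothetical.

```python
def linear_score(a, x):
    """Compute a_1 x_1 + ... + a_p x_p for weights a and observation x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def side_of_plane(a, x, const):
    """Report whether x lies above the plane a'x = const or not."""
    return "above" if linear_score(a, x) > const else "below or on"


# Made-up observation (x_1, ..., x_6); NOT a row of the bank data.
x = (214.8, 130.0, 130.0, 9.0, 10.0, 141.0)

# Weight only on x_6: a horizontal line in the (x_5, x_6) plot.
a_line = (0, 0, 0, 0, 0, 1)
# Weights on x_4, x_5, x_6: a plane not parallel to any single axis.
a_plane = (0, 0, 0, -1, -1, 1)

side_line = side_of_plane(a_line, x, 140.5)    # assumed constant
side_plane = side_of_plane(a_plane, x, 121.0)  # assumed constant
print(side_line, side_plane)
```

Setting every weight except $a_{6}$ to zero recovers a horizontal line as in Figure 1.12; nonzero weights on several variables give a tilted plane of the kind visible in Figure 1.13.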

Let us study yet another technique: the scatterplot matrix. If we want to draw all possible two-dimensional scatterplots for the variables, we can create a so-called draftsman's plot (named after a draftsman who prepares drafts for parliamentary discussions). Like such a draft, the scatterplot matrix helps in creating new ideas and in building knowledge about dependencies and structure.
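The bookkeeping behind "all possible two-dimensional scatterplots" is simple to sketch. The Python fragment below lists, for $p$ variables, the $p(p-1)/2$ panels below the diagonal of a $p \times p$ grid and the pair of variables each one shows. The grid convention (earlier variable indexes the column, later variable indexes the row) and the labels X3 to X6 are assumptions made for the illustration; no plotting library is involved.

```python
from itertools import combinations


def lower_triangle_panels(variables):
    """Map each grid cell (row, col) below the diagonal to its variable pair."""
    panels = {}
    for (j, xj), (i, xi) in combinations(enumerate(variables), 2):
        # combinations keeps the input order, so j < i:
        # column j holds the earlier variable, row i the later one.
        panels[(i, j)] = (xj, xi)
    return panels


variables = ["X3", "X4", "X5", "X6"]  # example labels only
panels = lower_triangle_panels(variables)

for (row, col), pair in sorted(panels.items()):
    print(f"row {row}, column {col}: {pair[0]} vs. {pair[1]}")
```

With four variables this yields six panels: three in the left column, two in the middle column and one in the lower right corner.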

**Figure 1.14:** Draftsman plot of the bank notes. The pictures in the left column show $(X_{3}, X_{4})$, $(X_{3}, X_{5})$ and $(X_{3}, X_{6})$, in the middle we have $(X_{4}, X_{5})$ and $(X_{4}, X_{6})$, and in the lower right is $(X_{5}, X_{6})$. The upper right half contains the corresponding density contour plots. `MVAdrafbank4.xpl`

Figure 1.14 shows a draftsman plot applied to the last four columns of the full bank data set. For ease of interpretation we have distinguished between the group of counterfeit and genuine bank notes by a different color. As discussed several times before, the separability of the two types of notes is different for different scatterplots. Not only is it difficult to perform this separation on, say, scatterplot $X_{3}$ vs. $X_{4}$, but the "separation line" is also no longer parallel to one of the axes. The most obvious separation happens in the scatterplot in the lower right where we show, as in Figure 1.12, $X_{5}$ vs. $X_{6}$. The separation line here would be upward-sloping with an intercept at about $X_{6} = 139$. The upper right half of the draftsman plot shows the density contours that we have introduced in Section 1.3.

The power of the draftsman plot lies in its ability to show the internal connections of the scatter diagrams. Define a brush as a re-scalable rectangle that we can move via keyboard or mouse over the screen. Inside the brush we can highlight or color observations. Suppose the technique is installed in such a way that as we move the brush in one scatter, the corresponding observations in the other scatters are also highlighted. By moving the brush, we can study conditional dependence.

If we brush (i.e., highlight or color the observations with the brush) the $X_{5}$ vs. $X_{6}$ plot and move through the upper point cloud, we see that in other plots (e.g., $X_{3}$ vs. $X_{4}$), the corresponding observations are more embedded in the other sub-cloud.
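The linking mechanism behind brushing amounts to selecting observation indices in one panel and reusing them in every other panel. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the brush is an axis-parallel rectangle given by two ranges, and the four observations are made-up values rather than rows of the bank data.

```python
def brush(points, x_var, y_var, x_range, y_range):
    """Return the indices of observations inside the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i for i, p in enumerate(points)
        if x_lo <= p[x_var] <= x_hi and y_lo <= p[y_var] <= y_hi
    }


def linked_view(points, selected, x_var, y_var):
    """Coordinates of all observations in another panel plus a highlight flag."""
    return [(p[x_var], p[y_var], i in selected) for i, p in enumerate(points)]


# Made-up toy observations (NOT the bank note data).
points = [
    {"X3": 129.7, "X4": 8.1, "X5": 9.5, "X6": 141.7},
    {"X3": 129.9, "X4": 8.4, "X5": 10.1, "X6": 141.3},
    {"X3": 130.3, "X4": 10.4, "X5": 11.2, "X6": 139.5},
    {"X3": 130.1, "X4": 9.9, "X5": 11.6, "X6": 139.8},
]

# Brush the upper cloud in the X5 vs. X6 panel ...
selected = brush(points, "X5", "X6", (9.0, 10.5), (141.0, 142.0))
# ... and highlight the same observations in the X3 vs. X4 panel.
view = linked_view(points, selected, "X3", "X4")
print(sorted(selected), view)
```

Because the selection is stored as a set of indices rather than as coordinates, the same highlighted observations can be drawn in any panel of the draftsman plot.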