3.1 Covariance

Properties of covariance matrices will be detailed in Chapter 4. The empirical versions of the covariance and the variance are given in (3.2) and (3.3).

For a scatterplot of two variables, the covariance measures ``how close the scatter is to a line''. Mathematical details follow, but it should already be understood that, in this sense, covariance measures only ``linear dependence''.
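The restriction to linear dependence can be illustrated with a small sketch. The helper `empirical_cov` is a hypothetical name (it is not part of the book's software), it uses the factor $1/n$, and the data are made up for illustration. An exactly linear relation yields a nonzero covariance, whereas an exact but symmetric quadratic relation yields a covariance of zero.

```python
def empirical_cov(x, y):
    """Empirical covariance with the factor 1/n (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


# Made-up data, symmetric around zero.
x = [-2, -1, 0, 1, 2]
y_line = [2 * xi + 1 for xi in x]    # exact linear dependence on x
y_parabola = [xi ** 2 for xi in x]   # exact, but nonlinear, dependence on x

cov_line = empirical_cov(x, y_line)
cov_parabola = empirical_cov(x, y_parabola)

print("linear relation:   ", cov_line)
print("quadratic relation:", cov_parabola)
```

Although `y_parabola` is fully determined by `x`, its covariance with `x` vanishes in this example, so a covariance of zero should not be read as ``no dependence''.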

EXAMPLE 3.1 If ${\data{X}}$ is the entire bank data set, one obtains the covariance matrix $\data{S}$ as indicated below:

$\begin{displaymath} {\data{S}} = \left ( {\begin{array}{rrrrrr}\ 0.14&0.03&0.02&... ...4\\ 0.08&-0.21&-0.24&-1.03&-0.54&1.32 \end{array}} \right) . \end{displaymath}$

(3.4)

The empirical covariance between $X_{4}$ and $X_{5}$, i.e., $s_{X_{4}X_{5}}$, is found in row 4 and column 5. The value is $s_{X_{4}X_{5}} = 0.16$. Is it obvious that this value is positive? In Exercise 3.1 we will discuss this question further.

If $\data{X}_{f}$ denotes the counterfeit bank notes, we obtain:

$\begin{displaymath} {\data{S}_{f}} = \left ( {\begin{array}{rrrrrr}\ 0.123& 0.0... ...005& 0.034& 0.236& -0.022& 0.308\\ \end{array}} \right)\cdot \end{displaymath}$

(3.5)

For the genuine bank notes, $\data{X}_{g}$, we have:

$\begin{displaymath} {\data{S}_{g}} = \left ( {\begin{array}{rrrrrr}\ 0.149& 0.05... ...3& -0.024& -0.000& -0.074& 0.198\\ \end{array}} \right)\cdot \end{displaymath}$

(3.6)

Note that the covariance between $X_{4}$ (distance of the frame to the lower border) and $X_{5}$ (distance of the frame to the upper border) is negative in both (3.5) and (3.6)! Why would this happen? In Exercise 3.2 we will discuss this question in more detail.

At first sight, the matrices ${\data{S}_{f}}$ and ${\data{S}_{g}}$ look different, but they create almost the same scatterplots (see the discussion in Section 1.4). Similarly, the common principal component analysis in Chapter 9 suggests a joint analysis of the covariance structure as in Flury and Riedwyl (1988).

**Figure 3.1:** Scatterplot of variables $X_{4}$ vs. $X_{5}$ of the entire bank data set. `MVAscabank45.xpl`

Scatterplots with point clouds that are ``upward-sloping'', like the one in the upper left of Figure 1.14, show variables with positive covariance. Scatterplots with a ``downward-sloping'' structure show variables with negative covariance. In Figure 3.1 we show the scatterplot of $X_{4}$ vs. $X_{5}$ of the entire bank data set. The point cloud is upward-sloping. However, the two sub-clouds of counterfeit and genuine bank notes are downward-sloping.
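That a pooled cloud can slope upward while each sub-cloud slopes downward may seem surprising. The following sketch reproduces the effect with synthetic numbers (these are not the bank data); the helper `empirical_cov` and the group names are hypothetical. Within each group the covariance is negative, but the shift between the group means dominates the pooled covariance and makes it positive.

```python
def empirical_cov(x, y):
    """Empirical covariance with the factor 1/n (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


# Synthetic groups: each one is downward-sloping on its own.
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [6, 7, 8], [8, 7, 6]

cov_a = empirical_cov(group_a_x, group_a_y)
cov_b = empirical_cov(group_b_x, group_b_y)

# Pooling the groups: the second group is shifted up and to the right.
cov_pooled = empirical_cov(group_a_x + group_b_x, group_a_y + group_b_y)

print("group A:", cov_a)
print("group B:", cov_b)
print("pooled: ", cov_pooled)
```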

EXAMPLE 3.2 A textile shop manager is studying the sales of ``classic blue'' pullovers over 10 different periods. He observes the number of pullovers sold ( $X_{1}$ ), variation in price ( $X_{2}$ , in EUR), the advertisement costs in local newspapers ( $X_{3}$ , in EUR) and the presence of a sales assistant ( $X_{4}$ , in hours per period). Over the periods, he observes the following data matrix:

$\begin{displaymath}\data{X} = \left( \begin{array}{rrrr} 230 & 125 & 200 & 109 ... ...& 95 & 110 & 86 \\ 170 & 125 & 130 & 78 \end{array} \right) .\end{displaymath}$

He is convinced that the price must have a large influence on the number of pullovers sold. So he makes a scatterplot of $X_{2}$ vs. $X_{1}$ , see Figure 3.2.

**Figure 3.2:** Scatterplot of variables $X_{2}$ vs. $X_{1}$ of the pullovers data set. `MVAscapull1.xpl`

A rough impression is that the cloud is somewhat downward-sloping. A computation of the empirical covariance yields

$\begin{displaymath}s_{X_{1}X_{2}} = \frac{1}{9} \sum_{i=1}^{10} \left( X_{1i} - \bar{X_1} \right) \left( X_{2i} - \bar{X_2} \right) = -80.02, \end{displaymath}$

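a negative value as expected. Note that the factor $1/(n-1) = 1/9$ is used here instead of the factor $1/n$ of (3.2), a common correction for small samples.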

Note: The covariance function is scale dependent. Thus, if the prices in this example were in Japanese Yen (JPY), we would obtain a different answer (see Exercise 3.16). A measure of (linear) dependence independent of the scale is the correlation, which we introduce in the next section.
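The scale dependence can be checked numerically. The sketch below uses made-up sales and price figures (not the pullover data) and a hypothetical exchange rate of 160 JPY per EUR, which is assumed purely for illustration. Multiplying one variable by a constant multiplies the covariance by the same constant.

```python
def empirical_cov(x, y):
    """Empirical covariance with the factor 1/n (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


# Made-up data: units sold and price in EUR.
sales = [10, 8, 6, 4]
price_eur = [1.0, 2.0, 3.0, 4.0]

# Hypothetical exchange rate, assumed for illustration only.
JPY_PER_EUR = 160.0
price_jpy = [JPY_PER_EUR * p for p in price_eur]

cov_eur = empirical_cov(sales, price_eur)
cov_jpy = empirical_cov(sales, price_jpy)

print("covariance with prices in EUR:", cov_eur)
print("covariance with prices in JPY:", cov_jpy)
print("ratio:", cov_jpy / cov_eur)
```

The sign of the covariance is unchanged by the positive rescaling, but its magnitude depends entirely on the unit of measurement.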

The empirical covariance and the empirical variance are defined as

$\begin{displaymath} s_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})(y_i-\overline{y}) \end{displaymath}$

(3.2)

$\begin{displaymath} s_{XX} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\overline{x})^2. \end{displaymath}$

(3.3)
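Formulas (3.2) and (3.3) translate directly into code. The following is a minimal sketch on made-up data; the function name `empirical_cov` and its `ddof` argument are hypothetical choices. With `ddof=0` the factor is $1/n$ as in (3.2), and with `ddof=1` it is $1/(n-1)$, the factor used in the computation of Example 3.2.

```python
def empirical_cov(x, y, ddof=0):
    """Empirical covariance of two samples of equal length.

    ddof=0 divides by n as in (3.2); ddof=1 divides by n - 1.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - ddof)


# Made-up data for illustration.
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]

s_xy = empirical_cov(x, y)                  # factor 1/n, as in (3.2)
s_xx = empirical_cov(x, x)                  # variance as a special case, (3.3)
s_xy_corrected = empirical_cov(x, y, ddof=1)  # factor 1/(n-1)

print("s_XY with 1/n:    ", s_xy)
print("s_XX with 1/n:    ", s_xx)
print("s_XY with 1/(n-1):", s_xy_corrected)
```

The two normalizations differ by the factor $n/(n-1)$, which matters for small samples and becomes negligible as $n$ grows.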