3.3 Summary Statistics

This section focuses on the representation of basic summary statistics (means, covariances and correlations) in matrix notation, since we often apply linear transformations to data. The matrix notation allows us to derive the corresponding characteristics of the transformed variables immediately. The Mahalanobis transformation is a prominent example of such a linear transformation.

Assume that we have observed $n$ realizations of a $p$-dimensional random variable; we have a data matrix $\data{X}(n\times p)$:

\begin{displaymath}
\data{X}= \left (\begin{array}{ccc}
x_{11} &\cdots&x_{1p}\\
\vdots&&\vdots\\
x_{n1} &\cdots&x_{np}
\end{array}\right ).
\end{displaymath} (3.16)

The rows $x_i=(x_{i1},\ldots ,x_{ip})\in \mathbb{R}^p$ denote the $i$-th observation of a $p$-dimensional random variable $X \in \mathbb{R}^p$.

The statistics that were briefly introduced in Sections 3.1 and 3.2 can be rewritten in matrix form as follows. The ``center of gravity'' of the $n$ observations in $\mathbb{R}^p$ is given by the vector $\overline{x}$ of the means $\overline{x}_{j}$ of the $p$ variables:

\begin{displaymath}
\overline x=\left (\begin{array}{c} \overline x_1\\ \vdots\\ \overline x_p\end{array}\right )
=n^{-1}\data{X}^{\top}\undertilde 1_{n}.
\end{displaymath} (3.17)
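A small numerical sketch, assuming Python with numpy and a made-up $5\times 3$ data matrix, illustrates how (3.17) is evaluated:

\begin{verbatim}
import numpy as np

# hypothetical data matrix with n = 5 observations of p = 3 variables
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [0.5, 3.0, 2.0],
              [1.5, 2.5, 0.0],
              [3.0, 1.0, 1.5]])
n, p = X.shape

# mean vector (3.17): xbar = n^{-1} X' 1_n
xbar = X.T @ np.ones(n) / n
print(xbar)                       # same result as X.mean(axis=0)
\end{verbatim}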

The dispersion of the $n$ observations can be characterized by the covariance matrix of the $p$ variables. The empirical covariances defined in (3.2) and (3.3) are the elements of the following matrix:

\begin{displaymath}
\data{S}=n^{-1}\data{X}^{\top}\data{X}-\overline x\ \overline x^{\top}
= n^{-1}(\data{X}^{\top}\data{X} - n^{-1}\data{X}^{\top}\undertilde 1_{n}
\undertilde 1_{n}^{\top}\data{X}).
\end{displaymath} (3.18)

Note that this matrix is equivalently defined by

\begin{displaymath}\data{S} = \frac{1}{n} \sum_{i=1}^n (x_{i}-\overline{x})(x_{i}
-\overline{x})^{\top}. \end{displaymath}
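The equivalence of the two expressions for $\data{S}$ can be checked numerically; the following sketch assumes numpy and again uses a made-up data matrix:

\begin{verbatim}
import numpy as np

# hypothetical data: n = 5 observations of p = 3 variables
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.5, 1.0],
              [0.5, 3.0, 2.0],
              [1.5, 2.5, 0.0],
              [3.0, 1.0, 1.5]])
n = X.shape[0]
xbar = X.mean(axis=0)

# empirical covariance (3.18): S = n^{-1} X'X - xbar xbar'
S1 = X.T @ X / n - np.outer(xbar, xbar)

# equivalent sum of outer products of centered observations
S2 = sum(np.outer(x - xbar, x - xbar) for x in X) / n

print(np.allclose(S1, S2))        # True
\end{verbatim}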

The covariance formula (3.18) can be rewritten as $\data{S} = n^{-1}\data{X}^{\top}\data{H}\data{X}$ with the centering matrix
\begin{displaymath}
\data{H} = \data{I}_{n} -n^{-1}\undertilde 1_{n}\undertilde 1_{n}^{\top}.
\end{displaymath} (3.19)

Note that the centering matrix is symmetric and idempotent. Indeed,

\begin{eqnarray*}
\data{H}^2
&=& (\data{I}_{n}-n^{-1}\undertilde 1_{n} \undertilde 1_{n}^{\top})
(\data{I}_{n}-n^{-1}\undertilde 1_{n} \undertilde 1_{n}^{\top})\\
&=& \data{I}_{n}-2n^{-1}\undertilde 1_{n}\undertilde 1_{n}^{\top}
+n^{-2}\undertilde 1_{n}(\undertilde 1_{n}^{\top}\undertilde 1_{n})
\undertilde 1_{n}^{\top}\\
&=& \data{I}_{n}-n^{-1}\undertilde 1_{n} \undertilde 1_{n}^{\top} =\data{H},
\end{eqnarray*}

where we used $\undertilde 1_{n}^{\top}\undertilde 1_{n}=n$.



As a consequence $\data{S}$ is positive semidefinite, i.e.
\begin{displaymath}
\data{S}\ge 0.
\end{displaymath} (3.20)

Indeed for all $a\in \mathbb{R}^p$,

\begin{eqnarray*}
a^{\top}\data{S}a&= &n^{-1}a^{\top}\data{X}^{\top}\data{H}\data{X}a\\
&=& n^{-1}(\data{H}\data{X}a)^{\top}(\data{H}\data{X}a)
\quad \textrm{since $\data{H}$ is symmetric and idempotent},\\
&=& n^{-1}y^{\top}y = n^{-1}\sum ^n_{j=1}y^2_j\ge 0
\end{eqnarray*}



for $y=\data{H}\data{X}a$. It is well known from the one-dimensional case that $n^{-1}\sum ^n_{i=1}(x_i-\overline x)^2$ as an estimate of the variance exhibits a bias of the order $n^{-1}$ (Breiman; 1973). In the multidimensional case, $\data{S}_u=\frac{ n}{n-1 }\ \data{S}$ is an unbiased estimate of the true covariance. (This will be shown in Example 4.15.)
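A short numerical check, assuming numpy and simulated data, confirms that $\data{S}$ has nonnegative eigenvalues and that $\data{S}_u=\frac{n}{n-1}\data{S}$ coincides with the usual unbiased covariance estimate:

\begin{verbatim}
import numpy as np

n, p = 10, 4
rng = np.random.default_rng(1)
X = rng.normal(size=(n, p))       # hypothetical data matrix

H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / n

print(np.all(np.linalg.eigvalsh(S) >= -1e-12))   # S is positive semidefinite
Su = n / (n - 1) * S
print(np.allclose(Su, np.cov(X, rowvar=False)))  # unbiased estimate S_u
\end{verbatim}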

The sample correlation coefficient between the $i$-th and $j$-th variables is $r_{X_{i}X_{j}}$, see (3.8). If $\data{D} = \mathop{\hbox{diag}}(s_{X_{i}X_{i}})$, then the correlation matrix is

\begin{displaymath}
\data{R} = \data{D}^{-1/2}\data{S}\data{D}^{-1/2},
\end{displaymath} (3.21)

where $\data{D}^{-1/2}$ is a diagonal matrix with elements $(s_{X_{i}X_{i}})^{-1/2}$ on its main diagonal.
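The construction (3.21) of $\data{R}$ from $\data{S}$ can be traced in the following sketch (numpy, simulated data):

\begin{verbatim}
import numpy as np

n, p = 10, 4
rng = np.random.default_rng(2)
X = rng.normal(size=(n, p))       # hypothetical data matrix

H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / n

# D^{-1/2}: diagonal matrix with 1/sqrt(s_{X_i X_i}) on the diagonal
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv_sqrt @ S @ D_inv_sqrt   # correlation matrix (3.21)

print(np.allclose(R, np.corrcoef(X, rowvar=False)))  # True
\end{verbatim}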

EXAMPLE 3.8   The empirical covariances are calculated for the pullover data set.

The vector of the means of the four variables in the dataset is $\overline{x}=(172.7,104.6,104.0,93.8)^{\top}$.

The sample covariance matrix is ${\data S}=\left(\begin{array}{rrrr}
1037.2&-80.2&1430.7&271.4\\
-80.2&219.8&92.1&-91.6\\
1430.7&92.1&2624&210.3\\
271.4&-91.6&210.3&177.4
\end{array}\right).$

The unbiased estimate of the covariance matrix ($n=10$) is equal to

\begin{displaymath}{\data S_u}=\frac{10}{9}{\data{S}}=\left(\begin{array}{rrrr}
1152.4&-89.1&1589.7&301.6\\
-89.1&244.2&102.3&-101.8\\
1589.7&102.3&2915.6&233.7\\
301.6&-101.8&233.7&197.1
\end{array}\right).\end{displaymath}

The sample correlation matrix is ${\data R}=\left(\begin{array}{llll}
\phantom{-}1& -0.17& \phantom{-}0.87& \phantom{-}0.63\\
-0.17& \phantom{-}1& \phantom{-}0.12& -0.46\\
\phantom{-}0.87& \phantom{-}0.12& \phantom{-}1& \phantom{-}0.31\\
\phantom{-}0.63& -0.46& \phantom{-} 0.31& \phantom{-}1
\end{array}\right).$


Linear Transformation

In many practical applications we need to study linear transformations of the original data. This motivates the question of how to calculate summary statistics after such linear transformations.

Let $\data{A}$ be a ($q \times p$) matrix and consider the transformed data matrix

\begin{displaymath}
\data{Y} = \data{X}\data{A}^{\top} = (y_{1}, \ldots, y_{n})^{\top}.
\end{displaymath} (3.22)

The row $ y_{i}=(y_{i1},\dots,y_{iq}) \in \mathbb{R}^q$ can be viewed as the $i$-th observation of a $q$-dimensional random variable $Y=\data{A}X$. In fact we have $y_i = x_{i}\data{A}^{\top} $. We immediately obtain the mean and the empirical covariance of the variables (columns) forming the data matrix $\data{Y}$:

\begin{displaymath}
\overline{y} = \frac{1}{n}\data{Y}^{\top} 1_{n}
= \frac{1}{n} \data{A}\data{X}^{\top} 1_{n} =
\data{A} \overline{x}
\end{displaymath} (3.23)

\begin{displaymath}
\data{S}_{\data{Y}} = \frac{1}{n} \data{Y}^{\top}\data{H}\data{Y}
= \frac{1}{n} \data{A}\data{X}^{\top}\data{H}\data{X}\data{A}^{\top}
= \data{A} \data{S}_{\data{X}}\data{A}^{\top}.
\end{displaymath} (3.24)

Note that if the linear transformation is nonhomogeneous, i.e.,

\begin{displaymath}y_{i} = \data{A}x_{i} + b \quad\quad \textrm{where} \quad b(q \times 1), \end{displaymath}

only (3.23) changes: $ \overline{y} = \data{A} \overline{x} + b $. Formulas (3.23) and (3.24) are useful in the particular case of $q = 1$, i.e., $y=\data{X}a \Leftrightarrow y_{i} = a^{\top}x_{i};\ i=1,\ldots,n$:

\begin{eqnarray*}
\overline{y} &=& a^{\top}\overline{x}\\
\data{S}_{y} &=& a^{\top} \data{S}_{\data{X}}a.
\end{eqnarray*}
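Formulas (3.23) and (3.24), including the nonhomogeneous case with a shift $b$, can be verified numerically; the sketch assumes numpy and randomly generated $\data{X}$, $\data{A}$ and $b$:

\begin{verbatim}
import numpy as np

n, p, q = 20, 3, 2
rng = np.random.default_rng(3)
X = rng.normal(size=(n, p))       # hypothetical data matrix
A = rng.normal(size=(q, p))       # hypothetical (q x p) transformation
b = rng.normal(size=q)            # hypothetical shift

Y = X @ A.T + b                   # nonhomogeneous version of (3.22)

H = np.eye(n) - np.ones((n, n)) / n
S_X = X.T @ H @ X / n
S_Y = Y.T @ H @ Y / n

print(np.allclose(Y.mean(axis=0), A @ X.mean(axis=0) + b))  # (3.23) with shift b
print(np.allclose(S_Y, A @ S_X @ A.T))                      # (3.24)
\end{verbatim}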



EXAMPLE 3.9   Suppose that $\data{X}$ is the pullover data set. The manager wants to compute his mean expenses for advertisement ($X_3$) and sales assistant ($X_4$).

Suppose that the sales assistant charges an hourly wage of 10 EUR. Then the shop manager calculates the expenses $Y$ as $Y = X_3 + 10 X_4$. Formula (3.22) says that this is equivalent to defining the matrix ${\data{A}}(1\times 4)$ as:

\begin{displaymath}
{\data{A}}=(0, 0, 1, 10).
\end{displaymath}

Using formulas (3.23) and (3.24), it is now computationally very easy to obtain the sample mean $\overline y$ and the sample variance $\data{S}_y$ of the overall expenses (here based on the unbiased covariance estimate $\data{S}_u$ of Example 3.8):

\begin{displaymath}
\overline y = {\data{A}} \overline x = (0, 0, 1, 10)
\left(\begin{array}{r}
172.7\\
104.6\\
104.0\\
93.8
\end{array}\right)=1042.0
\end{displaymath}


\begin{displaymath}
{\data{S}}_{\data{Y}}=
{\data{A}}{\data{S}}_{u}{\data{A}}^{\top} = (0, 0, 1, 10)\,{\data{S}}_{u}
\left(
\begin{array}{r}
0\\
0\\
1\\
10
\end{array}\right)
=2915.6+2\cdot 10\cdot 233.7+100\cdot 197.1
\end{displaymath}


\begin{displaymath}
=2915.6+4674+19710=27299.6.
\end{displaymath}
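The manager's calculation can be reproduced from the covariance matrices of Example 3.8; the sketch below assumes numpy and shows both $a^{\top}\data{S}a$ and $a^{\top}\data{S}_u a$ (the latter matches 27299.6 up to rounding of the printed entries):

\begin{verbatim}
import numpy as np

# sample covariance matrix S of the pullover data (Example 3.8)
S = np.array([[1037.2,  -80.2, 1430.7,  271.4],
              [ -80.2,  219.8,   92.1,  -91.6],
              [1430.7,   92.1, 2624.0,  210.3],
              [ 271.4,  -91.6,  210.3,  177.4]])
a = np.array([0.0, 0.0, 1.0, 10.0])   # Y = X3 + 10*X4

print(a @ S @ a)                # a'S a   = 24570.0
print(a @ (10 / 9 * S) @ a)     # a'S_u a = 27300, i.e. 27299.6 up to rounding
\end{verbatim}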


Mahalanobis Transformation

A special case of this linear transformation is
\begin{displaymath}
z_i=\data{S}^{-1/2}(x_i-\overline x), \quad i=1,\ldots ,n.
\end{displaymath} (3.25)

Note that for the transformed data matrix $\data{Z} = (z_1,\ldots ,z_n)^{\top}$,
\begin{displaymath}
\data{S}_{\data{Z}}=n^{-1}\data{Z}^{\top}\data{H}\data{Z}=\data{I}_{p}.
\end{displaymath} (3.26)

So the Mahalanobis transformation eliminates the correlation between the variables and standardizes the variance of each variable. If we apply (3.24) using $\data{A} = \data{S}^{-1/2}$, we obtain the identity covariance matrix as indicated in (3.26).
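A small sketch of the Mahalanobis transformation, assuming numpy and simulated correlated data, where $\data{S}^{-1/2}$ is taken to be the symmetric inverse square root of $\data{S}$:

\begin{verbatim}
import numpy as np

n, p = 50, 3
rng = np.random.default_rng(4)
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated hypothetical data

H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / n

# symmetric inverse square root S^{-1/2} via the eigendecomposition of S
vals, vecs = np.linalg.eigh(S)
S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

xbar = X.mean(axis=0)
Z = (X - xbar) @ S_inv_sqrt           # Mahalanobis transformation (3.25)

S_Z = Z.T @ H @ Z / n
print(np.allclose(S_Z, np.eye(p)))    # identity covariance matrix (3.26)
\end{verbatim}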

Summary
$\ast$
The center of gravity of a data matrix is given by its mean vector $\overline x = n^{-1} \data{X}^{\top} 1_{n}$.
$\ast$
The dispersion of the observations in a data matrix is given by the empirical covariance matrix $\data{S} = n^{-1}\data{X}^{\top}\data{H}\data{X}$.
$\ast$
The empirical correlation matrix is given by $\data{R}=\data{D}^{-1/2}\data{S}\data{D}^{-1/2}$.
$\ast$
A linear transformation $\data{Y}=\data{X}\data{A}^{\top}$ of a data matrix $\data{X}$ has mean $\data{A}\overline{x}$ and empirical covariance $\data{A}\data{S}_{\data{X}}\data{A}^{\top}$.
$\ast$
The Mahalanobis transformation is a linear transformation $z_{i}=\data{S}^{-1/2}(x_{i}-\overline{x})$ which gives a standardized, uncorrelated data matrix $\data{Z}$.