4.2 Moments and Characteristic Functions
Moments: Expectation and Covariance Matrix
If $X$ is a random vector with density $f(x)$, then the expectation of $X$ is
\[
EX = \left( \begin{array}{c} EX_1\\ \vdots\\ EX_p \end{array} \right)
   = \left( \begin{array}{c} \int x_1 f(x)\,dx\\ \vdots\\ \int x_p f(x)\,dx \end{array} \right)
   = \mu. \tag{4.10}
\]
Accordingly, the expectation of a matrix of random elements has to be understood
component by component.
The operation of forming expectations is linear:
\[
E\left( \alpha X + \beta Y \right) = \alpha\, EX + \beta\, EY. \tag{4.11}
\]
If $\data{A}$ is a $(q \times p)$ matrix of real numbers, we have:
\[
E(\data{A}X) = \data{A}\, EX. \tag{4.12}
\]
When $X$ and $Y$ are independent,
\[
E(XY^{\top}) = EX\, EY^{\top}. \tag{4.13}
\]
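These rules are easy to check by simulation. Below is a minimal Monte Carlo sketch of (4.12) and (4.13); it assumes only numpy, and the particular distributions and the matrix $\data{A}$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent random vectors X and Y (p = q = 2) with known means.
X = rng.exponential(scale=[1.0, 2.0], size=(n, 2))  # EX = (1, 2)
Y = rng.uniform(0.0, 1.0, size=(n, 2))              # EY = (0.5, 0.5)
A = np.array([[1.0, 2.0],
              [0.0, -1.0]])

# (4.12): E(AX) = A EX; compare with A @ (1, 2) = (5, -2)
print((X @ A.T).mean(axis=0))

# (4.13): E(XY^T) = EX EY^T for independent X and Y
print(np.einsum('ni,nj->ij', X, Y) / n)   # ~ outer((1, 2), (0.5, 0.5))
```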
The matrix
\[
\Var(X) = \Sigma = E(X-\mu)(X-\mu)^{\top} \tag{4.14}
\]
is the (theoretical) covariance matrix. We write for a vector $X$ with mean vector $\mu$ and covariance matrix $\Sigma$,
\[
X \sim (\mu, \Sigma). \tag{4.15}
\]
The $(p \times q)$ matrix
\[
\Sigma_{XY} = \Cov(X,Y) = E(X-\mu)(Y-\nu)^{\top} \tag{4.16}
\]
is the covariance matrix of $X \sim (\mu,\Sigma_{XX})$ and $Y \sim (\nu,\Sigma_{YY})$. Note that
$\Sigma_{XY} = \Sigma_{YX}^{\top}$ and that $Z = {X \choose Y}$ has covariance
$\Sigma_{ZZ} = \left( \begin{array}{cc} \Sigma_{XX} & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_{YY} \end{array} \right)$. From
\[
\Cov(X,Y) = E(XY^{\top}) - \mu\nu^{\top} = E(XY^{\top}) - EX\, EY^{\top} \tag{4.17}
\]
it follows that $\Cov(X,Y) = 0$ in the case where $X$ and $Y$ are independent. We often say that $\mu = EX$ is the first order moment of $X$ and that $E(XX^{\top})$ provides the second order moments of $X$:
\[
E(XX^{\top}) = \{ E(X_iX_j) \}, \quad \textrm{for } i=1,\ldots,p \textrm{ and } j=1,\ldots,p. \tag{4.18}
\]
Properties of the Covariance Matrix $\Sigma = \Var(X)$
\[
\Sigma = (\sigma_{X_iX_j}), \qquad \sigma_{X_iX_j} = \Cov(X_i,X_j), \qquad \sigma_{X_iX_i} = \Var(X_i) \tag{4.19}
\]
\[
\Sigma = E(XX^{\top}) - \mu\mu^{\top} \tag{4.20}
\]
\[
\Sigma \ge 0. \tag{4.21}
\]
Properties of Variances and Covariances
\[
\Var(a^{\top}X) = a^{\top}\Var(X)\,a = \sum_{i,j} a_i a_j\, \sigma_{X_iX_j} \tag{4.22}
\]
\[
\Var(\data{A}X + b) = \data{A}\, \Var(X)\, \data{A}^{\top} \tag{4.23}
\]
\[
\Cov(X + Y, Z) = \Cov(X,Z) + \Cov(Y,Z) \tag{4.24}
\]
\[
\Var(X + Y) = \Var(X) + \Cov(X,Y) + \Cov(Y,X) + \Var(Y) \tag{4.25}
\]
\[
\Cov(\data{A}X, \data{B}Y) = \data{A}\, \Cov(X,Y)\, \data{B}^{\top}. \tag{4.26}
\]
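The transformation rules (4.23) and (4.26) hold exactly for empirical covariances as well, which makes them easy to verify numerically. A small sketch, assuming only numpy (the matrices $\data{A}$, $\data{B}$ and the trivariate normal used to create two dependent blocks $X$ and $Y$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
Z = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]], size=n)
X, Y = Z[:, :2], Z[:, 1:]           # overlapping, hence dependent, blocks
A = np.array([[1.0, -1.0], [2.0, 0.5]])
B = np.array([[0.5, 1.0], [1.0, 0.0]])

def cov(u, v):
    """Empirical Cov(U, V) = E(U - EU)(V - EV)^T."""
    uc, vc = u - u.mean(axis=0), v - v.mean(axis=0)
    return uc.T @ vc / (len(u) - 1)

# (4.23): Var(AX + b) = A Var(X) A^T  (the shift b only moves the mean)
print(np.allclose(cov(X @ A.T + 3.0, X @ A.T + 3.0), A @ cov(X, X) @ A.T))
# (4.26): Cov(AX, BY) = A Cov(X, Y) B^T
print(np.allclose(cov(X @ A.T, Y @ B.T), A @ cov(X, Y) @ B.T))
```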
Let us compute these quantities for a specific joint density.
EXAMPLE 4.5 Consider the pdf of Example 4.1, $f(x_1,x_2) = \frac{1}{2}x_1 + \frac{3}{2}x_2$ for $0 \le x_1,x_2 \le 1$. The mean vector $\mu = {\mu_1 \choose \mu_2}$ is given by
\[
\mu_j = \int_0^1\!\!\int_0^1 x_j\, f(x_1,x_2)\, dx_1\, dx_2, \quad j = 1,2,
\]
which yields $\mu_1 = \frac{13}{24}$ and $\mu_2 = \frac{5}{8}$. The elements of the covariance matrix are
\begin{eqnarray*}
\sigma_{X_1X_1} &=& EX_1^2 - \mu_1^2 = \frac{3}{8} - \left(\frac{13}{24}\right)^2 \approx 0.0816,\\
\sigma_{X_2X_2} &=& EX_2^2 - \mu_2^2 = \frac{11}{24} - \left(\frac{5}{8}\right)^2 \approx 0.0677,\\
\sigma_{X_1X_2} &=& E(X_1X_2) - \mu_1\mu_2 = \frac{1}{3} - \frac{13}{24}\cdot\frac{5}{8} \approx -0.0052.
\end{eqnarray*}
Hence the covariance matrix is
\[
\Sigma = \left( \begin{array}{rr} 0.0816 & -0.0052\\ -0.0052 & 0.0677 \end{array} \right).
\]
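These values can be reproduced by numerical integration. The sketch below assumes the pdf quoted above, $f(x_1,x_2) = \frac{1}{2}x_1 + \frac{3}{2}x_2$ on the unit square, and uses scipy's `dblquad` (which integrates over the inner variable first):

```python
import numpy as np
from scipy.integrate import dblquad

f = lambda x2, x1: 0.5 * x1 + 1.5 * x2   # dblquad expects f(inner, outer)

def E(g):
    """E{g(X1, X2)} = int int g(x1, x2) f(x1, x2) dx2 dx1 over [0,1]^2."""
    return dblquad(lambda x2, x1: g(x1, x2) * f(x2, x1),
                   0, 1, lambda x1: 0, lambda x1: 1)[0]

mu1, mu2 = E(lambda x1, x2: x1), E(lambda x1, x2: x2)
s11 = E(lambda x1, x2: x1 ** 2) - mu1 ** 2
s22 = E(lambda x1, x2: x2 ** 2) - mu2 ** 2
s12 = E(lambda x1, x2: x1 * x2) - mu1 * mu2
print(mu1, mu2)        # 0.5417, 0.6250
print(s11, s12, s22)   # 0.0816, -0.0052, 0.0677
```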
Conditional Expectations
The conditional expectations are
\[
E(X_2 \mid x_1) = \int x_2 f(x_2 \mid x_1)\; dx_2
\quad\textrm{and}\quad
E(X_1 \mid x_2) = \int x_1 f(x_1 \mid x_2)\; dx_1. \tag{4.27}
\]
$E(X_2 \mid x_1)$ represents the location parameter of the conditional pdf of $X_2$ given that $X_1 = x_1$. In the same way, we can define $\Var(X_2 \mid X_1 = x_1)$ as a measure of the dispersion of $X_2$ given that $X_1 = x_1$. We have from (4.20) that
\[
\Var(X_2 \mid X_1 = x_1) = E(X_2 X_2^{\top} \mid X_1 = x_1) - E(X_2 \mid X_1 = x_1)\, E(X_2^{\top} \mid X_1 = x_1).
\]
Using the conditional covariance matrix, the conditional correlations may be defined as:
\[
\rho_{X_2 X_3 \mid X_1 = x_1} =
\frac{\Cov(X_2, X_3 \mid X_1 = x_1)}{\sqrt{\Var(X_2 \mid X_1 = x_1)\, \Var(X_3 \mid X_1 = x_1)}}.
\]
These conditional correlations are known as partial correlations between $X_2$ and $X_3$, conditioned on $X_1$ being equal to $x_1$.
EXAMPLE 4.6 Consider the pdf
\[
f(x_1,x_2,x_3) = \frac{2}{3}(x_1 + x_2 + x_3), \quad 0 \le x_1,x_2,x_3 \le 1.
\]
Note that the pdf is symmetric in $x_1$, $x_2$ and $x_3$, which facilitates the computations. For instance,
\[
f(x_1) = \frac{2}{3}(x_1 + 1), \quad 0 \le x_1 \le 1,
\]
and the other marginals are similar. We also have
\[
f(x_1,x_3) = \frac{2}{3}\left(x_1 + x_3 + \frac{1}{2}\right), \qquad
f(x_1,x_2 \mid x_3) = \frac{x_1 + x_2 + x_3}{x_3 + 1}.
\]
It is easy to compute the following moments:
\begin{eqnarray*}
E(X_i) &=& \frac{5}{9} \ \textrm{ and } \ E(X_i^2) = \frac{7}{18}, \quad i = 1,2,3; \qquad E(X_1X_2) = \frac{11}{36};\\
E(X_1 \mid X_3 = x_3) = E(X_2 \mid X_3 = x_3) &=& \frac{1}{12}\left(\frac{6x_3 + 7}{x_3 + 1}\right);\\
E(X_1^2 \mid X_3 = x_3) = E(X_2^2 \mid X_3 = x_3) &=& \frac{1}{12}\left(\frac{4x_3 + 5}{x_3 + 1}\right);\\
E(X_1X_2 \mid X_3 = x_3) &=& \frac{1}{12}\left(\frac{3x_3 + 4}{x_3 + 1}\right).
\end{eqnarray*}
Note that the conditional means of $X_1$ and of $X_2$, given $X_3 = x_3$, are not linear in $x_3$. From these moments we obtain $\Var(X_i) = \frac{13}{162}$, $\Cov(X_1,X_2) = -\frac{1}{324}$, and hence the simple correlation $\rho_{X_1X_2} = -\frac{1}{26} \approx -0.0385$. The conditional covariance matrix of $X_1$ and $X_2$, given $X_3 = x_3$, is
\[
\Var\left( {X_1 \choose X_2} \,\Big|\, X_3 = x_3 \right) =
\frac{1}{144(x_3 + 1)^2}
\left( \begin{array}{cc} 12x_3^2 + 24x_3 + 11 & -1\\ -1 & 12x_3^2 + 24x_3 + 11 \end{array} \right).
\]
In particular, the partial correlation between $X_1$ and $X_2$, given that $X_3$ is fixed at $x_3$, is given by $\rho_{X_1X_2 \mid X_3 = x_3} = -\frac{1}{12x_3^2 + 24x_3 + 11}$, which ranges from $-0.0909$ to $-0.0213$ when $x_3$ goes from 0 to 1. Therefore, in this example, the partial correlation may be larger or smaller than the simple correlation, depending on the value of the condition $X_3 = x_3$.
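This formula is easy to confirm by numerical integration. The sketch below (numpy/scipy; it takes the conditional density $f(x_1,x_2 \mid x_3) = (x_1+x_2+x_3)/(x_3+1)$ derived above as given) compares quadrature values of the partial correlation with $-1/(12x_3^2 + 24x_3 + 11)$:

```python
import numpy as np
from scipy.integrate import dblquad

def partial_corr(x3):
    """rho_{X1 X2 | X3 = x3} computed from f(x1, x2 | x3) by quadrature."""
    f = lambda x2, x1: (x1 + x2 + x3) / (x3 + 1.0)
    E = lambda g: dblquad(lambda x2, x1: g(x1, x2) * f(x2, x1),
                          0, 1, lambda x1: 0, lambda x1: 1)[0]
    m1, m2 = E(lambda a, b: a), E(lambda a, b: b)
    v1 = E(lambda a, b: a * a) - m1 ** 2
    v2 = E(lambda a, b: b * b) - m2 ** 2
    c12 = E(lambda a, b: a * b) - m1 * m2
    return c12 / np.sqrt(v1 * v2)

for x3 in (0.0, 0.5, 1.0):
    print(partial_corr(x3), -1.0 / (12 * x3 ** 2 + 24 * x3 + 11))
```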
EXAMPLE 4.7 Consider the joint pdf
\[
f(x_1,x_2,x_3) = 2x_2(x_1 + x_3), \quad 0 \le x_1,x_2,x_3 \le 1.
\]
Note the symmetry of $x_1$ and $x_3$ in the pdf and that $X_2$ is independent of $(X_1,X_3)$. It immediately follows that
\[
f(x_2) = 2x_2, \quad 0 \le x_2 \le 1; \qquad f(x_1,x_3) = x_1 + x_3, \quad 0 \le x_1,x_3 \le 1.
\]
Simple computations lead to
\[
EX = \left( \begin{array}{c} \frac{7}{12}\\[2pt] \frac{2}{3}\\[2pt] \frac{7}{12} \end{array} \right)
\quad\textrm{and}\quad
\Sigma = \left( \begin{array}{ccc} \frac{11}{144} & 0 & -\frac{1}{144}\\ 0 & \frac{1}{18} & 0\\ -\frac{1}{144} & 0 & \frac{11}{144} \end{array} \right).
\]
Let us analyze the conditional distribution of $(X_1,X_2)$ given $X_3 = x_3$. We have
\[
f(x_3) = x_3 + \frac{1}{2}, \qquad
f(x_1,x_2 \mid x_3) = 2x_2 \cdot \frac{x_1 + x_3}{x_3 + \frac{1}{2}} = f(x_2)\, f(x_1 \mid x_3),
\]
so that again $X_1$ and $X_2$ are independent conditional on $X_3 = x_3$. In this case
\[
E\left\{ {X_1 \choose X_2} \,\Big|\, X_3 = x_3 \right\} =
\left( \begin{array}{c} \frac{3x_3 + 2}{3(2x_3 + 1)}\\[4pt] \frac{2}{3} \end{array} \right).
\]
Since $E(X_2 \mid X_1 = x_1)$ is a function of $x_1$, say $h(x_1)$, we can define the random variable $h(X_1) = E(X_2 \mid X_1)$. The same can be done when defining the random variable $E(X_1 \mid X_2)$. These two random variables share some interesting properties:
\[
E(X_2) = E\{E(X_2 \mid X_1)\} \quad\textrm{and}\quad E(X_1) = E\{E(X_1 \mid X_2)\}. \tag{4.28}
\]
EXAMPLE 4.8 Consider the pdf
\[
f(x_1,x_2) = 2e^{-\frac{x_2}{x_1}}, \quad 0 < x_1 < 1,\ x_2 > 0.
\]
It is easy to show that
\[
f(x_1) = 2x_1 \ \textrm{ for } 0 < x_1 < 1, \qquad
f(x_2 \mid x_1) = \frac{1}{x_1} e^{-\frac{x_2}{x_1}}, \ \textrm{ so that } \ E(X_2 \mid X_1 = x_1) = x_1.
\]
Without explicitly computing $f(x_2)$, we can obtain:
\[
E(X_2) = E\{E(X_2 \mid X_1)\} = E(X_1) = \frac{2}{3}.
\]
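The iterated-expectation argument lends itself to a simulation check: draw $X_1$ from $f(x_1) = 2x_1$ by inversion and then $X_2$ from the conditional exponential distribution with mean $x_1$. A minimal sketch, assuming only numpy:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

x1 = np.sqrt(rng.uniform(size=n))   # inversion: F(x1) = x1^2 on (0, 1)
x2 = rng.exponential(scale=x1)      # X2 | X1 = x1 ~ Exp with mean x1

print(x2.mean())   # ~ 2/3, without ever computing the marginal f(x2)
print(x1.mean())   # E{E(X2 | X1)} = E(X1) ~ 2/3
```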
The conditional expectation $E(X_2 \mid X_1)$, viewed as a function $h(X_1)$ of $X_1$ (known as the regression function of $X_2$ on $X_1$), can be interpreted as a conditional approximation of $X_2$ by a function of $X_1$. The error term of the approximation is then given by:
\[
U = X_2 - E(X_2 \mid X_1).
\]
Characteristic Functions
The characteristic function (cf) of a random vector $X \in \mathbb{R}^p$ (respectively its density $f(x)$) is defined as
\[
\varphi_X(t) = E(e^{{\bf i}t^{\top}X}) = \int e^{{\bf i}t^{\top}x} f(x)\, dx, \quad t \in \mathbb{R}^p, \tag{4.29}
\]
where ${\bf i}$ is the complex unit: ${\bf i}^2 = -1$.
The cf has the following properties:
\[
\varphi_X(0) = 1 \quad\textrm{and}\quad \vert\varphi_X(t)\vert \le 1. \tag{4.30}
\]
If $\varphi_X$ is absolutely integrable, i.e., the integral $\int_{-\infty}^{\infty} \vert\varphi_X(t)\vert\, dt$ exists and is finite, then
\[
f(x) = \frac{1}{(2\pi)^p} \int_{-\infty}^{\infty} e^{-{\bf i}t^{\top}x}\, \varphi_X(t)\, dt. \tag{4.31}
\]
If $X = (X_1, \ldots, X_p)^{\top}$, then for $t = (t_1, \ldots, t_p)^{\top}$
\[
\varphi_{X_1}(t_1) = \varphi_X(t_1, 0, \ldots, 0), \quad \ldots \quad,
\varphi_{X_p}(t_p) = \varphi_X(0, \ldots, 0, t_p). \tag{4.32}
\]
If $X_1, \ldots, X_p$ are independent random variables, then for $t = (t_1, \ldots, t_p)^{\top}$
\[
\varphi_X(t) = \varphi_{X_1}(t_1) \cdot \ldots \cdot \varphi_{X_p}(t_p). \tag{4.33}
\]
If $X_1, \ldots, X_p$ are independent random variables, then for $t \in \mathbb{R}$
\[
\varphi_{X_1 + \ldots + X_p}(t) = \varphi_{X_1}(t) \cdot \ldots \cdot \varphi_{X_p}(t). \tag{4.34}
\]
The characteristic function can recover all the cross-product moments of any order: for $j_k \ge 0$, $k = 1, \ldots, p$, and $t = (t_1, \ldots, t_p)^{\top}$ we have
\[
E\left( X_1^{j_1} \cdot \ldots \cdot X_p^{j_p} \right) =
\frac{1}{{\bf i}^{\,j_1 + \ldots + j_p}}
\left[ \frac{\partial^{\,j_1 + \ldots + j_p}\, \varphi_X(t)}{\partial t_1^{j_1} \ldots \partial t_p^{j_p}} \right]_{t=0}. \tag{4.35}
\]
EXAMPLE 4.9 The cf of the density in Example 4.5 is given by
\[
\varphi_X(t) = \int_0^1\!\!\int_0^1 e^{{\bf i}(t_1x_1 + t_2x_2)}
\left( \frac{1}{2}x_1 + \frac{3}{2}x_2 \right) dx_1\, dx_2.
\]
EXAMPLE 4.10 Suppose $X \in \mathbb{R}^1$ follows the density of the standard normal distribution (see Section 4.4). Then the cf can be computed via
\begin{eqnarray*}
\varphi_X(t) &=& \frac{1}{\sqrt{2\pi}} \int e^{{\bf i}tx}\, e^{-x^2/2}\, dx\\
&=& e^{-t^2/2} \int \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(x - {\bf i}t)^2}{2} \right\} dx
 = e^{-t^2/2},
\end{eqnarray*}
since ${\bf i}^2 = -1$ and $\int \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(x - {\bf i}t)^2}{2} \right\} dx = 1$.
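A short numerical check of this result, assuming numpy and scipy (the integration range $[-10, 10]$ truncates the Gaussian tails harmlessly):

```python
import numpy as np
from scipy.integrate import quad

def cf_std_normal(t):
    """phi_X(t) = E(e^{itX}) for X ~ N(0,1), via real and imaginary parts."""
    pdf = lambda x: np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)
    re = quad(lambda x: np.cos(t * x) * pdf(x), -10, 10)[0]
    im = quad(lambda x: np.sin(t * x) * pdf(x), -10, 10)[0]
    return complex(re, im)

for t in (0.0, 0.5, 1.0, 2.0):
    print(cf_std_normal(t), np.exp(-0.5 * t * t))   # imaginary part ~ 0
```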
A variety of distributional characteristics can be computed from $\varphi_X(t)$. The standard normal distribution has a very simple cf, as was seen in Example 4.10. Deviations from normal covariance structures can be measured by the deviations from the normal cf (or characteristics of it). In Table 4.1 we give an overview of the cf's for a variety of distributions.

Table 4.1: Characteristic functions for some common distributions.
\[
\begin{array}{ll}
\textrm{pdf} & \textrm{cf}\\
\hline
\textrm{Uniform:}\ f(x) = \boldsymbol{I}(x \in [a,b])/(b-a) &
\varphi_X(t) = (e^{{\bf i}bt} - e^{{\bf i}at})/\{(b-a)\,{\bf i}t\}\\
N_1(\mu,\sigma^2):\ f(x) = (2\pi\sigma^2)^{-1/2}\, e^{-(x-\mu)^2/(2\sigma^2)} &
\varphi_X(t) = e^{{\bf i}\mu t - \sigma^2 t^2/2}\\
\chi^2(n):\ f(x) = \boldsymbol{I}(x > 0)\, x^{n/2-1} e^{-x/2}/\{\Gamma(n/2)\, 2^{n/2}\} &
\varphi_X(t) = (1 - 2{\bf i}t)^{-n/2}\\
N_p(\mu,\Sigma):\ f(x) = \vert 2\pi\Sigma \vert^{-1/2}\, e^{-(x-\mu)^{\top}\Sigma^{-1}(x-\mu)/2} &
\varphi_X(t) = e^{{\bf i}t^{\top}\mu - t^{\top}\Sigma t/2}
\end{array}
\]
THEOREM 4.4 (Cramér-Wold) The distribution of $X \in \mathbb{R}^p$ is completely determined by the set of all (one-dimensional) distributions of $t^{\top}X$ where $t \in \mathbb{R}^p$.

This theorem says that we can determine the distribution of $X$ in $\mathbb{R}^p$ by specifying all of the one-dimensional distributions of the linear combinations
\[
t^{\top}X = \sum_{j=1}^p t_j X_j, \quad t \in \mathbb{R}^p.
\]
Cumulant functions
Moments $m_k = \int x^k f(x)\, dx$ often help in describing distributional characteristics. The normal distribution in $d = 1$ dimension is completely characterized by its standard normal density $f = \varphi$ and the moment parameters are $\mu = m_1$ and $\sigma^2 = m_2 - m_1^2$. Another helpful class of parameters are the cumulants or semi-invariants of a distribution. In order to simplify notation we concentrate here on the one-dimensional ($p = 1$) case.

For a given random variable $X$ with density $f$ and finite moments of order $k$ the characteristic function $\varphi_X(t) = E(e^{{\bf i}tX})$ has the derivative
\[
\frac{1}{{\bf i}^j} \left[ \frac{\partial^j \log \varphi_X(t)}{\partial t^j} \right]_{t=0} = \kappa_j, \quad j = 1, \ldots, k.
\]
The values $\kappa_j$ are called cumulants or semi-invariants since $\kappa_j$ does not change (for $j > 1$) under a shift transformation $X \mapsto X + a$. The cumulants are natural parameters for dimension reduction methods, in particular the Projection Pursuit method (see Section 18.2).

The relationship between the first $k$ moments $m_1, \ldots, m_k$ and the cumulants is given by
\[
\kappa_k = (-1)^{k-1} \left\vert
\begin{array}{ccccc}
m_1 & 1 & 0 & \cdots & 0\\
m_2 & {1 \choose 0} m_1 & 1 & \cdots & 0\\
m_3 & {2 \choose 0} m_2 & {2 \choose 1} m_1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
m_k & {k-1 \choose 0} m_{k-1} & {k-1 \choose 1} m_{k-2} & \cdots & {k-1 \choose k-2} m_1
\end{array}
\right\vert. \tag{4.36}
\]
EXAMPLE 4.11 Suppose that $k = 1$, then formula (4.36) above yields
\[
\kappa_1 = m_1.
\]
For $k = 2$ we obtain
\[
\kappa_2 = -\left\vert \begin{array}{cc} m_1 & 1\\ m_2 & {1 \choose 0} m_1 \end{array} \right\vert
= m_2 - m_1^2.
\]
For $k = 3$ we have to calculate
\[
\kappa_3 = \left\vert \begin{array}{ccc} m_1 & 1 & 0\\ m_2 & m_1 & 1\\ m_3 & m_2 & 2m_1 \end{array} \right\vert.
\]
Calculating the determinant we have:
\begin{eqnarray*}
\kappa_3 &=& m_1 \left\vert \begin{array}{cc} m_1 & 1\\ m_2 & 2m_1 \end{array} \right\vert
- \left\vert \begin{array}{cc} m_2 & 1\\ m_3 & 2m_1 \end{array} \right\vert\\
&=& m_1(2m_1^2 - m_2) - (2m_1m_2 - m_3)\\
&=& m_3 - 3m_1m_2 + 2m_1^3. \qquad\qquad (4.37)
\end{eqnarray*}
Similarly one calculates
\[
\kappa_4 = m_4 - 4m_3m_1 - 3m_2^2 + 12m_2m_1^2 - 6m_1^4. \tag{4.38}
\]
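Formula (4.36) is straightforward to implement. The sketch below is our own illustration, assuming only numpy: the helper `cumulant` builds the moment determinant, and the test case uses the raw moments of an Exp(1) variable, $m_k = k!$, whose cumulants are known to be $\kappa_k = (k-1)!$.

```python
import numpy as np
from math import comb, factorial

def cumulant(k, m):
    """kappa_k from raw moments m[1..k] (m[0] unused) via the determinant (4.36)."""
    M = np.zeros((k, k))
    for r in range(1, k + 1):
        M[r - 1, 0] = m[r]                       # first column: m_1, ..., m_k
        for c in range(2, r + 1):                # binomial-weighted lower part
            M[r - 1, c - 1] = comb(r - 1, c - 2) * m[r - c + 1]
        if r < k:                                # ones on the superdiagonal
            M[r - 1, r] = 1.0
    return (-1) ** (k - 1) * np.linalg.det(M)

m = [None] + [factorial(j) for j in range(1, 5)]         # Exp(1): m_k = k!
print([round(cumulant(k, m), 6) for k in (1, 2, 3, 4)])  # 1, 1, 2, 6
m1, m2, m3, m4 = m[1:]
print(m3 - 3 * m1 * m2 + 2 * m1 ** 3)                    # (4.37): 2
print(m4 - 4 * m3 * m1 - 3 * m2 ** 2
      + 12 * m2 * m1 ** 2 - 6 * m1 ** 4)                 # (4.38): 6
```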
The same type of process is used to express the moments in terms of the cumulants:
\[
m_1 = \kappa_1, \qquad m_2 = \kappa_2 + \kappa_1^2, \qquad m_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3.
\]
A very simple relationship can be observed between the semi-invariants and the central moments $\mu_k = E(X - \mu)^k$, where $\mu = m_1$ as defined before. In fact, $\kappa_2 = \mu_2$, $\kappa_3 = \mu_3$ and $\kappa_4 = \mu_4 - 3\mu_2^2$.
Skewness $\gamma_1$ and kurtosis $\gamma_2$ are defined as:
\[
\gamma_1 = E(X - \mu)^3 / \sigma^3, \qquad \gamma_2 = E(X - \mu)^4 / \sigma^4.
\]
The skewness and kurtosis determine the shape of one-dimensional distributions. The skewness of a normal distribution is 0 and the kurtosis equals 3. The relation of these parameters to the cumulants is given by:
\[
\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}}, \qquad \gamma_2 = \frac{\kappa_4}{\kappa_2^2} + 3.
\]
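Continuing the Exp(1) illustration from above (cumulants $\kappa_2 = 1$, $\kappa_3 = 2$, $\kappa_4 = 6$), these relations recover the known skewness 2 and kurtosis 9 of the exponential distribution:

```python
k2, k3, k4 = 1.0, 2.0, 6.0        # cumulants of Exp(1) from the sketch above
print(k3 / k2 ** 1.5)             # gamma_1 = 2  (skewness)
print(k4 / k2 ** 2 + 3.0)         # gamma_2 = 9  (kurtosis)
```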
These relations will be used later in Section 18.2 on
Projection Pursuit to determine deviations from normality.
Summary
- The expectation of a random vector $X$ is $EX = \int x f(x)\, dx = \mu$; the covariance matrix is $\Var(X) = \Sigma = E(X - \mu)(X - \mu)^{\top}$. We denote $X \sim (\mu, \Sigma)$.
- Expectations are linear, i.e., $E(\alpha X + \beta Y) = \alpha\, EX + \beta\, EY$. If $X$ and $Y$ are independent, then $E(XY^{\top}) = EX\, EY^{\top}$.
- The covariance between two random vectors $X$ and $Y$ is $\Sigma_{XY} = \Cov(X,Y) = E(XY^{\top}) - EX\, EY^{\top}$. If $X$ and $Y$ are independent, then $\Cov(X,Y) = 0$.
- The characteristic function (cf) of a random vector $X$ is $\varphi_X(t) = E(e^{{\bf i}t^{\top}X})$.
- The distribution of a $p$-dimensional random variable $X$ is completely determined by all one-dimensional distributions of $t^{\top}X$ where $t \in \mathbb{R}^p$ (Theorem of Cramér-Wold).
- The conditional expectation $E(X_2 \mid X_1)$ is the MSE best approximation of $X_2$ by a function of $X_1$.