4.4 Random Vectors, Dependence, Correlation

A random vector $(X_1,\ldots,X_n)$ taking values in $\mathbb{R}^n$ is useful for describing the mutual dependencies of several random variables $X_1,\ldots,X_n$, for example of several underlying stocks. The joint distribution of the random variables $X_1,\ldots,X_n$ is, as in the univariate case, uniquely determined by the probabilities

$\displaystyle \P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) \, , \quad -\infty < a_i \le b_i < \infty \, , \quad i=1,\ldots,n \, .$

If the random vector $(X_1,\ldots,X_n)$ has a density $p(x_1, \ldots, x_n)$, these probabilities can be computed by means of the following integral:

$\displaystyle \P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) = \int^{b_n}_{a_n} \ldots \int^{b_1}_{a_1} p(x_1, \ldots, x_n) \, dx_1 \ldots dx_n \, .$
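A minimal numerical sketch of this formula for $n=2$, assuming NumPy and SciPy are available; the bivariate normal density with correlation $0.5$ is only an illustrative choice, not a distribution taken from the text.

from scipy import integrate
from scipy.stats import multivariate_normal

rho = 0.5                                   # illustrative correlation
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def p(x1, x2):
    # joint density p(x1, x2) of the chosen bivariate normal
    return rv.pdf([x1, x2])

a1, b1 = -1.0, 1.0                          # bounds for X1
a2, b2 = 0.0, 2.0                           # bounds for X2

# dblquad integrates the inner variable x2 over [a2, b2] and the outer variable x1 over [a1, b1]
prob, _ = integrate.dblquad(lambda x2, x1: p(x1, x2), a1, b1, a2, b2)
print(prob)                                 # P(a1 <= X1 <= b1, a2 <= X2 <= b2)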

The univariate or marginal distribution of $X_j$ can be computed from the joint density by integrating out the variables that are not of interest:

$\displaystyle \P(a_j \le X_j \le b_j) = \int^{\infty}_{-\infty} \ldots \int^{b_j}_{a_j} \ldots \int^{\infty}_{-\infty} p(x_1, \ldots, x_n) \, dx_1 \ldots dx_n \, .$
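Continuing the numerical sketch above (still with the illustrative bivariate normal density): integrating the second variable out over the whole real line leaves the marginal probability for $X_1$, which here must agree with the univariate standard normal result.

from scipy import integrate
from scipy.stats import multivariate_normal, norm

rho = 0.5
rv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

def p(x1, x2):
    return rv.pdf([x1, x2])                 # joint density p(x1, x2)

a1, b1 = -1.0, 1.0
# integrate x2 out; [-10, 10] is numerically exhaustive for a normal density
marg, _ = integrate.dblquad(lambda x2, x1: p(x1, x2), a1, b1, -10.0, 10.0)
print(marg, norm.cdf(b1) - norm.cdf(a1))    # both approximately 0.6827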

The intuitive notion of independence of two random variables $ X_1, X_2$ is formalized by requiring:

$\displaystyle \P(a_1 \le X_1 \le b_1, \, a_2 \le X_2 \le b_2) = \P(a_1 \le X_1 \le b_1) \cdot \P(a_2 \le X_2 \le b_2) \, ,$

i.e. the joint probability of the two events depending on the random vector $(X_1, X_2)$ factorizes into the product of the individual probabilities; it is sufficient to know the univariate distributions of $X_1$ and $X_2$. If the random vector $(X_1, X_2)$ has a density $p(x_1, x_2)$, then $X_1$ and $X_2$ have densities $p_1(x)$ and $p_2(x)$ as well. In this case, independence of the two random variables is equivalent to a joint density which factorizes:

$\displaystyle p(x_1, x_2) = p_1(x_1) p_2(x_2).$
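A small numerical check of the factorization criterion, assuming NumPy and SciPy; both densities below are illustrative choices: for independent standard normal components the joint density equals the product of the marginals, while a correlated bivariate normal violates the factorization.

import numpy as np
from scipy.stats import norm, multivariate_normal

x1, x2 = 0.3, -1.2                          # an arbitrary evaluation point
indep = multivariate_normal(mean=[0, 0], cov=[[1, 0.0], [0.0, 1]])
corr  = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])

print(np.isclose(indep.pdf([x1, x2]), norm.pdf(x1) * norm.pdf(x2)))   # True
print(np.isclose(corr.pdf([x1, x2]),  norm.pdf(x1) * norm.pdf(x2)))   # False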

The dependence of two random variables $X_1, X_2$ can be very complicated. If $X_1, X_2$ are jointly normally distributed, however, their dependence structure can be quantified quite easily by their covariance:

$\displaystyle \mathop{\text{\rm Cov}}(X_1, X_2) = \mathop{\text{\rm\sf E}}[(X_1 - \mathop{\text{\rm\sf E}}[X_1])(X_2 - \mathop{\text{\rm\sf E}}[X_2])],$

as well as by their correlation:

$\displaystyle \mathop{\text{\rm Corr}}(X_1, X_2) = \frac{\mathop{\text{\rm Cov}}(X_1, X_2)}{\sigma(X_1) \cdot \sigma(X_2)} \, .$
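A simulation sketch (NumPy assumed; the true correlation of $0.5$ and the unit variances are illustrative choices): the sample covariance and sample correlation of simulated jointly normal data estimate the quantities defined above.

import numpy as np

rng = np.random.default_rng(0)
rho = 0.5
sample = rng.multivariate_normal(mean=[0, 0],
                                 cov=[[1, rho], [rho, 1]], size=100_000)
x1, x2 = sample[:, 0], sample[:, 1]

cov_hat  = np.cov(x1, x2)[0, 1]             # sample covariance, close to 0.5
corr_hat = np.corrcoef(x1, x2)[0, 1]        # sample correlation, close to 0.5
print(cov_hat, corr_hat)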

The correlation has the advantage of being scale invariant and of always taking values between $-1$ and $+1$. For jointly normally distributed random variables, independence is equivalent to zero correlation, while complete dependence is equivalent to a correlation of either $+1$ ($X_1$ is large when $X_2$ is large) or $-1$ ($X_1$ is large when $X_2$ is small).

In general, for independent random variables $X_1,\ldots,X_n$ it holds that

$\displaystyle \mathop{\text{\rm Cov}}(X_i, X_j) = 0 \qquad \text{\rm for\ } \, i \not= j \, .$

This implies a useful computation rule:

$\displaystyle \mathop{\text{\rm Var}}\big( \sum^n_{j=1} \, X_j \big) = \sum^n_{j=1} \, \mathop{\text{\rm Var}}(X_j) \, . $
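A quick simulation check of this rule (NumPy assumed; the three independent normal variables with variances $1$, $4$ and $9$ are hypothetical choices, so the variance of their sum should be close to $14$).

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = np.column_stack([rng.normal(0, 1, n),    # Var = 1
                     rng.normal(0, 2, n),    # Var = 4
                     rng.normal(0, 3, n)])   # Var = 9, columns independent

print(np.var(X.sum(axis=1)))                 # approximately 14
print(np.var(X, axis=0).sum())               # approximately 14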

If $X_1,\ldots,X_n$ are independent and all have the same distribution, i.e.

$\displaystyle \P(a \le X_i \le b) = \P(a \le X_j \le b) \qquad \text{for all } i, j \, ,$

we call them independent and identically distributed (i.i.d.).
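For i.i.d. random variables with common variance $\sigma^2 = \mathop{\text{\rm Var}}(X_j)$, the computation rule above specializes, for example, to

$\displaystyle \mathop{\text{\rm Var}}\big( \sum^n_{j=1} \, X_j \big) = n \, \sigma^2 \, .$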