4.1 Distribution and Density Function

Let $X=(X_1,X_2,\ldots ,X_p)^{\top}$ be a random vector. The cumulative distribution function (cdf) of $X$ is defined by

\begin{displaymath}F(\undertilde x) = P(X\le \undertilde x)=P(X_1\le x_1, X_2\le x_2,\ldots ,X_p\le
x_p). \end{displaymath}

For continuous $X$, there exists a nonnegative probability density function (pdf) $f$, such that
\begin{displaymath}
F(\undertilde x) = \int ^{\undertilde x}_{-\infty }
f (\undertilde u)d{\undertilde u}.
\end{displaymath} (4.1)

Note that

\begin{displaymath}\int ^{\infty}_{-\infty } f(\undertilde u)\,d{\undertilde u} =1.\end{displaymath}

Most of the integrals appearing below are multidimensional. For instance, $\int_{-\infty}^x f(u) du$ means $\int_{-\infty}^{x_p} \cdots \int_{-\infty}^{x_1} f(u_1,\ldots,u_p) du_1 \cdots du_p.$ Note also that the cdf $F$ is differentiable with

\begin{displaymath}f(x) = \frac{\partial^p F(x)}{\partial x_1 \cdots \partial x_p}.\end{displaymath}
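For instance, if $X$ is uniformly distributed on the unit square $[0,1]^2$, then $F(x_1,x_2)=x_1x_2$ for $0\le x_1,x_2\le 1$ and

\begin{displaymath}f(x_1,x_2) = \frac{\partial^2 F(x_1,x_2)}{\partial x_1\,\partial x_2} = 1.\end{displaymath}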

For discrete $X$, the values of this random variable are concentrated on a countable or finite set of points $\{c_j\}_{j\in J}$; the probability of an event of the form $\{X\in D\}$ can then be computed as

\begin{displaymath}P(X\in D)=\sum _{\{j:c_j\in D\}} P(X=c_j). \end{displaymath}
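As a simple illustration, suppose $X$ takes only the values $c_1=(0,0)^{\top}$ and $c_2=(1,1)^{\top}$, each with probability $\frac{1}{2}$. For $D=\{x\in\mathbb{R}^2: x_1\le \frac{1}{2}\}$ one obtains $P(X\in D)=P(X=c_1)=\frac{1}{2}$.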

If we partition $X$ as $X = (X_1,X_2)^{\top}$ with $X_1\in \mathbb{R}^k$ and $X_2\in
\mathbb{R}^{p-k}$, then the function
\begin{displaymath}F_{X_{1}}(\undertilde x_1)=P(X_1\le \undertilde x_1)
=F(x_{11},\ldots ,x_{1k},\infty ,\ldots, \infty) \end{displaymath} (4.2)

is called the marginal cdf. $F=F(x)$ is called the joint cdf. For continuous $X$ the marginal pdf can be computed from the joint density by ``integrating out'' the variable not of interest.
\begin{displaymath}f_{X_{1}}(\undertilde x_1) = \int ^\infty _{-\infty }f
(\undertilde x_1,\undertilde x_2) d\undertilde x_2. \end{displaymath} (4.3)

The conditional pdf of $X_2$ given $X_1=x_1$ is given as
\begin{displaymath}f(\undertilde x_2\mid \undertilde x_1) = \frac{f(\undertilde
x_1,\undertilde x_2) }{f_{X_{1}}(\undertilde x_1)}\cdotp \end{displaymath} (4.4)

EXAMPLE 4.1   Consider the pdf

\begin{displaymath}f(x_1,x_2) = \left \{ \begin{array}{l@{\quad \quad}l}
\frac{1}{2}x_1+\frac{3}{2}x_2 & 0\le x_1, x_2\le 1,\\
0 & \textrm{otherwise.}
\end{array} \right.\end{displaymath}

$f(x_1,x_2)$ is a density since

\begin{displaymath}\int f(x_1,x_2)\, dx_1\, dx_2 = \frac{1}{2} \left[
\frac{x^2_1}{2} \right]^1_0 + \frac{3}{2} \left[
\frac{x^2_2}{2} \right]^1_0 = \frac{1}{4} + \frac{3}{4} = 1.\end{displaymath}

The marginal densities are

\begin{eqnarray*}
f_{X_{1}}(x_1) & = & \int f(x_1,x_2)\,dx_2 =
\int ^1_0\left (\frac{1}{2}x_1+\frac{3}{2}x_2\right )dx_2
= \frac{1}{2}x_1+\frac{3}{4},\\
f_{X_{2}}(x_2) & = & \int f(x_1,x_2)\,dx_1 =
\int ^1_0\left (\frac{1}{2}x_1+\frac{3}{2}x_2\right )dx_1
= \frac{3}{2}x_2+\frac{1}{4}\cdotp
\end{eqnarray*}



The conditional densities are therefore

\begin{displaymath}f(x_2\mid x_1) = \frac{\frac{1}{2}x_1+\frac{3}{2}x_2 }
{\frac{1}{2}x_1+\frac{3}{4} }, \qquad
f(x_1\mid x_2) = \frac{\frac{1}{2}x_1+\frac{3}{2}x_2 }
{\frac{3}{2}x_2+\frac{1}{4} }\cdotp\end{displaymath}

Note that these conditional pdf's are nonlinear in $x_{1}$ and $x_{2}$ although the joint pdf has a simple (linear) structure.
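A numerical sketch of this example (a small Python check using scipy; the function names below simply mirror the formulas above and are not part of the original text) confirms that the pdf integrates to one and reproduces the marginal density of $X_1$:

\begin{verbatim}
import numpy as np
from scipy import integrate

# joint pdf of Example 4.1 on the unit square
def f(x1, x2):
    return 0.5 * x1 + 1.5 * x2

# total mass: should equal 1
total, _ = integrate.dblquad(lambda x2, x1: f(x1, x2), 0, 1, 0, 1)

# marginal density of X_1 at x1: integrate x2 out
def f_x1(x1):
    val, _ = integrate.quad(lambda x2: f(x1, x2), 0, 1)
    return val                      # equals 0.5 * x1 + 0.75

print(total)         # ~1.0
print(f_x1(0.5))     # ~1.0  (= 0.5 * 0.5 + 0.75)
\end{verbatim}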

Independence of two random variables is defined as follows.

DEFINITION 4.1   $X_1$ and $X_2$ are independent iff $f(x) = f(x_1,x_2) = f_{X_{1}}(x_1) f_{X_{2}}(x_2)$.

That is, $X_1$ and $X_2$ are independent if the conditional pdf's are equal to the marginal densities, i.e., $ f(x_{1} \mid x_{2}) = f_{X_{1}}(x_{1}) $ and $ f(x_{2} \mid x_{1}) = f_{X_{2}}(x_{2}) $. Independence can be interpreted as follows: knowing $X_2 = x_2$ does not change the probability assessments on $X_1$, and conversely.

Different joint pdf's may have the same marginal pdf's.

EXAMPLE 4.2   Consider the pdf's

\begin{displaymath}f(x_1,x_2)=1, \quad 0<x_1,x_2<1, \end{displaymath}

and

\begin{displaymath}f(x_1,x_2)=1+\alpha (2x_1-1)(2x_2-1), \quad 0<x_1, \ x_2<1,
\quad -1\le \alpha \le 1. \end{displaymath}

We compute in both cases the marginal pdf's as

\begin{displaymath}f_{X_{1}}(x_1)=1, \quad f_{X_{2}}(x_2)=1. \end{displaymath}

Indeed

\begin{displaymath}\int ^1_0\left\{1+\alpha (2x_1-1)(2x_2-1)\right\}dx_2
=1+\alpha(2x_1-1)\left[x^2_2-x_2\right]^1_0=1. \end{displaymath}

Hence we obtain identical marginals from different joint distributions!
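The same phenomenon can be checked numerically; the following Python sketch (not part of the original text) integrates out $x_2$ from the second joint pdf for several values of $\alpha$ and recovers the same uniform marginal each time:

\begin{verbatim}
from scipy import integrate

def f_joint(x1, x2, alpha):
    # second joint pdf of Example 4.2
    return 1 + alpha * (2 * x1 - 1) * (2 * x2 - 1)

for alpha in (-1.0, 0.0, 0.5, 1.0):
    # marginal of X_1 at x1 = 0.3: integrate x2 out
    m, _ = integrate.quad(lambda x2: f_joint(0.3, x2, alpha), 0, 1)
    print(alpha, round(m, 6))   # always ~1.0, although the joints differ
\end{verbatim}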

Let us study the concept of independence using the bank notes example. Consider the variables $X_{4}$ (lower inner frame) and $X_{5}$ (upper inner frame). From Chapter 3, we already know that they have significant correlation, so they are almost surely not independent. Kernel estimates of the marginal densities, $\widehat f_{X_{4}}$ and $\widehat f_{X_{5}}$, are given in Figure 4.1. In Figure 4.2 (left) we show the product of these two densities. The kernel density technique was presented in Section 1.3. If $X_{4}$ and $X_{5}$ are independent, this product $\widehat f_{X_{4}}
\cdot \widehat f_{X_{5}}$ should be roughly equal to $\widehat
f(x_{4},x_{5})$, the estimate of the joint density of $(X_{4},X_{5})$. Comparing the two graphs in Figure 4.2 reveals that the two densities are different. The two variables $X_4$ and $X_5$ are therefore not independent.
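A rough sketch of this comparison in Python could look as follows, using Gaussian kernel density estimates; the data file name and the column positions of $X_4$ and $X_5$ are assumptions for illustration, not part of the original text:

\begin{verbatim}
import numpy as np
from scipy.stats import gaussian_kde

# assumed: bank notes data as an (n x p) array;
# columns 3 and 4 are taken to hold X_4 and X_5 (assumption)
bank = np.loadtxt("bank2.dat")           # file name is an assumption
x4, x5 = bank[:, 3], bank[:, 4]

kde4 = gaussian_kde(x4)                    # marginal estimate of f_{X_4}
kde5 = gaussian_kde(x5)                    # marginal estimate of f_{X_5}
kde45 = gaussian_kde(np.vstack([x4, x5]))  # joint estimate of f(x_4, x_5)

# compare the product of marginals with the joint estimate at one point
pt = np.array([x4.mean(), x5.mean()])
product = kde4(pt[0])[0] * kde5(pt[1])[0]
joint = kde45(pt.reshape(2, 1))[0]
print(product, joint)   # noticeably different if X_4 and X_5 are dependent
\end{verbatim}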

Figure 4.1: Univariate estimates of the density of $X_4$ (left) and $X_{5}$ (right) of the bank notes. MVAdenbank2.xpl
\includegraphics[width=1.3\defpicwidth]{MVAdenbank2.ps}

Figure 4.2: The product of univariate density estimates (left) and the joint density estimate (right) for $X_4$ and $X_{5}$ of the bank notes. MVAdenbank3.xpl
\includegraphics[width=1.3\defpicwidth]{MVAdenbank3.ps}

An elegant way of connecting marginals with joint cdfs is given by copulas. Copulas are important in Value-at-Risk calculations and are an essential tool in quantitative finance (Härdle et al., 2002).

For simplicity of presentation we concentrate on the $p=2$ dimensional case. A 2-dimensional copula is a function $C: \, [0,1]^2 \to [0,1]$ with the following properties:
\begin{enumerate}
\item For every $u \in [0,1]$: $C(0,u) = C(u,0) = 0$.
\item For every $u \in [0,1]$: $C(u,1) = u$ and $C(1,u) = u$.
\item For every $(u_1,u_2),\,(v_1,v_2) \in [0,1]\times[0,1]$ with $u_1\le v_1$ and $u_2\le v_2$:
\begin{displaymath}C(v_1,v_2) - C(v_1,u_2) - C(u_1,v_2) + C(u_1,u_2) \ge 0 \; .\end{displaymath}
\end{enumerate}

The usage of the name ``copula'' for the function $C$ is explained by the following theorem.

THEOREM 4.1 (Sklar's theorem)   Let $F$ be a joint distribution function with marginal distribution functions $F_{X_1}$ and $F_{X_2}$. Then there exists a copula $C$ with
\begin{displaymath}
F(x_1,x_2) = C\{ F_{X_1}(x_1),F_{X_2}(x_2)\}
\end{displaymath} (4.5)

for every $x_1,x_2 \in \mathbb{R}$. If $F_{X_1}$ and $F_{X_2}$ are continuous, then $C$ is unique. On the other hand, if $C$ is a copula and $F_{X_1}$ and $F_{X_2}$ are distribution functions, then the function $F$ defined by (4.5) is a joint distribution function with marginals $F_{X_1}$ and $F_{X_2}$.
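For continuous marginals, the copula in (4.5) can be made explicit by inserting $x_i = F_{X_i}^{-1}(u_i)$:

\begin{displaymath}
C(u_1,u_2) = F\{ F_{X_1}^{-1}(u_1), F_{X_2}^{-1}(u_2)\}, \qquad u_1,u_2 \in [0,1],
\end{displaymath}

where $F_{X_i}^{-1}$ denotes the quantile function of $X_i$.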

With Sklar's Theorem, the use of the name ``copula'' becomes obvious. It was chosen to describe ``a function that links a multidimensional distribution to its one-dimensional margins'' and appeared in the mathematical literature for the first time in Sklar (1959).

EXAMPLE 4.3   The structure of independence implies that the product of the distribution functions $F_{X_1}$ and $F_{X_2}$ equals their joint distribution function $F$,
\begin{displaymath}
F(x_1,x_2) = F_{X_1}(x_1) \cdot F_{X_2}(x_2).
\end{displaymath} (4.6)

Thus, we obtain the independence copula $C = \Pi$ from

\begin{displaymath}
\Pi(u_1,\dots,u_n)=\prod_{i=1}^n u_i \; .
\end{displaymath}

THEOREM 4.2   Let $X_1$ and $X_2$ be random variables with continuous distribution functions $F_{X_1}$ and $F_{X_2}$ and the joint distribution function $F$. Then $X_1$ and $X_2$ are independent if and only if $C_{X_1, X_2} = \Pi$.

PROOF:
From Sklar's Theorem we know that there exists a unique copula $C$ with
\begin{displaymath}
P (X_1 \le x_1, X_2 \le x_2) = F(x_1,x_2) =
C\{F_{X_1}(x_1),F_{X_2}(x_2)\} \, .
\end{displaymath} (4.7)

If $C_{X_1,X_2} = \Pi$, then (4.7) and the definition of $\Pi$ give
\begin{displaymath}
F(x_1,x_2) = C\{F_{X_1}(x_1),F_{X_2}(x_2)\} = F_{X_1}(x_1) F_{X_2}(x_2) \; ,
\end{displaymath} (4.8)

i.e., $X_1$ and $X_2$ are independent. Conversely, if $X_1$ and $X_2$ are independent, then $F$ factorizes as in (4.8), so $\Pi$ satisfies (4.5); since $F_{X_1}$ and $F_{X_2}$ are continuous, the copula in (4.7) is unique and hence $C_{X_1,X_2} = \Pi$.

${\Box}$

EXAMPLE 4.4  

The Gumbel-Hougaard family of copulas (Nelsen, 1999) is given by the function

\begin{displaymath}
C_{\theta}(u, v) = \exp \left\{ - \left[ (-\ln u)^{\theta}
+ (-\ln v)^{\theta} \right]^{1 / \theta} \right\} \; .
\end{displaymath} (4.9)

The parameter $\theta$ may take all values in the interval $[1,\infty)$. The Gumbel-Hougaard copulas are suited to describe bivariate extreme value distributions.

For $\theta = 1$, the expression (4.9) reduces to the product copula, i.e., $C_1(u,v) = \Pi(u,v) = u \, v$. For $\theta \to \infty$ one finds for the Gumbel-Hougaard copula:

\begin{displaymath}C_{\theta}(u,v) {\longrightarrow}
\min(u,v) = M(u,v),\end{displaymath}

where the function $M$ is itself a copula and satisfies $C(u,v) \le M(u,v)$ for every copula $C$. The copula $M$ is called the Fréchet-Hoeffding upper bound.

Similarly, we obtain the Fréchet-Hoeffding lower bound $W(u,v) = \max(u+v-1,0)$, which satisfies $W(u,v) \le C(u,v)$ for every copula $C$.
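These bounds can be verified numerically for the Gumbel-Hougaard family; the short Python sketch below (not part of the original text) evaluates (4.9) on a grid and checks $W \le C_{\theta} \le M$:

\begin{verbatim}
import numpy as np

def gumbel_hougaard(u, v, theta):
    # Gumbel-Hougaard copula (4.9), theta >= 1
    return np.exp(-((-np.log(u))**theta + (-np.log(v))**theta)**(1.0 / theta))

u = np.linspace(0.01, 0.99, 50)
U, V = np.meshgrid(u, u)

W = np.maximum(U + V - 1, 0)     # Frechet-Hoeffding lower bound
M = np.minimum(U, V)             # Frechet-Hoeffding upper bound

for theta in (1.0, 2.0, 10.0, 50.0):
    C = gumbel_hougaard(U, V, theta)
    print(theta, np.all(W <= C + 1e-12), np.all(C <= M + 1e-12))
    # theta = 1 gives the product copula; large theta approaches M
\end{verbatim}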

Summary
$\ast$
The cumulative distribution function (cdf) is defined as $F(x)=P(X\le x)$.
$\ast$
If a probability density function (pdf) $f$ exists then $ F(x) = \int_{-\infty}^x f(u)du $.
$\ast$
The pdf integrates to one, i.e., $\int_{-\infty}^\infty f(x) dx =1$.
$\ast$
Let $X=(X_{1},X_{2})^{\top}$ be partitioned into sub-vectors $X_{1}$ and $X_{2}$ with joint cdf $F$. Then $ F_{X_{1}}(x_{1}) = P(X_{1} \le x_{1}) $ is the marginal cdf of $X_{1}$. The marginal pdf of $X_{1}$ is obtained by $ f_{X_{1}} (x_{1}) =
\int_{-\infty}^{\infty} f(x_{1},x_{2})\, dx_{2} $.
$\ast$
The conditional pdf of $X_{2}$ given $X_{1}=x_{1}$ is defined as $ f(x_2\mid x_1) = \frac{\displaystyle f(x_1,x_2) }{\displaystyle f_{X_{1}}(x_1)}\cdotp $
$\ast$
Two random variables $X_{1}$ and $X_{2}$ are called independent iff
$f(x_1,x_2) = f_{X_{1}}(x_1) f_{X_{2}}(x_2)$. This is equivalent to $ f(x_{2} \mid x_{1}) = f_{X_{2}}(x_{2}) $.
$\ast$
Different joint pdf's may have identical marginal pdf's.