4.1 Real Valued Random Variables

Thanks to Newton's laws, when a stone is dropped from a height of 10 m, the time of its impact on the ground is known before the experiment is carried out. Quantities in complex systems (such as stock prices at a certain date or the daily maximum temperature at a certain place) are, however, not deterministically predictable, although it is known which values are more likely to occur than others. In contrast to the falling stone, data which cannot be described successfully by a deterministic mechanism can be modelled by random variables.

Let $ X$ be such a random variable which, as a model for stock prices, takes values on the real line. The assessment of which values of $ X$ are more and which are less likely is expressed by the probability of events such as $ \{ a < X < b \} $ or $ \{ X \le b \}$. The set of all probabilities

$\displaystyle \P(a \le X \le b) \, \, , \quad - \infty < a \le b < \infty \, \, , $

determines the distribution of $ X$. In other words, the distribution is defined by the probabilities of all events which depend on $ X$. In the following, we denote the probability distribution of $ X$ by $ {\cal L} (X)$.

The probability distribution is uniquely determined by the cumulative distribution function

$\displaystyle F(x) = \P(X \le x) \, \, , \quad - \infty < x < \infty \, \, .$

$ F(x)$ is monotonically increasing and converges to 0 for $ x\rightarrow - \infty$ and to $ 1$ for $ x\rightarrow \infty$. If there is a function $ p$ such that the probabilities can be computed by means of an integral

$\displaystyle \P(a < X < b) = \int^b_a \, p(x) dx \, ,$

$ p$ is called a probability density, or briefly the density, of $ X$. Then the cumulative distribution function is a primitive of $ p$:

$\displaystyle F(x) = \int^x_{- \infty} \, p(y) dy . $

For small $ h$ it holds:

$\displaystyle \P(x-h < X < x + h ) \approx 2 h \cdot p (x) .$

Thus $ p(x)$ is a measure of the likelihood that $ X$ takes values close to $ x$.
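This approximation is easy to check numerically. The following is a minimal Python sketch (Python and scipy are assumptions of this illustration, not part of the text; the values of $ x$ and $ h$ are illustrative), comparing both sides for the standard normal density:

    # Numerical check of P(x-h < X < x+h) ~ 2*h*p(x) for the standard
    # normal distribution; x and h are illustrative values.
    from scipy.stats import norm

    x, h = 0.5, 0.01
    exact = norm.cdf(x + h) - norm.cdf(x - h)   # P(x-h < X < x+h)
    approx = 2 * h * norm.pdf(x)                # 2*h*p(x)
    print(exact, approx)                        # both approximately 0.00704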

The most important family of distributions with densities is the family of normal distributions. It is characterized by the two parameters $ \mu$ and $ \sigma^2 $. The densities are given by

$\displaystyle \varphi_{\mu, \sigma^2}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{- \frac{(x-\mu)^2}{2 \sigma^2}} = \frac{1}{\sigma} \, \varphi \big( \frac{x-\mu}{\sigma} \big) \, , $

$\displaystyle \varphi (x) = \varphi _{0,1} (x) = \frac{1}{\sqrt{2\pi}} \, e^{- \frac{x^2}{2}} \, .$

The distribution with density $ \varphi (x)$ is called the standard normal distribution. ``$ X$ is a normal random variable with parameters $ \mu, \sigma^2 $'' is commonly abbreviated as ``$ X$ is $ N (\mu, \sigma^2)$ distributed''. The cumulative distribution function of the standard normal distribution is denoted by $ \Phi $, and it holds:

$\displaystyle \Phi(x) = \int^x_{- \infty} \, \varphi(y) dy .$

If $ X$ is $ N (\mu, \sigma^2)$ distributed, then the centered and scaled random variable $ (X - \mu)/\sigma$ is standard normally distributed. Therefore, its cumulative distribution function is given by:

$\displaystyle F(x) = \P(X \le x) = \P\big( \frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma} \big) = \Phi \big( \frac{x - \mu}{\sigma} \big) .$
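As a quick numerical illustration, the following Python sketch (scipy is an assumed dependency; the values of $ \mu$, $ \sigma$ and $ x$ are illustrative) evaluates $ F(x)$ once via the standardization above and once directly:

    # F(x) = Phi((x - mu)/sigma) versus evaluating the N(mu, sigma^2)
    # cumulative distribution function directly.
    from scipy.stats import norm

    mu, sigma, x = 1.0, 2.0, 2.5
    via_standardization = norm.cdf((x - mu) / sigma)   # Phi((x-mu)/sigma)
    direct = norm.cdf(x, loc=mu, scale=sigma)          # F(x) directly
    print(via_standardization, direct)                 # both approximately 0.7734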

A distribution which is of importance in modelling stock prices is the lognormal distribution. Let $ X$ be a positive random variable whose natural logarithm $ \ln(X)$ is normally distributed with parameters $ \mu, \sigma^2 $. We say that $ X$ is lognormally distributed with parameters $ \mu, \sigma^2 $. Its cumulative distribution function follows directly from the above definition:

$\displaystyle F(x) = \P(X \le x) = \P(\ln X \le \ln x) = \Phi \big(\frac{\ln x - \mu}{\sigma} \big) , \quad x > 0.$

Differentiating $ F(x)$, we obtain its density function with parameters $ \mu, \sigma^2 $:

$\displaystyle p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, \frac{1}{x} \, e^{- \frac{(\ln x - \mu)^2}{2 \sigma^2}} = \frac{1}{\sigma x} \, \varphi \big( \frac{\ln x-\mu}{\sigma} \big) , \quad x > 0. $
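The derived density can be compared with a library implementation. The following sketch (scipy and numpy are assumed dependencies; note that scipy parametrizes the lognormal distribution by $ s = \sigma$ and scale $ = e^{\mu}$) evaluates both expressions:

    # Derived lognormal density versus scipy's lognorm; mu, sigma and x
    # are illustrative values.
    import numpy as np
    from scipy.stats import norm, lognorm

    mu, sigma, x = 0.0, 0.5, 1.3
    derived = norm.pdf((np.log(x) - mu) / sigma) / (sigma * x)
    library = lognorm.pdf(x, s=sigma, scale=np.exp(mu))
    print(derived, library)   # identical up to rounding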

If $ X$ is a random variable that takes only finitely many (or countably infinitely many) values $ x_1, \ldots, x_n$, then $ X$ is said to be a discrete random variable, and its distribution is fully determined by the probabilities:

$\displaystyle \P(X = x_k) \, \, , \quad k = 1, \ldots, n .$

The simplest discrete random variables take only two or three values, for example $ \pm 1 $ or $ - 1, 0, + 1 . $ They constitute the basis of binomial or trinomial trees, which can be used to construct discrete random processes on a computer; a minimal sketch follows below. Such tree methods are reasonable approximations of the continuous processes which are used to model stock prices.
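A minimal sketch of such a discrete process, assuming Python with numpy (the seed and the number of steps are illustrative), is a random walk built from $ \pm 1$ steps:

    # A binomial-tree path: cumulative sums of independent +-1 steps,
    # a crude discrete stand-in for a continuous price process.
    import numpy as np

    rng = np.random.default_rng(0)
    steps = rng.choice([-1, 1], size=100)   # two-valued random variables
    path = np.cumsum(steps)                 # the discrete random process
    print(path[:10])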

In this context, binomially distributed random variables appear. Let $ Y_1, \ldots, Y_n$ be independent random variables taking two values, 0 or 1, with probabilities

$\displaystyle p = \P(Y_k = 1) \, \, , \, \, 1-p = \P(Y_k = 0) \, \, , \quad k = 1,
\ldots, n \, . $

We call such random variables Bernoulli distributed with parameter $ p$. The number of ones appearing in the sample $ Y_1, \ldots, Y_n$ equals the sum $ X=\sum_{k=1}^{n} Y_k$, which is binomially distributed with parameters $ n$ and $ p$:

$\displaystyle X = \sum^n_{k=1} Y_k \, \, , \, \, \P(X = m) = {n \choose m} p^m
(1-p)^{n-m} \, , \quad m = 0, \ldots, n \, . $

SFEBinomial.xpl
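This relationship can also be verified by simulation. The following Python sketch (numpy and scipy are assumptions here, not part of the text; $ n$, $ p$, $ m$ and the number of repetitions are illustrative) compares the empirical frequency of sums of Bernoulli variables with the $ B(n,p)$ probability:

    # Sums of n independent Bernoulli(p) variables follow B(n, p).
    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(1)
    n, p, reps = 10, 0.3, 100_000
    X = rng.binomial(1, p, size=(reps, n)).sum(axis=1)  # row-wise sums
    m = 3
    print((X == m).mean())      # empirical relative frequency
    print(binom.pmf(m, n, p))   # theoretical probability, ~0.2668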

Instead of saying that $ X$ is binomially distributed with parameters $ n$ and $ p$, we use the notation ``$ X$ is $ B (n,p)$ distributed''. Hence, a Bernoulli distributed random variable is $ B (1,p)$ distributed.

If $ n$ is large enough, a $ B (n,p)$ distributed random variable can be approximated by a $ N (np, \, np(1-p))$ distributed random variable $ Z$, in the sense that

$\displaystyle \P(a < X < b) \approx \P(a < Z < b) \, .$ (4.1)
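As a hedged numerical illustration of (4.1), the following Python sketch (scipy is an assumed dependency; $ n$, $ p$, $ a$ and $ b$ are illustrative) compares the exact binomial probability with its normal approximation:

    # Normal approximation to a B(n, p) probability as in (4.1);
    # without a continuity correction the match is only rough.
    from scipy.stats import binom, norm

    n, p = 100, 0.5
    a, b = 45, 55
    exact = binom.cdf(b - 1, n, p) - binom.cdf(a, n, p)  # P(a < X < b)
    mu, sd = n * p, (n * p * (1 - p)) ** 0.5
    approx = norm.cdf(b, mu, sd) - norm.cdf(a, mu, sd)   # P(a < Z < b)
    print(exact, approx)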

The central limit theorem makes this approximation precise. In classical statistics it is used to avoid, for large $ n$, the tedious calculation of binomial probabilities. Conversely, it is possible to approximate the normal distribution by an easily simulated binomial tree.

SFEclt.xpl