4.5 Conditional Probabilities and Expectations

The conditional probability that a random variable $ Y$ takes values between $ a$ and $ b$, conditioned on the event that a random variable $ X$ takes values between $ x$ and $ x+\Delta x$, is defined as

$\displaystyle \P(a \le Y \le b \vert x \le X \le x+\Delta x ) = \frac{\P(a \le Y \le b \, , \, x \le X \le x+\Delta x)}{\P(x \le X \le x+\Delta x) } \, ,$ (4.4)

provided the denominator is not zero. The conditional probability of an event of the kind $ a \le Y \le b$ reflects our assessment of which values of $ Y$ are more plausible than others, given that the random variable $ X$ has taken values in a certain range. If $ Y$ is independent of $ X$, the probabilities of $ Y$ are not influenced by prior knowledge about $ X$, and we have

$\displaystyle \P(a \le Y \le b \vert x \le X \le x+\Delta x ) = \P(a \le Y \le b) \, .$

As $ \Delta x$ goes to 0 in equation (4.4), the left side of (4.4) converges heuristically to $ \P(a \le Y \le b \vert X = x)$. In the case of a continuous random variable $ X$ having a density $ p_X$, this limiting conditional probability cannot be defined directly via (4.4), since $ \P(X=x) = 0$ for all $ x$. It is, however, possible to give a sound mathematical definition of the conditional distribution of $ Y$ given $ X=x$. If the random variables $ Y$ and $ X$ have a joint density $ p(x,y)$, then the conditional distribution has the density

$\displaystyle p_{Y\vert X} ( y \vert x) = \frac{p(x,y)}{p_X(x)} \quad$ for $\displaystyle \quad p_X(x) \ne 0$

and $ p_{Y\vert X} ( y \vert x) = 0$ otherwise. Consequently, we have

$\displaystyle \P(a \le Y \le b \vert X = x) = \int^b_a \, p_{Y\vert X} (y\vert x) dy .$
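As an illustration, suppose $ (X,Y)$ is bivariate standard normal with correlation $ \rho$, i.e. has the joint density

$\displaystyle p(x,y) = \frac{1}{2\pi \sqrt{1-\rho^2}} \exp \Big\{ - \frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)} \Big\} \, .$

Dividing by the marginal density $ p_X(x) = (2\pi)^{-1/2} e^{-x^2/2}$ gives

$\displaystyle p_{Y\vert X} ( y \vert x) = \frac{1}{\sqrt{2\pi (1-\rho^2)}} \exp \Big\{ - \frac{(y-\rho x)^2}{2(1-\rho^2)} \Big\} \, ,$

i.e. given $ X=x$ the random variable $ Y$ is normally distributed with mean $ \rho x$ and variance $ 1-\rho^2$.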

The expectation with respect to the conditional distribution can be computed by

$\displaystyle \mathop{\text{\rm\sf E}}( Y \vert X = x) = \int^\infty_{-\infty} \, y \, p_{Y\vert X} (y\vert x) dy \stackrel{\mathrm{def}}{=}\eta (x) .$

The function $ \eta (x) = \mathop{\text{\rm\sf E}}( Y \vert X = x)$ is called the conditional expectation of $ Y$ given $ X=x$. Intuitively, it is the expectation of the random variable $ Y$ knowing that $ X$ has taken the value $ x.$
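In the bivariate normal example above, this gives

$\displaystyle \eta(x) = \mathop{\text{\rm\sf E}}( Y \vert X = x) = \rho x \, ,$

so the conditional expectation is a linear function of the observed value $ x$.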

Considering $ \eta (x)$ as a function of the random variable $ X$, we obtain the conditional expectation of $ Y$ given $ X$:

$\displaystyle \mathop{\text{\rm\sf E}}( Y \vert X ) = \eta(X) .$

$ \mathop{\text{\rm\sf E}}( Y \vert X )$ is a random variable; it is a function of $ X$ and has the same expectation as $ Y$ (see part d of the theorem below). The conditional expectation has some useful properties, which we summarize in the following theorem.

Theorem 4.1   Let $ X, Y, Z$ be real-valued continuous random variables having a joint density.
a) If $ X, Y$ are independent, then $ \mathop{\text{\rm\sf E}}( Y \vert X=x ) = \mathop{\text{\rm\sf E}}(Y)$.
b) If $ Y = g(X)$ is a function of $ X$, then

$\displaystyle \mathop{\text{\rm\sf E}}[ Y \vert X=x ] = \mathop{\text{\rm\sf E}}[ g(X) \vert X=x] = g(x) .$

More generally, for random variables of the form $ Y = Z \, g(X)$:

$\displaystyle \mathop{\text{\rm\sf E}}[ Y \vert X=x ] = \mathop{\text{\rm\sf E}}[ Z g(X) \vert X=x] = g(x) \mathop{\text{\rm\sf E}}[ Z \vert X=x] .$

c) The conditional expectation is linear: for any real numbers $ a, b$,

$\displaystyle \mathop{\text{\rm\sf E}}( aY + bZ \vert X=x) = a \mathop{\text{\rm\sf E}}(Y \vert X=x) + b \mathop{\text{\rm\sf E}}(Z \vert X=x) .$

d) The law of iterated expectations: $ \mathop{\text{\rm\sf E}}[ \mathop{\text{\rm\sf E}}( Y \vert X ) ] = \mathop{\text{\rm\sf E}}(Y) $.
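The statements of Theorem 4.1 can also be checked numerically. The following sketch (assuming NumPy; the bivariate normal pair with correlation 0.7 and the auxiliary variable Z are arbitrary choices made purely for illustration) verifies parts a) and d) by Monte Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative pair: bivariate standard normal with correlation rho,
# for which eta(x) = E(Y | X = x) = rho * x (see the example above).
rho, n = 0.7, 1_000_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Theorem 4.1 d): E[E(Y | X)] = E(Y), where E(Y | X) is the random variable rho * X
eta_of_x = rho * x
print(eta_of_x.mean(), y.mean())      # both close to 0 = E(Y)

# Theorem 4.1 a): for Z independent of X, E(Z | X = x) = E(Z);
# checked by averaging Z over a small window of X-values around x0 = 0.5
z = rng.standard_normal(n) + 1.0      # E(Z) = 1, Z independent of X
x0, dx = 0.5, 0.05
window = np.abs(x - x0) <= dx
print(z[window].mean(), z.mean())     # both close to 1 = E(Z)
```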

The concept of the conditional expectation can be generalized analogously to multivariate random vectors $ Y$ and $ X.$ Let $ S_t, \; t=0, 1, 2, ...,$ be a sequence of chronologically ordered random variables, for instance a model of daily stock prices. Setting $ Y=S_{t+1}$ and $ X=(S_t, ..., S_{t-p+1})^\top $, the conditional expectation

$\displaystyle \mathop{\text{\rm\sf E}}(Y \vert X=x) = \mathop{\text{\rm\sf E}}(S_{t+1} \vert S_t =x_1, ..., S_{t-p+1}=x_p)$

represents the expected stock price of the following day $ t+1$ given the stock prices $ x = (x_1, ..., x_p)^\top $ of the previous $ p$ days. Since the information available at time $ t$ (and relevant for the future evolution of the stock price) can consist of more than only a few past stock prices, we make frequent use of the notation $ \mathop{\text{\rm\sf E}}(Y \vert {{\cal F}} _t )$ for the expectation of $ Y$ given the information available up to time $ t$. For each $ t$, $ {{\cal F}} _t$ denotes a family of events (having the structure of a $ \sigma$-algebra, i.e. certain combinations of events of $ {{\cal F}} _t$ are again elements of $ {{\cal F}} _t$) representing the information available up to time $ t$: $ {{\cal F}} _t$ consists of those events of which it is known by time $ t$ whether they have occurred or not. Since more information is revealed as time evolves, we must have $ {{\cal F}} _s \subset {{\cal F}} _t$ for $ s < t$, see Definition 5.1. Leaving out the exact definition of $ \mathop{\text{\rm\sf E}}(Y \vert {{\cal F}} _t )$, we confine ourselves to emphasizing that the computation rules given in Theorem 4.1, appropriately reformulated, can be applied to the general conditional expectation.
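As a simple numerical illustration (under a purely hypothetical AR(1) recursion chosen here for convenience, not a model advocated in the text, and assuming NumPy), the conditional expectation $ \mathop{\text{\rm\sf E}}(S_{t+1} \vert S_t = x)$ can be approximated by averaging the successors of all observations with $ S_t$ close to $ x$ and compared with its closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical AR(1) model for a price-like series: S_{t+1} = c + phi * S_t + eps_{t+1},
# under which E(S_{t+1} | S_t = x) = c + phi * x, i.e. only the last value matters.
c, phi, sigma, T = 5.0, 0.95, 1.0, 200_000
s = np.empty(T)
s[0] = c / (1 - phi)                     # start at the stationary mean (= 100 here)
for t in range(T - 1):
    s[t + 1] = c + phi * s[t] + sigma * rng.standard_normal()

# Sample analogue of E(S_{t+1} | S_t = x): average S_{t+1} over times where S_t is near x
x, dx = 100.0, 0.5
near = np.abs(s[:-1] - x) <= dx
print(s[1:][near].mean(), c + phi * x)   # both close to c + phi*x = 100
```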