6.2 The Cramer-Rao Lower Bound
As pointed out above, an important question in estimation theory is whether an estimator $\hat\theta$ has certain desired properties, in particular, whether it converges to the unknown parameter $\theta$ it is supposed to estimate. One typical property we want for an estimator is unbiasedness, meaning that on the average, the estimator hits its target: $E(\hat\theta) = \theta$. We have seen for instance (see Example 6.2) that $\bar x$ is an unbiased estimator of $\mu$ and that $\mathcal{S}$ is a biased estimator of $\Sigma$ in finite samples.
If we restrict ourselves to unbiased estimation, then the natural question is whether the estimator shares some optimality properties in terms of its sampling variance. Since we focus on unbiasedness, we look for an estimator with the smallest possible variance.
In this context, the Cramer-Rao lower bound will give the minimal achievable
variance for any unbiased estimator. This result is valid under very general
regularity conditions (discussed below). One of the most important
applications of the Cramer-Rao lower bound is that it provides
the asymptotic optimality property of maximum likelihood
estimators. The Cramer-Rao theorem involves the score function and its
properties which will be derived first.
The score function $s(\mathcal{X};\theta)$ is the derivative of the log likelihood function w.r.t. $\theta \in \mathbb{R}^k$:
$$ s(\mathcal{X};\theta) = \frac{\partial}{\partial\theta}\,\ell(\mathcal{X};\theta) = \frac{1}{L(\mathcal{X};\theta)}\,\frac{\partial}{\partial\theta}\,L(\mathcal{X};\theta). \qquad (6.9) $$
The covariance matrix
$$ \mathcal{F}_n = \mathrm{Var}\{s(\mathcal{X};\theta)\} $$
is called the Fisher information matrix.
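As a simple scalar illustration (a standard textbook case, not taken from this section): for an i.i.d. sample $x_1, \ldots, x_n$ from the exponential distribution with density $f(x;\theta) = \theta e^{-\theta x}$, $x > 0$, the log likelihood is $\ell(\mathcal{X};\theta) = n \log\theta - \theta \sum_{i=1}^n x_i$, so that
$$ s(\mathcal{X};\theta) = \frac{n}{\theta} - \sum_{i=1}^n x_i \qquad\text{and}\qquad \mathcal{F}_n = \mathrm{Var}\{s(\mathcal{X};\theta)\} = \mathrm{Var}\!\left(\sum_{i=1}^n x_i\right) = \frac{n}{\theta^2}. $$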
In what follows, we will give some interesting properties of score functions.
THEOREM 6.1 If $s = s(\mathcal{X};\theta)$ is the score function and if $\hat\theta = t = t(\mathcal{X},\theta)$ is any function of $\mathcal{X}$ and $\theta$, then under regularity conditions
$$ E(s\,t^\top) = \frac{\partial}{\partial\theta}\,E(t^\top) - E\!\left(\frac{\partial t^\top}{\partial\theta}\right). \qquad (6.10) $$

The proof is left as an exercise (see Exercise 6.9). The regularity conditions required for this theorem are rather technical and ensure that the expressions (expectations and derivations) appearing in (6.10) are well defined. In particular, the support of the density $f(x;\theta)$ should not depend on $\theta$. The next corollary is a direct consequence.

COROLLARY 6.1 If $s = s(\mathcal{X};\theta)$ is the score function, and $\hat\theta = t = t(\mathcal{X})$ is any unbiased estimator of $\theta$ (i.e., $E(t) = \theta$), then
$$ E(s\,t^\top) = \mathrm{Cov}(s, t) = \mathcal{I}_k. \qquad (6.11) $$
Note that the score function has mean zero (see Exercise 6.10):
$$ E\{s(\mathcal{X};\theta)\} = 0. \qquad (6.12) $$
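A brief sketch of the reasoning (a standard argument, assuming the regularity conditions allow interchanging differentiation and integration): since $\int L(\mathcal{X};\theta)\,d\mathcal{X} = 1$ for all $\theta$,
$$ 0 = \frac{\partial}{\partial\theta}\int L(\mathcal{X};\theta)\,d\mathcal{X} = \int \frac{\partial\,\ell(\mathcal{X};\theta)}{\partial\theta}\, L(\mathcal{X};\theta)\,d\mathcal{X} = E\{s(\mathcal{X};\theta)\}. $$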
Hence, $E(s\,s^\top) = \mathrm{Var}(s) = \mathcal{F}_n$, and by setting $t = s$ in Theorem 6.1 it follows that
$$ \mathcal{F}_n = -E\!\left\{\frac{\partial^2}{\partial\theta\,\partial\theta^\top}\,\ell(\mathcal{X};\theta)\right\}. \qquad (6.13) $$

REMARK 6.1 If $x_1, \ldots, x_n$ are i.i.d., then $\mathcal{F}_n = n\,\mathcal{F}_1$, where $\mathcal{F}_1$ is the Fisher information matrix for sample size $n = 1$.
EXAMPLE 6.3 Consider an i.i.d. sample $\{x_i\}_{i=1}^n$ from $N_p(\theta, \mathcal{I})$. In this case the parameter $\theta$ is the mean $\mu$. It follows from (6.3) that
$$ s(\mathcal{X};\theta) = \frac{\partial}{\partial\theta}\,\ell(\mathcal{X};\theta) = n(\bar x - \theta). $$
Hence, the information matrix is
$$ \mathcal{F}_n = \mathrm{Var}\{n(\bar x - \theta)\} = n\,\mathcal{I}_p. $$
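A small simulation can be used to check the example numerically. The sketch below is not part of the text; it assumes NumPy, and the choices $p = 3$, $n = 200$ and the number of replications are arbitrary. It draws repeated samples from $N_p(\theta, \mathcal{I})$, computes the score $n(\bar x - \theta)$ for each, and verifies that its empirical mean is close to $0$ and its empirical covariance is close to $n\,\mathcal{I}_p$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 3, 200, 5000        # dimension, sample size, Monte Carlo replications (arbitrary)
theta = np.zeros(p)              # true mean vector

scores = np.empty((reps, p))
for r in range(reps):
    x = rng.normal(loc=theta, scale=1.0, size=(n, p))  # i.i.d. sample from N_p(theta, I)
    scores[r] = n * (x.mean(axis=0) - theta)           # score s(X; theta) = n (xbar - theta)

print(scores.mean(axis=0))               # close to 0: the score has mean zero
print(np.cov(scores, rowvar=False) / n)  # close to I_p, i.e. Var(score) is close to n * I_p
```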
How well can we estimate $\theta$? The answer is given in the following theorem, which is due to Cramer and Rao. As pointed out above, this theorem gives a lower bound for unbiased estimators. Hence, all estimators which are unbiased and attain this lower bound are minimum variance estimators.
THEOREM 6.2 (Cramer-Rao) If $\hat\theta = t = t(\mathcal{X})$ is any unbiased estimator for $\theta$, then under regularity conditions
$$ \mathrm{Var}(t) \ge \mathcal{F}_n^{-1}, \qquad (6.14) $$
where $\mathcal{F}_n = E\{s(\mathcal{X};\theta)\,s(\mathcal{X};\theta)^\top\} = \mathrm{Var}\{s(\mathcal{X};\theta)\}$ is the Fisher information matrix.
Proof: Consider the correlation $\rho_{Y,Z}$ between $Y$ and $Z$, where $Y = a^\top t$ and $Z = c^\top s$. Here $s$ is the score and the vectors $a, c \in \mathbb{R}^p$. By Corollary 6.1, $\mathrm{Cov}(s,t) = \mathcal{I}$ and hence
$$ \mathrm{Cov}(Y,Z) = a^\top \mathrm{Cov}(t,s)\,c = a^\top c, \qquad \mathrm{Var}(Z) = c^\top \mathrm{Var}(s)\,c = c^\top \mathcal{F}_n\, c. $$
Therefore,
$$ \rho^2_{Y,Z} = \frac{\mathrm{Cov}^2(Y,Z)}{\mathrm{Var}(Y)\,\mathrm{Var}(Z)} = \frac{(a^\top c)^2}{(a^\top \mathrm{Var}(t)\,a)\,(c^\top \mathcal{F}_n\, c)} \le 1. \qquad (6.15) $$
In particular, this holds for any $c \neq 0$. Therefore it holds also for the maximum of the left-hand side of (6.15) with respect to $c$. Since
$$ \max_{c} \frac{c^\top a\, a^\top c}{c^\top \mathcal{F}_n\, c} = \max_{\{c:\, c^\top \mathcal{F}_n c = 1\}} c^\top a\, a^\top c = a^\top \mathcal{F}_n^{-1} a \qquad (6.16) $$
by our maximization Theorem 2.5, we have
$$ \frac{a^\top \mathcal{F}_n^{-1} a}{a^\top \mathrm{Var}(t)\,a} \le 1 \quad \forall\, a \in \mathbb{R}^p,\ a \neq 0, $$
i.e.,
$$ a^\top \{\mathrm{Var}(t) - \mathcal{F}_n^{-1}\}\, a \ge 0 \quad \forall\, a \in \mathbb{R}^p,\ a \neq 0, $$
which is equivalent to $\mathrm{Var}(t) \ge \mathcal{F}_n^{-1}$.
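As a quick check of the bound in the setting of Example 6.3 (a worked consequence, not an additional result): there $\mathcal{F}_n = n\,\mathcal{I}_p$, and the unbiased estimator $\bar x$ satisfies
$$ \mathrm{Var}(\bar x) = \frac{1}{n}\,\mathcal{I}_p = \mathcal{F}_n^{-1}, $$
so $\bar x$ attains the Cramer-Rao lower bound exactly, already in finite samples.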
Maximum likelihood estimators (MLE's) attain the lower bound if the sample size $n$ goes to infinity. The next Theorem 6.3 states this and, in addition, gives the asymptotic sampling distribution of the maximum likelihood estimator, which turns out to be multinormal.
THEOREM 6.3 Suppose that the sample $\{x_i\}_{i=1}^n$ is i.i.d. If $\hat\theta$ is the MLE for $\theta \in \mathbb{R}^k$, i.e., $\hat\theta = \arg\max_\theta L(\mathcal{X};\theta)$, then under some regularity conditions, as $n \to \infty$,
$$ \sqrt{n}\,(\hat\theta - \theta) \;\xrightarrow{\;\mathcal{L}\;}\; N_k(0, \mathcal{F}_1^{-1}), \qquad (6.17) $$
where $\mathcal{F}_1$ denotes the Fisher information for sample size $n = 1$.

As a consequence of Theorem 6.3 we see that, under regularity conditions, the MLE is asymptotically unbiased, efficient (minimum variance) and normally distributed. It is also a consistent estimator of $\theta$.
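To see the asymptotics at work in a case where the MLE is not exactly normal in finite samples, the following sketch (not from the text; it assumes NumPy, and the exponential model with rate $\theta$, for which $\mathcal{F}_1 = 1/\theta^2$, is my own choice of illustration) simulates many samples, computes the MLE $\hat\theta = 1/\bar x$, and checks that $\sqrt{n}(\hat\theta - \theta)$ has approximately mean $0$ and standard deviation $\mathcal{F}_1^{-1/2} = \theta$.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 400, 10000                        # true rate, sample size, replications (arbitrary)
x = rng.exponential(scale=1.0 / theta, size=(reps, n))  # each row: i.i.d. sample from Exp(theta)
mle = 1.0 / x.mean(axis=1)                              # MLE of the rate is 1 / xbar
z = np.sqrt(n) * (mle - theta)                          # approx N(0, F_1^{-1}) with F_1 = 1 / theta^2
print(z.mean(), z.std())                                # close to 0 and to theta = 2.0
```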
Note that from property (5.4) of the multinormal it follows that asymptotically
$$ n\,(\hat\theta - \theta)^\top \mathcal{F}_1\, (\hat\theta - \theta) \;\to\; \chi^2_k. \qquad (6.18) $$
If $\widehat{\mathcal{F}}_1$ is a consistent estimator of $\mathcal{F}_1$ (e.g., $\widehat{\mathcal{F}}_1 = \mathcal{F}_1(\hat\theta)$), we have equivalently
$$ n\,(\hat\theta - \theta)^\top \widehat{\mathcal{F}}_1\, (\hat\theta - \theta) \;\to\; \chi^2_k. $$
This expression is sometimes useful for testing hypotheses about $\theta$ and for constructing confidence regions for $\theta$ in a very general setup. These issues will be raised in more detail in the next chapter, but from (6.18) it can be seen, for instance, that when $n$ is large,
$$ P\!\left( n\,(\hat\theta - \theta)^\top \widehat{\mathcal{F}}_1\, (\hat\theta - \theta) \le \chi^2_{1-\alpha;k} \right) \approx 1 - \alpha, $$
where $\chi^2_{1-\alpha;k}$ denotes the $(1-\alpha)$-quantile of a $\chi^2_k$ random variable. So, the ellipsoid
$$ \left\{\theta : n\,(\hat\theta - \theta)^\top \widehat{\mathcal{F}}_1\, (\hat\theta - \theta) \le \chi^2_{1-\alpha;k}\right\} $$
provides an asymptotic $(1-\alpha)$-confidence region for $\theta$.
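As an illustration of how (6.18) can be used, the following sketch (again not from the text; the function name `in_confidence_region` and the use of the model of Example 6.3, where $\hat\theta = \bar x$ and $\widehat{\mathcal{F}}_1 = \mathcal{I}_p$, are my own choices; NumPy and SciPy are assumed) checks whether a candidate value $\theta_0$ lies in the asymptotic $(1-\alpha)$-confidence ellipsoid.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(x, theta0, fisher1, alpha=0.05):
    """Check whether theta0 lies in the asymptotic (1 - alpha) confidence ellipsoid
    n (theta_hat - theta0)' F1_hat (theta_hat - theta0) <= chi2_{1-alpha; k}."""
    n, k = x.shape
    theta_hat = x.mean(axis=0)            # MLE of the mean in the N_p(theta, I) model
    diff = theta_hat - theta0
    stat = n * diff @ fisher1 @ diff      # Wald-type statistic as in (6.18)
    return stat <= chi2.ppf(1 - alpha, df=k), stat

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=1.0, size=(500, 2))    # sample from N_2(0, I)
inside, stat = in_confidence_region(x, theta0=np.zeros(2), fisher1=np.eye(2))
print(inside, stat)
```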
Summary
- The score function $s(\mathcal{X};\theta) = \frac{\partial}{\partial\theta}\,\ell(\mathcal{X};\theta)$ is the derivative of the log-likelihood with respect to $\theta$. The covariance matrix of $s(\mathcal{X};\theta)$ is the Fisher information matrix $\mathcal{F}_n$.
- The score function has mean zero: $E\{s(\mathcal{X};\theta)\} = 0$.
- The Cramer-Rao bound says that any unbiased estimator has a variance that is bounded from below by the inverse of the Fisher information. Thus, an unbiased estimator which attains this lower bound is a minimum variance estimator.
- For i.i.d. data $\{x_i\}_{i=1}^n$ the Fisher information matrix is $\mathcal{F}_n = n\,\mathcal{F}_1$.
- MLE's attain the lower bound in an asymptotic sense, i.e., $\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{\mathcal{L}} N_k(0, \mathcal{F}_1^{-1})$ if $\hat\theta$ is the MLE for $\theta$.