6.2 The Cramer-Rao Lower Bound
As pointed out above, an important question in estimation theory is whether an
estimator has certain desired properties, in particular, whether it converges
to the unknown parameter it is supposed to estimate. One typical property we
want for an estimator is unbiasedness, meaning that on average the estimator
hits its target: $E(\hat{\theta}) = \theta$. We have seen for instance (see
Example 6.2) that $\overline{x}$ is an unbiased estimator of $\mu$ and that
$\mathcal{S}$ is a biased estimator of $\Sigma$ in finite samples.
If we restrict ourselves to unbiased estimation then the natural question is
whether the estimator shares some optimality properties in terms of its
sampling variance. Since
we focus on unbiasedness, we look for an estimator with the smallest possible
variance.
In this context, the Cramer-Rao lower bound will give the minimal achievable
variance for any unbiased estimator. This result is valid under very general
regularity conditions (discussed below). One of the most important
applications of the Cramer-Rao lower bound is that it provides
the asymptotic optimality property of maximum likelihood
estimators. The Cramer-Rao theorem involves the score function and its
properties which will be derived first.
The score function $s(\mathcal{X};\theta)$ is the derivative of the
log-likelihood function w.r.t. $\theta$:

$$ s(\mathcal{X};\theta) = \frac{\partial}{\partial\theta}\,\ell(\mathcal{X};\theta)
   = \frac{1}{L(\mathcal{X};\theta)}\,\frac{\partial}{\partial\theta}\,L(\mathcal{X};\theta) \qquad (6.9) $$

The covariance matrix $\mathcal{F}_n = \operatorname{Var}\{s(\mathcal{X};\theta)\}$ is called the
Fisher information matrix.
In what follows, we will give some interesting properties of score functions.
THEOREM 6.1
If $s = s(\mathcal{X};\theta)$ is the score function and if
$\hat{\theta} = t = t(\mathcal{X},\theta)$ is any function of $\mathcal{X}$ and
$\theta$, then under regularity conditions

$$ E(s\,t^\top) = \frac{\partial}{\partial\theta}\,E(t^\top)
   - E\left(\frac{\partial t^\top}{\partial\theta}\right) \qquad (6.10) $$
The proof is left as an exercise (see Exercise 6.9).
The regularity conditions required for this theorem
are rather technical and ensure that the expressions
(expectations and derivatives) appearing in (6.10) are well defined.
In particular, the support of the density $f(x;\theta)$ should not
depend on $\theta$. The next corollary is a direct consequence.
COROLLARY 6.1
If $s = s(\mathcal{X};\theta)$ is the score function and
$\hat{\theta} = t = t(\mathcal{X})$ is any unbiased estimator of $\theta$
(i.e., $E(t) = \theta$), then

$$ E(s\,t^\top) = \operatorname{Cov}(s,t) = \mathcal{I}_p \qquad (6.11) $$
Note that the score function has mean zero
(see Exercise 6.10):

$$ E\{s(\mathcal{X};\theta)\} = 0 \qquad (6.12) $$

Hence, $E(s\,s^\top) = \operatorname{Var}(s) = \mathcal{F}_n$ and by setting $s = t$
in Theorem 6.1 it follows that

$$ \mathcal{F}_n = -E\left\{\frac{\partial^2}{\partial\theta\,\partial\theta^\top}\,\ell(\mathcal{X};\theta)\right\}. $$
REMARK 6.1
If $x_1,\ldots,x_n$ are i.i.d., then $\mathcal{F}_n = n\,\mathcal{F}_1$, where $\mathcal{F}_1$ is
the Fisher information matrix for sample size $n=1$.
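The zero-mean property of the score and the identity $\operatorname{Var}(s) = \mathcal{F}_n = -E\{\partial^2\ell/\partial\theta\,\partial\theta^\top\}$ are easy to check by simulation. The sketch below uses a single Poisson observation with rate $\lambda$; this model and the chosen constants are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, m = 3.0, 200_000  # illustrative Poisson rate and Monte Carlo size

# For one Poisson(lam) observation the log-likelihood is
#   l(x; lam) = x*log(lam) - lam - log(x!),
# so the score is s = x/lam - 1 and the Hessian is -x/lam**2.
x = rng.poisson(lam, size=m)
s = x / lam - 1

print(s.mean())                       # E(s) = 0, cf. (6.12)
print(s.var(), (x / lam**2).mean())   # Var(s) = -E(Hessian) = 1/lam
```

Both estimates of the Fisher information (the variance of the score and minus the expected Hessian) should agree, here at the value $1/\lambda \approx 0.333$.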
EXAMPLE 6.4
Consider an i.i.d. sample $\{x_i\}_{i=1}^n$ from $N_p(\theta,\mathcal{I})$.
In this case the parameter $\theta$ is the mean $\mu$.
It follows from (6.3) that:

$$ s(\mathcal{X};\theta) = \frac{\partial}{\partial\theta}\,\ell(\mathcal{X};\theta)
   = n(\overline{x} - \theta). $$

Hence, the information matrix is

$$ \mathcal{F}_n = \operatorname{Var}\{n(\overline{x} - \theta)\} = n\,\mathcal{I}_p. $$
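Example 6.4 can be illustrated numerically. The sketch below (with $\theta = 0$, $n = 50$, $p = 3$ chosen purely for illustration) draws repeated samples, computes the score $n(\overline{x} - \theta)$ for each, and checks that the covariance of the scores is close to $n\,\mathcal{I}_p$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 50, 3, 20_000  # illustrative sample size, dimension, replications
theta = np.zeros(p)         # true mean of N_p(theta, I)

# The score of an N_p(theta, I) sample is s = n*(xbar - theta);
# its covariance over repeated samples should approximate F_n = n * I_p.
scores = np.empty((reps, p))
for r in range(reps):
    x = rng.standard_normal((n, p)) + theta
    scores[r] = n * (x.mean(axis=0) - theta)

F_hat = np.cov(scores, rowvar=False)
print(np.round(F_hat, 1))  # close to n * I_p = 50 * I_3
```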
How well can we estimate $\theta$? The answer is given in the
following theorem, which is due to Cramer and Rao. As pointed out above, this
theorem gives a lower bound for unbiased estimators. Hence, all estimators
which are unbiased and attain this lower bound
are minimum variance estimators.
THEOREM 6.2 (Cramer-Rao)
If $\hat{\theta} = t = t(\mathcal{X})$ is any unbiased estimator
for $\theta$, then under regularity conditions

$$ \operatorname{Var}(t) \geq \mathcal{F}_n^{-1}, \qquad (6.13) $$

where

$$ \mathcal{F}_n = E\{s(\mathcal{X};\theta)\,s(\mathcal{X};\theta)^\top\}
   = \operatorname{Var}\{s(\mathcal{X};\theta)\} \qquad (6.14) $$

is the Fisher information matrix.
PROOF:
Consider the correlation $\rho_{Y,Z}$ between $Y$ and $Z$,
where $Y = a^\top t$, $Z = c^\top s$.
Here $s$ is the score and the vectors $a, c \in \mathbb{R}^p$.
By Corollary 6.1, $\operatorname{Cov}(s,t) = \mathcal{I}_p$ and thus

$$ \operatorname{Cov}(Y,Z) = a^\top \operatorname{Cov}(t,s)\,c = a^\top c, \qquad
   \operatorname{Var}(Z) = c^\top \operatorname{Var}(s)\,c = c^\top \mathcal{F}_n\,c. $$

Hence,

$$ \rho^2_{Y,Z} = \frac{(a^\top c)^2}{\operatorname{Var}(Y)\;c^\top \mathcal{F}_n\,c} \leq 1. \qquad (6.15) $$

In particular, this holds for any $c \neq 0$. Therefore it holds also for
the maximum of the left-hand side of (6.15) with respect to $c$. Since

$$ \max_c\, \frac{c^\top a\,a^\top c}{c^\top \mathcal{F}_n\,c}
   = \max_{c^\top \mathcal{F}_n c = 1}\, c^\top a\,a^\top c $$

and by our maximization Theorem 2.5 we have

$$ \max_{c^\top \mathcal{F}_n c = 1}\, c^\top a\,a^\top c = a^\top \mathcal{F}_n^{-1} a, $$

i.e.,

$$ a^\top \mathcal{F}_n^{-1} a \leq \operatorname{Var}(Y) = a^\top \operatorname{Var}(t)\,a
   \quad \text{for all } a \in \mathbb{R}^p,\ a \neq 0, $$

which is equivalent to $\operatorname{Var}(t) \geq \mathcal{F}_n^{-1}$.
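A small simulation makes the bound concrete. For i.i.d. $N(\theta, 1)$ data we have $\mathcal{F}_n = n$, so the Cramer-Rao bound is $1/n$; the sample mean is unbiased and attains it, while the sample median, also unbiased here, does not. The setup below is our own illustrative sketch, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 50_000  # illustrative sample size and replications
theta = 2.0

# For i.i.d. N(theta, 1) data, F_n = n, so the Cramer-Rao bound for any
# unbiased estimator of theta is 1/n = 0.02. The sample mean attains it;
# the sample median has asymptotic variance pi/(2n) > 1/n.
x = rng.standard_normal((reps, n)) + theta
var_mean = x.mean(axis=1).var()
var_median = np.var(np.median(x, axis=1))
print(var_mean, 1 / n)  # approximately equal: the bound is attained
print(var_median)       # larger: strictly above the bound
```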
Maximum likelihood estimators (MLE's) attain the lower bound if the
sample size goes to infinity. The next Theorem 6.3
states this and, in addition,
gives the asymptotic sampling distribution
of the maximum likelihood estimator,
which turns out to be multinormal.
THEOREM 6.3
Suppose that the sample $\{x_i\}_{i=1}^n$ is i.i.d.
If $\hat{\theta}$ is the MLE for $\theta \in \mathbb{R}^k$, i.e.,
$\hat{\theta} = \arg\max_{\theta} L(\mathcal{X};\theta)$,
then under some regularity conditions, as $n \to \infty$:

$$ \sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{\mathcal{L}} N_k(0, \mathcal{F}_1^{-1}) \qquad (6.16) $$

where $\mathcal{F}_1$ denotes the Fisher information
for sample size $n=1$.
As a consequence of Theorem 6.3 we see that under regularity
conditions the MLE is asymptotically unbiased, efficient (minimum variance)
and normally distributed.
Also, it is a consistent estimator of $\theta$.
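Theorem 6.3 can be visualized by simulation. As an illustrative sketch (the exponential model and constants below are our choice, not from the text): for $x_i \sim \operatorname{Exp}(\lambda)$ the MLE is $\hat{\lambda} = 1/\overline{x}$ and $\mathcal{F}_1 = 1/\lambda^2$, so $\sqrt{n}(\hat{\lambda} - \lambda)$ should be approximately $N(0, \lambda^2)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, lam = 200, 20_000, 2.0  # illustrative constants

# For x_i ~ Exp(rate lam): MLE lam_hat = 1/xbar, Fisher info F_1 = 1/lam**2,
# so Theorem 6.3 gives sqrt(n)*(lam_hat - lam) -> N(0, lam**2).
x = rng.exponential(scale=1 / lam, size=(reps, n))
z = np.sqrt(n) * (1 / x.mean(axis=1) - lam)
print(z.mean(), z.var())  # close to 0 and lam**2 = 4 for large n
```

Note that in finite samples $\hat{\lambda} = 1/\overline{x}$ carries a small bias of order $1/n$, so the simulated mean is close to, but not exactly, zero; this is consistent with the result being asymptotic.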
Note that from property (5.4) of the multinormal it follows
that asymptotically

$$ n\,(\hat{\theta} - \theta)^\top \mathcal{F}_1\,(\hat{\theta} - \theta) \to \chi^2_k. \qquad (6.17) $$

If $\widehat{\mathcal{F}}_1$ is a consistent estimator of $\mathcal{F}_1$, we have equivalently

$$ n\,(\hat{\theta} - \theta)^\top \widehat{\mathcal{F}}_1\,(\hat{\theta} - \theta) \to \chi^2_k. \qquad (6.18) $$

This expression is sometimes useful in testing hypotheses about $\theta$
and in constructing confidence regions for $\theta$ in a very general setup.
These issues will be raised in more detail in the next chapter, but
from (6.18) it can be seen, for instance, that when $n$ is large,

$$ P\left(n\,(\hat{\theta} - \theta)^\top \widehat{\mathcal{F}}_1\,(\hat{\theta} - \theta)
   \leq \chi^2_{1-\alpha;k}\right) \approx 1 - \alpha, $$

where $\chi^2_{\nu;k}$ denotes the $\nu$-quantile of a $\chi^2_k$
random variable. So, the ellipsoid
$n\,(\hat{\theta} - \theta)^\top \widehat{\mathcal{F}}_1\,(\hat{\theta} - \theta) \leq \chi^2_{1-\alpha;k}$
provides in $\mathbb{R}^k$ an asymptotic $(1-\alpha)$-confidence region for $\theta$.
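The coverage of this asymptotic confidence region can be checked by simulation. The sketch below uses our own toy setup, not one from the text: $N_k(\theta, \mathcal{I})$ data with $k = 2$, where the MLE is $\overline{x}$ and $\widehat{\mathcal{F}}_1 = \mathcal{I}_k$; the 0.95-quantile of $\chi^2_2$ is hardcoded as 5.991.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 100, 2, 10_000  # illustrative constants
theta = np.array([1.0, -1.0])
chi2_95_k2 = 5.991  # 0.95-quantile of a chi^2_2 random variable

# For N_k(theta, I) data, theta_hat = xbar and F_1 = I_k, so the statistic
# n*(theta_hat - theta)' F_1 (theta_hat - theta) is chi^2_k distributed
# (exactly, in this Gaussian toy model).
covered = 0
for _ in range(reps):
    xbar = (rng.standard_normal((n, k)) + theta).mean(axis=0)
    stat = n * np.sum((xbar - theta) ** 2)
    covered += stat <= chi2_95_k2
print(covered / reps)  # close to the nominal level 0.95
```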
Summary
-
The score function $s(\mathcal{X};\theta)$ is the derivative
of the log-likelihood $\ell(\mathcal{X};\theta)$ with
respect to $\theta$. The covariance matrix of $s(\mathcal{X};\theta)$ is
the Fisher information matrix $\mathcal{F}_n$.
-
The score function has mean zero:
$E\{s(\mathcal{X};\theta)\} = 0$.
-
The Cramer-Rao bound says that any unbiased estimator
has a variance that is bounded
from below by the inverse of the Fisher information:
$\operatorname{Var}(t) \geq \mathcal{F}_n^{-1}$. Thus, an unbiased
estimator, which attains this lower bound, is a minimum variance estimator.
-
For i.i.d. data
the Fisher information matrix is:
$\mathcal{F}_n = n\,\mathcal{F}_1$.
-
MLE's attain the lower bound in an asymptotic sense, i.e.,
$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{\mathcal{L}} N_k(0, \mathcal{F}_1^{-1})$
if $\hat{\theta}$ is the MLE for $\theta$,
i.e., $\hat{\theta} = \arg\max_{\theta} L(\mathcal{X};\theta)$.