Suppose that $\{x_i\}_{i=1}^n$ is an i.i.d. sample from a population with pdf $f(x;\theta)$. The aim is to estimate $\theta$, which is a vector of unknown parameters. The likelihood function is defined as the joint density $L(\data{X};\theta)$ of the observations $x_i$ considered as a function of $\theta$:

\begin{displaymath}
L(\data{X};\theta )=\prod ^n_{i=1}f(x_i;\theta ),
\end{displaymath}
(6.1)

where $\data{X}$ denotes the sample data matrix with the observations $x_i^{\top}$ in each row. The maximum likelihood estimator (MLE) of $\theta$ is defined as

\begin{displaymath}
\widehat\theta =\arg\max_{\theta} L(\data{X};\theta ).
\end{displaymath}
Often it is easier to maximize the log-likelihood function

\begin{displaymath}
\ell (\data{X};\theta )=\log L(\data{X};\theta),
\end{displaymath}
(6.2)

which is equivalent since the logarithm is a monotone one-to-one function. Hence

\begin{displaymath}
\widehat\theta =\arg\max_{\theta} L(\data{X};\theta )=\arg\max_{\theta} \ell (\data{X};\theta ).
\end{displaymath}
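To see this equivalence in practice, here is a minimal numerical sketch (ours, not from the text, using a hypothetical $N(\theta,1)$ sample): the product in (6.1) and the sum in (6.2) are maximized by the same value of $\theta$, but the sum is numerically far better behaved.

```python
import numpy as np

# Illustrative sketch (not from the text): the likelihood (6.1) is a product of
# densities and the log-likelihood (6.2) the corresponding sum; both have the
# same maximizer, here approximately the sample mean of an N(theta, 1) sample.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)

def likelihood(theta):
    return np.prod(np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2 * np.pi))

def log_likelihood(theta):
    return np.sum(-0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi))

grid = np.linspace(0.0, 4.0, 401)
print(grid[np.argmax([likelihood(t) for t in grid])])      # ~ x.mean()
print(grid[np.argmax([log_likelihood(t) for t in grid])])  # same maximizer
```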
The following examples illustrate cases where the maximization process can be performed analytically, i.e., we will obtain an explicit analytical expression for $\widehat\theta$. Unfortunately, in other situations, the maximization process can be more intricate, involving nonlinear optimization techniques. In the latter case, given a sample $\data{X}$ and the likelihood function, numerical methods will be used to determine the value of $\theta$ maximizing $L(\data{X};\theta )$ or $\ell (\data{X};\theta )$. These numerical methods are typically based on Newton-Raphson iterative techniques.
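As a rough illustration of such an iterative scheme, the following sketch (our illustration, not code from the text) applies Newton-Raphson to a one-parameter log-likelihood whose MLE is known in closed form, so the numerical answer can be checked against the analytical one.

```python
import numpy as np

# Minimal Newton-Raphson sketch (an illustration, not from the text): maximize
# the log-likelihood of an exponential sample with rate lambda; the MLE has the
# closed form 1 / x-bar, so the iterative result can be verified.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)   # true rate lambda = 1

def score(lam):        # first derivative of ell(lam) = n*log(lam) - lam*sum(x)
    return x.size / lam - x.sum()

def curvature(lam):    # second derivative of ell(lam)
    return -x.size / lam ** 2

lam = 0.5                                  # starting value below the MLE
for _ in range(50):
    step = score(lam) / curvature(lam)
    lam = lam - step                       # Newton-Raphson update
    if abs(step) < 1e-12:
        break

print(lam, 1.0 / x.mean())                 # the two should agree
```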
EXAMPLE 6.1  Consider a sample $\{x_{i}\}_{i=1}^n$ from $N_{p}(\mu,\data{I})$, i.e., from the pdf

\begin{displaymath}
f(x;\theta )=(2\pi)^{-p/2}\exp\left\{-\frac{1}{2}(x-\theta )^{\top}(x-\theta )\right\},
\end{displaymath}

where $\theta =\mu\in\mathbb{R}^p$ is the mean vector parameter. The log-likelihood is in this case

\begin{displaymath}
\ell (\data{X};\theta )= \sum_{i=1}^n \log f(x_{i};\theta )
=\log\left\{(2\pi)^{-np/2}\right\}-\frac{1}{2} \sum_{i=1}^n (x_{i}-\theta )^{\top} (x_{i}-\theta).
\end{displaymath}
(6.3)

The term $(x_i-\theta )^{\top}(x_i-\theta )$ equals

\begin{displaymath}
(x_i-\overline x)^{\top}(x_i-\overline x)+2(x_i-\overline x)^{\top}(\overline x-\theta )
+(\overline x-\theta )^{\top}(\overline x-\theta ).
\end{displaymath}

Summing this term over $i=1,\ldots ,n$, the cross term vanishes since $\sum^n_{i=1}(x_i-\overline x)=0$, and we see that

\begin{displaymath}
\sum_{i=1}^n (x_i-\theta )^{\top}(x_i-\theta )=\sum_{i=1}^n (x_i-\overline x)^{\top}(x_i-\overline x)
+n(\overline x-\theta )^{\top}(\overline x-\theta ).
\end{displaymath}

Hence

\begin{displaymath}
\ell (\data{X};\theta )=\log\left\{(2\pi)^{-np/2}\right\}
-\frac{1}{2}\sum_{i=1}^n (x_i-\overline x)^{\top}(x_i-\overline x)
-\frac{n}{2}(\overline x-\theta )^{\top}(\overline x-\theta ).
\end{displaymath}

Only the last term depends on $\theta$ and is obviously maximized for

\begin{displaymath}
\widehat\theta =\widehat\mu =\overline x.
\end{displaymath}

Thus $\overline x$ is the MLE of $\theta$ for this family of pdfs $f(x;\theta)$.
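A quick numerical sanity check of this example (an illustrative sketch of ours, not part of the text) confirms that perturbing $\overline x$ in any direction can only decrease the log-likelihood.

```python
import numpy as np

# Small numerical check of Example 6.1 (illustrative, not from the text): for
# N_p(mu, I) the log-likelihood is, up to a constant,
# -0.5 * sum_i ||x_i - theta||^2, which peaks at theta = x-bar.
rng = np.random.default_rng(2)
n, p = 200, 3
mu = np.array([1.0, -2.0, 0.5])
x = rng.normal(size=(n, p)) + mu            # sample from N_p(mu, I)

def loglik(theta):
    return -0.5 * np.sum((x - theta) ** 2)  # additive constant omitted

xbar = x.mean(axis=0)                       # candidate MLE
for _ in range(5):
    direction = rng.normal(size=p)          # random perturbation directions
    assert loglik(xbar) >= loglik(xbar + 0.1 * direction)
print(xbar)
```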
A more complex example is the following one where we derive the MLE's for $\mu$ and $\Sigma$.
EXAMPLE 6.2  Suppose $\{x_i\}^n_{i=1}$ is a sample from a normal distribution $N_p(\mu,\Sigma)$. Here $\theta =(\mu ,\Sigma )$ with $\Sigma$ interpreted as a vector. Due to the symmetry of $\Sigma$ the unknown parameter $\theta$ is in fact $\{p+\frac{1}{2}p(p+1)\}$-dimensional. Then

\begin{displaymath}
L(\data{X};\theta ) = \vert 2\pi \Sigma \vert^{-n/2}
\exp\left\{-\frac{1}{2}\sum ^n_{i=1}(x_i-\mu)^{\top}\Sigma ^{-1}(x_i-\mu )\right\}
\end{displaymath}
(6.4)

and

\begin{displaymath}
\ell (\data{X};\theta )=-\frac{n}{2}\log\vert 2\pi \Sigma \vert
-\frac{1}{2}\sum^n_{i=1}(x_i-\mu )^{\top}\Sigma ^{-1}(x_i-\mu ).
\end{displaymath}
(6.5)

The term $(x_i-\mu )^{\top}\Sigma^{-1}(x_i-\mu )$ equals

\begin{displaymath}
(x_i-\overline x)^{\top}\Sigma ^{-1}(x_i-\overline x)
+2(x_i-\overline x)^{\top}\Sigma ^{-1}(\overline x-\mu )
+(\overline x-\mu )^{\top}\Sigma ^{-1}(\overline x-\mu ).
\end{displaymath}

Summing this term over $i=1,\ldots ,n$ we see that

\begin{displaymath}
\sum^n_{i=1}(x_i-\mu )^{\top}\Sigma ^{-1}(x_i-\mu )
=\sum^n_{i=1}(x_i-\overline x)^{\top}\Sigma ^{-1}(x_i-\overline x)
+n(\overline x-\mu )^{\top}\Sigma ^{-1}(\overline x-\mu ).
\end{displaymath}

Note that from (2.14)

\begin{displaymath}
(x_i-\overline x)^{\top}\Sigma ^{-1}(x_i-\overline x)
=\mathop{\rm tr}\left\{\Sigma ^{-1}(x_i-\overline x)(x_i-\overline x)^{\top}\right\}.
\end{displaymath}

Therefore, by summing over the index $i$ we finally arrive at

\begin{displaymath}
\sum^n_{i=1}(x_i-\mu )^{\top}\Sigma ^{-1}(x_i-\mu )
=\mathop{\rm tr}\left\{\Sigma ^{-1}\sum^n_{i=1}(x_i-\overline x)(x_i-\overline x)^{\top}\right\}
+n(\overline x-\mu )^{\top}\Sigma ^{-1}(\overline x-\mu )
=n\mathop{\rm tr}\{\Sigma ^{-1}\data{S}\}+n(\overline x-\mu )^{\top}\Sigma ^{-1}(\overline x-\mu ).
\end{displaymath}

Thus the log-likelihood function for $N_p(\mu,\Sigma)$ is

\begin{displaymath}
\ell (\data{X};\theta )=-\frac{n}{2}\log \vert 2\pi \Sigma \vert
-\frac{n}{2}\mathop{\rm tr}\{\Sigma ^{-1}\data{S}\}
-\frac{n}{2}(\overline x-\mu )^{\top}\Sigma ^{-1}(\overline x-\mu).
\end{displaymath}
(6.6)

We can easily see that the third term is maximized by $\mu =\overline x$. In fact the MLE's are given by

\begin{displaymath}
\widehat\mu =\overline x, \qquad \widehat\Sigma =\data{S}.
\end{displaymath}

The derivation of $\widehat\Sigma$ is a lot more complicated. It involves derivatives with respect to matrices with their notational complexities and will not be presented here; for a more elaborate proof see Mardia et al. (1979, pp. 103-104). Note that the unbiased covariance estimator $\data{S}_{u}=\frac{n}{n-1}\data{S}$ is not the MLE of $\Sigma$!
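The result can also be illustrated numerically; the following sketch (ours, not the text's) computes $\widehat\mu$ and $\widehat\Sigma$ for a simulated sample and contrasts $\widehat\Sigma$ with the unbiased estimator $\data{S}_u$.

```python
import numpy as np

# Illustrative sketch of Example 6.2 (not code from the text): the MLEs for an
# N_p(mu, Sigma) sample are x-bar and the empirical covariance S with divisor
# n, whereas np.cov uses divisor n-1 (the unbiased estimator S_u).
rng = np.random.default_rng(3)
n = 500
true_mu = np.array([0.0, 1.0])
true_sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
x = rng.multivariate_normal(true_mu, true_sigma, size=n)

mu_hat = x.mean(axis=0)                          # MLE of mu
centered = x - mu_hat
sigma_hat = centered.T @ centered / n            # MLE of Sigma (divisor n)
sigma_u = np.cov(x, rowvar=False)                # divisor n-1, not the MLE

print(mu_hat)
print(sigma_hat)
print(np.allclose(sigma_u * (n - 1) / n, sigma_hat))   # True
```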
EXAMPLE 6.3  Consider the linear regression model $y_{i} = \beta^{\top}x_{i} + \varepsilon_{i}$ for $i=1,\ldots ,n$, where $\varepsilon_{i}$ is i.i.d. $N(0,\sigma^2)$ and where $x_{i} \in \mathbb{R}^p$. Here $\theta = (\beta^{\top}, \sigma)$ is a $(p+1)$-dimensional parameter vector. Denote

\begin{displaymath}
y=\left(\begin{array}{c}y_1\\ \vdots\\ y_n\end{array}\right), \qquad
\data{X}=\left(\begin{array}{c}x^{\top}_1\\ \vdots\\ x^{\top}_n\end{array}\right).
\end{displaymath}

Then

\begin{displaymath}
L(y,\data{X};\beta ,\sigma )=\prod^n_{i=1}\frac{1}{\sqrt{2\pi}\,\sigma}
\exp\left\{-\frac{1}{2\sigma^2}(y_i-\beta^{\top}x_i)^2\right\}
\end{displaymath}

and

\begin{displaymath}
\ell (y,\data{X};\beta ,\sigma )=-\frac{n}{2}\log (2\pi)-n\log \sigma
-\frac{1}{2\sigma^2}(y-\data{X}\beta )^{\top}(y-\data{X}\beta ).
\end{displaymath}

Differentiating w.r.t. the parameters yields

\begin{displaymath}
\frac{\partial}{\partial\beta}\,\ell =\frac{1}{\sigma ^2}\data{X}^{\top}(y-\data{X}\beta ),
\qquad
\frac{\partial}{\partial\sigma}\,\ell =-\frac{n}{\sigma}
+\frac{1}{\sigma^3}(y-\data{X}\beta )^{\top}(y-\data{X}\beta ).
\end{displaymath}

Note that $\frac{\partial}{\partial\beta}\ell$ denotes the vector of the derivatives w.r.t. all components of $\beta$ (the gradient). Since the first equation only depends on $\beta$, we start with deriving $\widehat{\beta}$:

\begin{displaymath}
\widehat{\beta}=(\data{X}^{\top}\data{X})^{-1}\data{X}^{\top}y.
\end{displaymath}

Plugging $\widehat{\beta}$ into the second equation gives

\begin{displaymath}
\widehat{\sigma}^2=\frac{1}{n}(y-\data{X}\widehat{\beta})^{\top}(y-\data{X}\widehat{\beta})
=\frac{1}{n}\vert\vert y-\data{X}\widehat{\beta}\vert\vert^2,
\end{displaymath}

where $\vert\vert\bullet\vert\vert^2$ denotes the Euclidean vector norm from Section 2.6. We see that the MLE $\widehat{\beta}$ is identical with the least squares estimator (3.52). The variance estimator

\begin{displaymath}
\widehat{\sigma}^2=\frac{1}{n}\sum^n_{i=1}(y_i-\widehat{\beta}^{\top}x_i)^2
\end{displaymath}

is nothing else than the residual sum of squares (RSS) from (3.37) generalized to the case of multivariate $x_{i}$.
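The coincidence of $\widehat{\beta}$ with the least squares estimator is easy to check numerically; the sketch below (illustrative, not from the text) computes $\widehat{\beta}$ and $\widehat{\sigma}^2$ directly from the formulas above and compares the former with NumPy's least squares solver.

```python
import numpy as np

# Illustrative sketch of Example 6.3 (not code from the text): the MLE of beta
# coincides with the least squares estimator, and sigma-hat^2 is RSS / n.
rng = np.random.default_rng(4)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 2.0, 0.0])
sigma = 0.8
y = X @ beta + sigma * rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'y
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / n           # RSS / n, the MLE of sigma^2

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # cross-check, least squares
print(np.allclose(beta_hat, beta_ls))            # True
print(sigma2_hat)                                # close to sigma**2 for large n
```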
Note that when the $x_i$ are considered to be fixed we have

\begin{displaymath}
E(y)=\data{X}\beta \quad\textrm{and}\quad \Var(y)=\sigma^2\data{I}_n.
\end{displaymath}

Then, using the properties of moments from Section 4.2 we have

\begin{displaymath}
E(\widehat{\beta})=(\data{X}^{\top}\data{X})^{-1}\data{X}^{\top}E(y)=\beta,
\end{displaymath}
(6.7)

\begin{displaymath}
\Var(\widehat{\beta})=(\data{X}^{\top}\data{X})^{-1}\data{X}^{\top}\Var(y)\data{X}(\data{X}^{\top}\data{X})^{-1}
=\sigma^2(\data{X}^{\top}\data{X})^{-1}.
\end{displaymath}
(6.8)
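A small Monte Carlo experiment (our illustration, not part of the text) makes (6.7) and (6.8) concrete: holding $\data{X}$ fixed and redrawing the errors, the empirical mean and covariance of $\widehat{\beta}$ approach $\beta$ and $\sigma^2(\data{X}^{\top}\data{X})^{-1}$.

```python
import numpy as np

# Illustrative Monte Carlo check of (6.7) and (6.8) (a sketch, not from the
# text): with the design X held fixed, the simulated mean and covariance of
# beta-hat should be close to beta and sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(5)
n, p, reps = 100, 2, 5000
X = rng.normal(size=(n, p))                      # fixed design matrix
beta = np.array([1.5, -1.0])
sigma = 0.5

estimates = np.empty((reps, p))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))                    # ~ beta
print(np.cov(estimates, rowvar=False))           # ~ sigma^2 (X'X)^{-1}
print(sigma ** 2 * np.linalg.inv(X.T @ X))
```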
Summary

$\ast$  If $\{x_{i}\}_{i=1}^n$ is an i.i.d. sample from a distribution with pdf $f(x;\theta )$, then $L(\data{X};\theta )=\prod^n_{i=1}f(x_i;\theta )$ is the likelihood function. The maximum likelihood estimator (MLE) is that value of $\theta$ which maximizes $L(\data{X};\theta )$. Equivalently one can maximize the log-likelihood $\ell (\data{X};\theta )$.

$\ast$  The MLE's of $\mu$ and $\Sigma$ from a $N_p(\mu ,\Sigma )$ distribution are $\widehat\mu =\overline x$ and $\widehat\Sigma =\data{S}$. Note that the MLE of $\Sigma$ is not unbiased.

$\ast$  The MLE's of $\beta$ and $\sigma$ in the linear model $y=\data{X}\beta +\varepsilon$, $\varepsilon \sim N(0,\sigma^2\data{I})$, are given by the least squares estimator $\widehat{\beta}=(\data{X}^{\top}\data{X})^{-1}\data{X}^{\top}y$ and $\widehat{\sigma}^2=\frac{1}{n}\vert\vert y-\data{X}\widehat{\beta}\vert\vert^2$.