6. Theory of Estimation

One of the basic objectives of statistics is to understand and model the underlying process that generates the data. This is known as statistical inference: we infer properties of the population from the information contained in a sample of observations. In multivariate statistical inference, we do exactly the same. The basic ideas were introduced in Section 4.5 on sampling theory: we observe the values of a multivariate random variable $X$ and obtain a sample ${\cal{X}}=\{x_i\}_{i=1}^n$. Under random sampling, these observations are considered to be realizations of a sequence of i.i.d. random variables $X_1,\ldots,X_n$, where each $X_i$ is a $p$-variate random variable which replicates the parent or population random variable $X$. For notational convenience, in this chapter we will no longer differentiate between a random variable $X_i$ and an observation of it, $x_i$. We will simply write $x_i$; it should be clear from the context whether a random variable or an observed value is meant.

Statistical inference uses the i.i.d. random sample ${\cal{X}}$ to learn about the population: typically, some unknown characteristic $\theta$ of its distribution. In parametric statistics, $\theta$ is a $k$-variate vector $\theta \in \mathbb{R}^k$ characterizing the unknown properties of the population pdf $f(x;\theta)$: this could be the mean, the covariance matrix, the kurtosis, etc.
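For example, if the population is $p$-variate normal, $X\sim N_p(\mu,\Sigma)$, then $\theta$ collects the $p$ entries of $\mu$ and the $p(p+1)/2$ free entries of the symmetric matrix $\Sigma$, so that $k=p+p(p+1)/2$; already for $p=2$ this gives $k=5$ unknown parameters.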

The aim is to estimate $\theta$ from the sample ${\cal{X}}$ through estimators $\widehat \theta$ which are functions of the sample: $\widehat \theta = \widehat \theta ({\cal{X}})$. When an estimator $\widehat \theta$ is proposed, we must derive its sampling distribution to analyze its properties: for instance, is it unbiased, and how closely does it concentrate around the unknown quantity $\theta$ it is supposed to estimate?
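For example, the sample mean $\bar x=\frac{1}{n}\sum_{i=1}^n x_i$ is a natural estimator of the population mean $\mu$. Under normal sampling, $X_i\sim N_p(\mu,\Sigma)$, its sampling distribution is
\[
\bar x \sim N_p\!\left(\mu, \frac{1}{n}\Sigma\right),
\]
so $\bar x$ is unbiased and its dispersion around $\mu$ shrinks at the rate $1/n$.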

In this chapter we develop the basic theoretical tools needed to derive estimators and to determine their properties in general situations. Our presentation relies mainly on maximum likelihood theory: in many situations, maximum likelihood estimators enjoy asymptotic optimality properties which make their use easy and appealing.
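To fix the notation used throughout: for an i.i.d. sample, the maximum likelihood estimator maximizes the log-likelihood of the observed data,
\[
\widehat \theta = \arg\max_{\theta}\, \ell(\theta), \qquad
\ell(\theta) = \sum_{i=1}^n \log f(x_i;\theta),
\]
where the sum over the terms $\log f(x_i;\theta)$ reflects the i.i.d. structure of the sample.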

We will illustrate the ideas with the multivariate normal population and the linear regression model, where the applications are numerous and the derivations easy. In multivariate setups, the maximum likelihood estimator is at times too complicated to be derived analytically. In such cases, the estimators are obtained using numerical methods (nonlinear optimization); the general theory and the asymptotic properties of these estimators remain simple and valid. Chapter 7 then concentrates on hypothesis testing and confidence intervals.
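As a minimal sketch of this numerical route (an illustration of the mechanics, not a method from the text), the following Python code maximizes the multivariate normal log-likelihood with a generic nonlinear optimizer; it assumes NumPy and SciPy are available, and all variable names and the simulated data are our own. The parameter vector packs $\mu$ together with a Cholesky factor of $\Sigma$, so the optimizer searches over an unconstrained vector while $\Sigma = L L^\top$ stays positive semidefinite.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch: numerical MLE for a p-variate normal N_p(mu, Sigma)
# via nonlinear optimization, on simulated data.
rng = np.random.default_rng(0)
n, p = 200, 2
X = rng.multivariate_normal([1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]], size=n)

def neg_log_lik(theta):
    # theta = (mu, lower-triangular entries of L); Sigma = L L^T is
    # positive semidefinite by construction, so theta is unconstrained.
    mu = theta[:p]
    L = np.zeros((p, p))
    L[np.tril_indices(p)] = theta[p:]
    Sigma = L @ L.T
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
    # negative log-likelihood of n i.i.d. N_p(mu, Sigma) observations
    return 0.5 * (n * (p * np.log(2.0 * np.pi) + logdet) + quad)

# Start from moment estimates; BFGS refines them numerically.
L0 = np.linalg.cholesky(np.cov(X.T))
theta0 = np.concatenate([X.mean(axis=0), L0[np.tril_indices(p)]])
res = minimize(neg_log_lik, theta0, method='BFGS')

mu_hat = res.x[:p]  # numerically close to the analytic MLE, x-bar
```

For the normal model the analytic solution, $\bar x$ and $\frac{1}{n}\sum_{i=1}^n (x_i-\bar x)(x_i-\bar x)^\top$, is of course available in closed form; the sketch only illustrates the numerical approach taken when no closed form exists.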