2.4 Properties of the Estimators

When we want to study the properties of the obtained estimators, it is convenient to distinguish between two categories of properties: i) the small (or finite) sample properties, which are valid whatever the sample size, and ii) the asymptotic properties, which are associated with large samples, i.e., when $ n$ tends to $ \infty$.

2.4.1 Finite Sample Properties of the OLS and ML Estimates of $ \beta $

Given that, as we obtained in the previous section, the OLS and ML estimates of $ \beta $ lead to the same result, the following properties refer to both. In order to derive these properties, and on the basis of the classical assumptions, the vector of estimated coefficients can be written in the following alternative form:

$\displaystyle \hat{\beta}=(X^{\top }X)^{-1}X^{\top }y=(X^{\top }X)^{-1}X^{\top }(X\beta+u)=\beta+(X^{\top }X)^{-1}X^{\top }u$ (2.54)

The variance-covariance matrix of $ \hat{\beta}$ has the following expression:

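\begin{displaymath}\begin{array}{c} V(\hat{\beta})=\textrm{E}[(\hat{\beta}-\textrm{E}\hat{\beta})(\hat{\beta}-\textrm{E}\hat{\beta})^{\top }]=\textrm{E}[(X^{\top }X)^{-1}X^{\top }uu^{\top }X(X^{\top }X)^{-1}]=\\ \\ (X^{\top }X)^{-1}X^{\top }\textrm{E}(uu^{\top })X(X^{\top }X)^{-1}=\sigma^{2}(X^{\top }X)^{-1}X^{\top }X(X^{\top }X)^{-1}=\\ \\ \sigma^{2}(X^{\top }X)^{-1} \end{array}\end{displaymath} (2.56)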

where the elements of this matrix are:

$\displaystyle var(\hat{\beta}_{j})=\sigma^{2}((X^{\top }X)^{-1})_{jj}$ (2.57)

$\displaystyle cov(\hat{\beta}_{j},\hat{\beta}_{h})=\sigma^{2}((X^{\top }X)^{-1})_{jh}$ (2.58)

Obviously, (2.56) is a symmetric positive definite matrix.

The consideration of $ V(\hat{\beta})$ allows us to define efficiency as a second finite sample property.

A property that is less strict than efficiency is the so-called best linear unbiased estimator (BLUE) property, which also uses the variance of the estimators.

The results obtained so far allow us to derive the probability distribution of $ \hat{\beta}$ (or $ \tilde{\beta}$). Since these estimator vectors are linear in the vector $ y$, and $ y$ is normally distributed, it follows that:

$\displaystyle \hat{\beta} \sim N(\beta,\sigma^{2}(X^{\top }X)^{-1})$ (2.74)
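As a numerical illustration of (2.74), the following Python sketch (not part of the original text, which relies on XploRe quantlets) simulates many samples from a model with a fixed design matrix and compares the Monte Carlo mean and covariance of $ \hat{\beta}$ with $ \beta $ and $ \sigma^{2}(X^{\top }X)^{-1}$. The sample size, coefficient values, seed and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
n, sigma = 50, 2.0                       # hypothetical sample size and error s.d.
beta = np.array([1.0, -0.5, 0.25])       # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design, drawn once and then held fixed

XtX_inv = np.linalg.inv(X.T @ X)
theory_cov = sigma**2 * XtX_inv          # sigma^2 (X'X)^{-1}, as in (2.56)

reps = 20000
U = rng.normal(scale=sigma, size=(reps, n))  # one disturbance vector per row
Y = X @ beta + U                             # one simulated y vector per row
B = Y @ X @ XtX_inv                          # row r holds beta_hat' = y_r' X (X'X)^{-1}

mc_mean = B.mean(axis=0)                 # should be close to beta (unbiasedness)
mc_cov = np.cov(B, rowvar=False)         # should be close to theory_cov

print("Monte Carlo mean of beta_hat:", mc_mean)
print("largest covariance deviation:", np.max(np.abs(mc_cov - theory_cov)))
```

Because the disturbances are drawn from a normal distribution, a histogram of any column of `B` should also look normal, in line with (2.74).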

2.4.2 Finite Sample Properties of the OLS and ML Estimates of $ \sigma ^{2}$

According to expressions (2.34) and (2.53), the OLS and ML estimators of $ \sigma ^{2}$ are different, despite both being constructed through $ \hat{u}^{\top }\hat{u}$. In order to obtain their properties, it is convenient to express $ \hat{u}^{\top }\hat{u}$ as a function of the disturbance of the model. From the definition of $ \hat{u}$ in (2.26) we obtain:

$\displaystyle \hat{u}=y-X\hat{\beta}=y-X(X^{\top }X)^{-1}X^{\top }y=[I_{n}-X(X^{\top }X)^{-1}X^{\top }]y=My$ (2.75)

with $ M=I_{n}-X(X^{\top }X)^{-1}X^{\top }$ a non-stochastic square matrix of order $ n$, which is symmetric and idempotent, and whose rank and trace are both equal to $ n-k$. In addition, $ M$ satisfies $ MX=0$.

Result (2.75), which means that $ \hat{u}$ is linear with respect to $ y$, can be extended in the following way:

$\displaystyle \hat{u}=My=M(X\beta+u)=Mu$ (2.76)

that is to say, there is also a linear relation between $ \hat{u}$ and $ u$.
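The stated properties of $ M$ and the relations (2.75) and (2.76) can be checked numerically. The sketch below is only an illustration under assumed values (the dimensions, coefficients and seed are hypothetical); it builds $ M$ for a small simulated design and verifies each property up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 12, 3                                   # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.0, -1.0])              # hypothetical coefficients
u = rng.normal(size=n)                         # simulated disturbances
y = X @ beta + u

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
u_hat = y - X @ beta_hat                       # OLS residuals

checks = {
    "M symmetric": np.allclose(M, M.T),
    "M idempotent": np.allclose(M @ M, M),
    "trace(M) = n - k": np.isclose(np.trace(M), n - k),
    "rank(M) = n - k": np.linalg.matrix_rank(M, tol=1e-8) == n - k,
    "MX = 0": np.allclose(M @ X, 0.0),
    "u_hat = My (2.75)": np.allclose(u_hat, M @ y),
    "u_hat = Mu (2.76)": np.allclose(u_hat, M @ u),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Forming the $ n\times n$ matrix $ M$ explicitly is convenient for checking these identities, but for large $ n$ one would normally compute the residuals directly from $ y-X\hat{\beta}$.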

From (2.76), and under the earlier mentioned properties of $ M$, the sum of squared residuals can be written as a quadratic form of the disturbance vector,

$\displaystyle \hat{u}^{\top }\hat{u}= u^{\top }M^{\top }Mu=u^{\top }Mu$ (2.77)

Since, under the classical assumptions, the elements of $ u$ are independent $ N(0,\sigma ^{2})$ variables, and $ M$ is a symmetric idempotent matrix, $ \frac{u^{\top }Mu}{\sigma^{2}}$ follows a chi-squared distribution with degrees of freedom equal to the rank of $ M$, that is to say:

$\displaystyle \frac{u^{\top }Mu}{\sigma^{2}}\sim\chi^{2}_{n-k}$ (2.78)

Note that from (2.75), it is also possible to write $ \hat{u}^{\top }\hat{u}$ as a quadratic form of $ y$, yielding:

$\displaystyle \hat{u}^{\top }\hat{u}=y^{\top }My$ (2.79)

This expression for $ \hat{u}^{\top }\hat{u}$ allows us to obtain a very simple way to calculate the OLS or ML estimator of $ \sigma ^{2}$. For example, for $ \hat{\sigma}^{2}$:

$\displaystyle \hat{\sigma}^{2}=\frac{y^{\top }My}{n-k}=\frac{y^{\top }y-y^{\top }X(X^{\top }X)^{-1}X^{\top }y}{n-k}= \frac{y^{\top }y-\hat{\beta}^{\top }X^{\top }y}{n-k}$ (2.80)
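A minimal Python sketch of the calculation in (2.80) follows. It computes the sum of squared residuals both through the shortcut $ y^{\top }y-\hat{\beta}^{\top }X^{\top }y$ and directly from the residuals, and returns the OLS estimator (divisor $ n-k$) together with the ML estimator (divisor $ n$). The function name and the small data set are hypothetical and serve only as an illustration.

```python
import numpy as np

def sigma2_estimates(X, y):
    """Return the OLS and ML estimates of sigma^2 and both versions of the SSR."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ssr_shortcut = y @ y - beta_hat @ (X.T @ y)   # y'y - beta_hat' X'y, as in (2.80)
    resid = y - X @ beta_hat
    ssr_direct = resid @ resid                    # u_hat' u_hat
    return {
        "ssr_shortcut": ssr_shortcut,
        "ssr_direct": ssr_direct,
        "sigma2_ols": ssr_direct / (n - k),       # OLS estimator, divisor n - k
        "sigma2_ml": ssr_direct / n,              # ML estimator, divisor n
    }

# Made-up data: an intercept and one regressor, four observations
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])
result = sigma2_estimates(X, y)
print(result)
```

The shortcut avoids forming the residual vector, but it subtracts two numbers of similar size, so the direct computation may be numerically safer when the fit is very close.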

Having established these relations of interest, we now define the properties of $ \hat{\sigma}^{2}$ and $ \tilde{\sigma}^{2}$:

In order to analyze the efficiency and BLUE properties, we must know the variances of $ \hat{\sigma}^{2}$ and $ \tilde{\sigma}^{2}$. From (2.78), we have $ var(\frac{u^{\top }Mu}{\sigma^{2}})=2(n-k)$, because the variance of a chi-squared variable is twice its degrees of freedom, and hence $ var(u^{\top }Mu)=2(n-k)\sigma^{4}$. This result leads to the following expressions for the variances:

$\displaystyle var(\hat{\sigma}^{2})=\frac{1}{(n-k)^{2}}var(u^{\top }Mu)=\frac{1}{(n-k)^{2}}2(n-k)\sigma^{4}= \frac{2\sigma^{4}}{n-k}$ (2.88)

$\displaystyle var(\tilde{\sigma}^{2})=\frac{1}{n^{2}}var(u^{\top }Mu)=\frac{2\sigma^{4}(n-k)}{n^{2}}$ (2.89)

Nevertheless, given that $ \tilde{\sigma}^{2}$ is biased, this estimator cannot be efficient, so we focus on the study of this property for $ \hat{\sigma}^{2}$. With respect to the BLUE property, neither $ \hat{\sigma}^{2}$ nor $ \tilde{\sigma}^{2}$ is linear, so neither can be BLUE.
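The variance expressions (2.88) and (2.89) can be compared with simulated values. The following sketch is an illustration under assumed settings ($ n$, $ k$, $ \sigma^{2}$ and the seed are hypothetical): it draws many disturbance vectors, evaluates $ u^{\top }Mu$ for each, and compares the empirical means and variances of both estimators with their theoretical counterparts.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 20, 4, 1.5                      # hypothetical settings
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

reps = 50000
U = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
ssr = ((U @ M) * U).sum(axis=1)                # u'Mu for each simulated sample

s2_ols = ssr / (n - k)                         # unbiased (OLS) estimator
s2_ml = ssr / n                                # ML estimator

theory = {
    "mean_ols": sigma2,                        # unbiasedness
    "mean_ml": sigma2 * (n - k) / n,           # biased downwards
    "var_ols": 2 * sigma2**2 / (n - k),        # (2.88)
    "var_ml": 2 * sigma2**2 * (n - k) / n**2,  # (2.89)
}
simulated = {
    "mean_ols": s2_ols.mean(),
    "mean_ml": s2_ml.mean(),
    "var_ols": s2_ols.var(),
    "var_ml": s2_ml.var(),
}
for key in theory:
    print(f"{key}: theory={theory[key]:.4f}  simulated={simulated[key]:.4f}")
```

The ML estimator has the smaller variance, but its expectation lies below $ \sigma^{2}$, which is the bias referred to above.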

The variance-covariance matrix of an estimator vector tells us how accurate the estimator is. However, this matrix, which was obtained in (2.56), depends on the unknown parameter $ \sigma ^{2}$, so we obtain an unbiased estimate of it by replacing $ \sigma ^{2}$ with its unbiased estimator $ \hat{\sigma}^{2}$:

$\displaystyle \hat{V}(\hat{\beta})=\hat{V}(\tilde{\beta})=\hat{\sigma}^{2}(X^{\top }X)^{-1}$ (2.90)

The meaning of every element of this matrix is analogous to that presented in (2.57) and (2.58).

2.4.3 Asymptotic Properties of the OLS and ML Estimators of $ \beta $

Finite sample properties try to study the behavior of an estimator under the assumption of having many samples, and consequently many estimators of the parameter of interest. Thus, the average of these estimators should approach the parameter value (unbiasedness) or the average distance to the parameter value should be the smallest possible (efficiency). However, in practice we have only one sample, and the asymptotic properties are established by keeping this fact in mind but assuming that the sample is large enough.

Specifically, the asymptotic properties study the behavior of the estimators as $ n$ increases; in this sense, an estimator which is calculated for different sample sizes can be understood as a sequence of random variables indexed by the sample sizes (for example, $ z_{n}$). Two relevant aspects to analyze in this sequence are $ \textsl{convergence in probability}$ and $ \textsl{convergence in distribution}$.

A sequence of random variables $ z_{n}$ is said $ \textsl{to converge in probability}$ to a constant $ c$ or to another random variable $ z$ if, respectively,

$\displaystyle \lim_{n\rightarrow\infty}Pr[\vert z_{n}-c\vert<\epsilon]=1$ (2.91)


$\displaystyle \lim_{n\rightarrow\infty}Pr[\vert z_{n}-z\vert<\epsilon]=1$ (2.92)

where $ Pr$ denotes probability and $ \epsilon>0$ is an arbitrary constant. Equivalently, we can express this convergence as:

\begin{displaymath}\begin{array}{ccc} z_{n}\rightarrow_{p}c & and & z_{n}\rightarrow_{p}z \end{array}\end{displaymath}

or

\begin{displaymath}\begin{array}{ccc} \textrm{plim}\,z_{n}=c & and & \textrm{plim}\,z_{n}=z \end{array}\end{displaymath} (2.93)

Result (2.91) implies that the probability mass of the distribution of $ z_{n}$ becomes concentrated at points close to $ c$. Result (2.92) implies that values of $ z_{n}$ close to $ z$ become more probable as $ n$ increases and, moreover, that this probability tends to one.
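To make (2.91) concrete, the sketch below estimates $ Pr[\vert z_{n}-c\vert<\epsilon]$ by simulation for a simple sequence: $ z_{n}$ is taken to be the mean of $ n$ independent Uniform(0,1) draws, so that $ c=0.5$. This choice of sequence, the value of $ \epsilon$, the sample sizes and the function name are assumptions made purely for illustration.

```python
import numpy as np

def prob_within(n, eps=0.05, reps=2000, seed=3):
    """Monte Carlo estimate of Pr[|z_n - 0.5| < eps], z_n = mean of n uniforms."""
    rng = np.random.default_rng(seed)
    z_n = rng.uniform(size=(reps, n)).mean(axis=1)   # reps realisations of z_n
    return float(np.mean(np.abs(z_n - 0.5) < eps))

probs = {n: prob_within(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n={n:5d}  estimated probability = {p:.3f}")
```

The estimated probability rises towards one as $ n$ grows, which is the behaviour that the definition requires for every fixed $ \epsilon>0$.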

A second form of convergence is convergence in distribution. If $ z_{n}$ is a sequence of random variables with cumulative distribution function ($ cdf$) $ F_{n}(z)$, then the sequence $ \textsl{ converges in distribution}$ to a variable $ z$ with $ cdf$ $ F(z)$ if, at every point at which $ F$ is continuous,

$\displaystyle \lim_{n\rightarrow\infty}F_{n}(z)=F(z)$ (2.94)

which can be denoted by:

$\displaystyle z_{n}\rightarrow_{d}z$ (2.95)

and $ F(z)$ is said to be the $ \textsl{limit distribution}$ of $ z_{n}$.
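Convergence in distribution can also be illustrated by simulation. In the sketch below, $ z_{n}$ is the standardised mean of $ n$ independent exponential variables with mean and variance one, whose limit distribution is standard normal by a CLT. The code compares the empirical $ cdf$ of $ z_{n}$ with the normal $ cdf$ on a grid of points. The choice of distribution, the grid and the sample sizes are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(4)
reps = 20000
grid = np.linspace(-3.0, 3.0, 121)
# Standard normal cdf evaluated on the grid, via the error function
normal_cdf = np.array([0.5 * (1.0 + math.erf(g / math.sqrt(2.0))) for g in grid])

def sup_distance(n):
    """Largest gap on the grid between the empirical cdf of z_n and the normal cdf."""
    x = rng.exponential(scale=1.0, size=(reps, n))     # mean 1, variance 1
    z_n = np.sqrt(n) * (x.mean(axis=1) - 1.0)          # standardised sample mean
    ecdf = np.searchsorted(np.sort(z_n), grid, side="right") / reps
    return float(np.max(np.abs(ecdf - normal_cdf)))

distances = {n: sup_distance(n) for n in (1, 10, 200)}
for n, d in distances.items():
    print(f"n={n:4d}  max |F_n - F| on grid = {d:.4f}")
```

The gap does not vanish exactly in a simulation, since the empirical $ cdf$ carries Monte Carlo error, but it should shrink markedly as $ n$ increases.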

Having established these preliminary concepts, we now consider the following desirable asymptotic properties: asymptotic unbiasedness, consistency and asymptotic efficiency.

In order to simplify notation, in what follows we will use $ \hat{\beta}$ instead of $ \hat{\beta}_{n}$. Nevertheless, we must continue to regard it as a sequence of random variables indexed by the sample size.

Consistency might be thought of as the minimum requirement for a useful estimator. However, given that there can be many consistent estimators of a parameter, it is convenient to consider another property, namely asymptotic efficiency. This property focuses on the asymptotic variance of the estimators, or on the asymptotic variance-covariance matrix of an estimator vector. As with asymptotic unbiasedness, two definitions of this concept can be found. The first defines it as the variance of the limit distribution of the estimator, so this limit distribution must be known. However, according to the meaning of consistency, the limit distribution of a consistent estimator is degenerate at a point, so its variance is zero. In order to obtain a non-degenerate approximation to the limit distribution, we can use a $ \textsl{Central Limit Theorem}$ (CLT), which establishes the conditions that guarantee that the limit distribution is a normal distribution.

Suppose we have applied a CLT, and we have:

$\displaystyle \sqrt{n}(\hat{\hat{\theta}}-\theta)\rightarrow_{d}N(0,\gamma)$ (2.102)

with $ \gamma=V_{as}[\sqrt{n}(\hat{\hat{\theta}}-\theta)]$, that is to say, $ \gamma$ is the asymptotic variance of $ \sqrt{n}(\hat{\hat{\theta}}-\theta)$. This result allows us to approximate the distribution of $ \hat{\hat{\theta}}$ for large $ n$ as:

$\displaystyle \hat{\hat{\theta}}\rightarrow_{as}N(\theta,\frac{\gamma}{n})$ (2.103)

where $ \rightarrow_{as}$ denotes "asymptotically distributed as", and consequently the asymptotic variance of the estimator is approximated by $ \frac{\gamma}{n}$.

The second definition of asymptotic variance, which does not require using any limit distribution, is obtained as:

$\displaystyle V_{as}(\hat{\hat{\theta}})=\frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E}[\sqrt{n}(\hat{\hat{\theta}}-\textrm{E}(\hat{\hat{\theta}}))]^{2}$ (2.104)

In our framework, this second definition leads us to express the asymptotic variance of vector $ \hat{\beta}$ as:

$\displaystyle V_{as}(\hat{\beta})=\frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E}[(\sqrt{n}(\hat{\beta}-\textrm{E}\hat{\beta}))((\hat{\beta}-\textrm{E}\hat{\beta})^{\top }\sqrt{n})]=$

$\displaystyle \frac{1}{n}\lim_{n\rightarrow\infty}n\sigma^{2}(X^{\top }X)^{-1}=\frac{1}{n}\lim_{n\rightarrow\infty}\frac{n}{n}\sigma^{2}(\frac{X^{\top }X}{n})^{-1}=$

$\displaystyle \frac{\sigma^{2}}{n}\lim_{n\rightarrow\infty}(\frac{X^{\top }X}{n})^{-1}=\frac{\sigma^{2}Q^{-1}}{n}$ (2.105)

If we consider the first definition of the asymptotic variance, the use of a CLT (see Judge, Carter, Griffiths, Lutkepohl and Lee (1988)) yields:

$\displaystyle \sqrt{n}(\hat{\beta}-\beta)\rightarrow_{d}N(0,\sigma^{2}Q^{-1})$ (2.106)

which leads to:

$\displaystyle \hat{\beta}\rightarrow_{as}N(\beta,\frac{\sigma^{2}Q^{-1}}{n})$ (2.107)

so $ V_{as}(\hat{\beta})$ is approximated by $ \frac{\sigma^{2}Q^{-1}}{n}$.
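The limit used in (2.105) can be illustrated numerically. In the sketch below the design consists of an intercept and one regressor drawn once from a standard normal distribution, so that $ \frac{X^{\top }X}{n}$ approaches the identity matrix, which plays the role of $ Q$ here. This design, the value of $ \sigma^{2}$ and the function names are assumptions chosen only because the limit is then known. The regressor is simulated merely to generate a design and is then treated as fixed.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 2.0                                   # hypothetical error variance
n_max = 1_000_000
x = rng.normal(size=n_max)                     # regressor values, drawn once
Q = np.eye(2)                                  # limit of X'X/n for this design

def cov_beta(n):
    """Finite-sample covariance sigma^2 (X'X)^{-1} using the first n rows."""
    X = np.column_stack([np.ones(n), x[:n]])
    return sigma2 * np.linalg.inv(X.T @ X)

def scaled_cov(n):
    """sigma^2 (X'X/n)^{-1}, the quantity whose limit appears in (2.105)."""
    X = np.column_stack([np.ones(n), x[:n]])
    return sigma2 * np.linalg.inv(X.T @ X / n)

target = sigma2 * np.linalg.inv(Q)             # sigma^2 Q^{-1}
errors = {n: float(np.max(np.abs(scaled_cov(n) - target)))
          for n in (100, 10_000, 1_000_000)}
for n, e in errors.items():
    print(f"n={n:8d}  max |sigma^2 (X'X/n)^-1 - sigma^2 Q^-1| = {e:.5f}")
```

For any finite $ n$, `n * cov_beta(n)` coincides with `scaled_cov(n)`, which is the algebraic step in the second line of (2.105).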

2.4.4 Asymptotic Properties of the OLS and ML Estimators of $ \sigma ^{2}$

Finally, the study of the asymptotic efficiency property requires approximating the asymptotic variance of the estimators. Following Fomby, Carter, and Johnson (1984) we have,

$\displaystyle \sqrt{n}(\hat{\sigma}^{2}-\sigma^{2})\rightarrow_{d}N(0,2\sigma^{4})$ (2.114)

so the limit distribution of $ \hat{\sigma}^{2}$ can be approximated as

$\displaystyle \hat{\sigma}^{2}\rightarrow_{as}N(\sigma^{2},\frac{2\sigma^{4}}{n})$ (2.115)

and then we conclude that

$\displaystyle var_{as}(\hat{\sigma}^{2})=\frac{2\sigma^{4}}{n}$ (2.116)

Analogously, following Dhrymes (1974), the ML estimator $ \tilde{\sigma}^{2}$ satisfies

$\displaystyle \sqrt{n}(\tilde{\sigma}^{2}-\sigma^{2})\rightarrow_{d}N(0,2\sigma^{4})$ (2.117)

so $ var_{as}(\tilde{\sigma}^{2})$ has the same form as that given in (2.116).

The second way to obtain the asymptotic variance (see (2.104)) leads to the following expressions:

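$\displaystyle var_{as}(\hat{\sigma}^{2})=\frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E}[\sqrt{n}(\hat{\sigma}^{2}-\textrm{E}(\hat{\sigma}^{2}))]^{2}= \frac{1}{n}\lim_{n\rightarrow\infty}n\frac{2\sigma^{4}}{n-k}=$

$\displaystyle \frac{1}{n}\lim_{n\rightarrow\infty}\frac{2\sigma^{4}}{\frac{n-k}{n}}= \frac{1}{n}\left[\frac{2\sigma^{4}}{\lim_{n\rightarrow\infty}(1-\frac{k}{n})}\right] =\frac{1}{n}2\sigma^{4}$ (2.118)

$\displaystyle var_{as}(\tilde{\sigma}^{2})= \frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E}[\sqrt{n}(\tilde{\sigma}^{2}-\textrm{E}(\tilde{\sigma}^{2}))]^{2}= \frac{1}{n}\lim_{n\rightarrow\infty}n\frac{2\sigma^{4}(n-k)}{n^{2}}= \frac{1}{n}[2\sigma^{4}-\lim_{n\rightarrow\infty}\frac{2\sigma^{4}k}{n}]= \frac{1}{n}2\sigma^{4}$ (2.119)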
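The limits in (2.118) and (2.119) can be followed numerically by multiplying the finite-sample variances (2.88) and (2.89) by $ n$ and letting $ n$ grow. The short sketch below does this for assumed values of $ k$ and $ \sigma^{2}$; the function names are hypothetical.

```python
def var_ols(n, k, sigma2):
    """Finite-sample variance of the OLS estimator of sigma^2, as in (2.88)."""
    return 2.0 * sigma2**2 / (n - k)

def var_ml(n, k, sigma2):
    """Finite-sample variance of the ML estimator of sigma^2, as in (2.89)."""
    return 2.0 * sigma2**2 * (n - k) / n**2

k, sigma2 = 3, 1.5                 # hypothetical number of regressors and variance
limit = 2.0 * sigma2**2            # 2 sigma^4, the common limit of n * variance

rows = []
for n in (10, 100, 1000, 100000):
    rows.append((n, n * var_ols(n, k, sigma2), n * var_ml(n, k, sigma2)))
    print(f"n={n:6d}  n*var_ols={rows[-1][1]:.5f}  n*var_ml={rows[-1][2]:.5f}  limit={limit:.5f}")
```

The scaled OLS variance approaches $ 2\sigma^{4}$ from above and the scaled ML variance approaches it from below, so both estimators share the same asymptotic variance.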

2.4.5 Example

As we have seen in the previous section, the quantlet gls allows us to estimate all the parameters of the MLRM. In addition, if we want to estimate the variance-covariance matrix of $ \hat{\beta}$, which is given by $ \hat{\sigma}^{2}(X^{\top }X)^{-1}$, we can use the following quantlet:

XEGmlrm03.xpl
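The source of the quantlet is not reproduced here. For readers without XploRe, the same quantity $ \hat{\sigma}^{2}(X^{\top }X)^{-1}$ from (2.90) can be computed with the following Python sketch. It is not a translation of XEGmlrm03.xpl; the function name and the small data set are hypothetical.

```python
import numpy as np

def estimated_cov(X, y):
    """Return beta_hat, sigma2_hat and sigma2_hat * (X'X)^{-1}, as in (2.90)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y               # OLS (and ML) coefficients
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)       # unbiased estimator of sigma^2
    V_hat = sigma2_hat * XtX_inv               # estimated covariance matrix
    return beta_hat, sigma2_hat, V_hat

# Made-up data: an intercept and one regressor, four observations
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 5.0])

beta_hat, sigma2_hat, V_hat = estimated_cov(X, y)
std_err = np.sqrt(np.diag(V_hat))              # square roots of the estimated variances (2.57)

print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat)
print("V_hat:\n", V_hat)
print("standard errors:", std_err)
```

The diagonal of `V_hat` estimates the variances in (2.57) and the off-diagonal elements estimate the covariances in (2.58).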