2.4 Properties of the Estimators

When we want to study the properties of the obtained estimators, it is convenient to distinguish between two categories of properties: i) the small (or finite) sample properties, which are valid whatever the sample size, and ii) the asymptotic properties, which are associated with large samples, i.e., when tends to $\infty$ .

2.4.1 Finite Sample Properties of the OLS and ML Estimates of $\beta$

Given that, as we obtained in the previous section, the OLS and ML estimates of $\beta$ lead to the same result, the following properties refer to both. In order to derive these properties, and on the basis of the classical assumptions, the vector of estimated coefficients can be written in the following alternative form:

$\displaystyle \hat{\beta}=(X^{\top }X)^{-1}X^{\top }y=(X^{\top }X)^{-1}X^{\top }(X\beta+u)=\beta+(X^{\top }X)^{-1}X^{\top }u$

(2.54)

Unbiasedness. According to the concept of unbiasedness, vector $\hat{\beta}$ is an unbiased estimator vector of $\beta$ since:

$\displaystyle \textrm{E}(\hat{\beta})=\textrm{E}[\beta+(X^{\top }X)^{-1}X^{\top }u]=\beta+(X^{\top }X)^{-1}X^{\top }\textrm{E}(u)=\beta$ (2.55)

The unbiasedness property of the estimators means that, if we have many samples for the random variable and we calculate the estimated value corresponding to each sample, the average of these estimated values approaches the unknown parameter. Nevertheless, we usually have only one sample (i.e, one realization of the random variable), so we can not assure anything about the distance between $\hat{\beta}$ and $\beta$ . This fact leads us to employ the concept of variance, or the variance-covariance matrix if we have a vector of estimates. This concept measures the average distance between the estimated value obtained from the only sample we have and its expected value.
From the previous argument we can deduce that, although the unbiasedness property is not sufficient in itself, it is the minimum requirement to be satisfied by an estimator.

The variance-covariance matrix of $\hat{\beta}$ has the following expression:

$\begin{displaymath}\begin{array}{c} V(\hat{\beta})=\textrm{E}[(\hat{\beta}-\text... ...(X^{\top }X)^{-1}=\\ \\ \sigma^{2}(X^{\top }X)^{-1} \end{array}\end{displaymath}$

(2.56)

with the elements of this matrix meaning:

$\displaystyle var(\hat{\beta}_{j})=\sigma^{2}((X^{\top }X)^{-1})_{jj}$

(2.57)

$\displaystyle cov(\hat{\beta}_{j},\hat{\beta}_{h})=\sigma^{2}((X^{\top }X)^{-1})_{jh}$

(2.58)

Obviously, (2.56) is a symmetric positive definite matrix.

The consideration of $V(\hat{\beta})$ allows us to define efficiency as a second finite sample property.

Efficiency. An estimator is efficient if it is the minimum variance unbiased estimator. The Cramer Rao inequality provides verification of efficiency, since it establishes the lower bound for the variance-covariance matrix of any unbiased estimator. This lower bound is given by the corresponding element of the diagonal of the inverse of the information matrix (or sample information matrix) $I_{n}(\theta)$ , which is defined as:

$\displaystyle I_{n}(\theta)=-\textrm{E}[H(\theta)]$ (2.59)

where denotes the hessian matrix, i.e., the matrix of the second partial derivatives of the log-likelihood function .
In order to study the efficiency property for the OLS and ML estimates of $\beta$ , we begin by defining $\theta^{\top }=(\beta^{\top },\sigma^{2})$ , and the hessian matrix is expressed as a partitioned matrix of the form:

$\displaystyle \frac{\partial^{2}\ell}{\partial\theta\partial\theta^{\top }}= \b... ...ial(\sigma^{2})^{2}} \end{pmatrix} = \begin{pmatrix}A& B \\ C & D \end{pmatrix}$ (2.60)

where is a square matrix, and are $k\times1$ vectors, and is a $1\times1$ element.
From (2.50) and (2.51), we have:

$\displaystyle \frac{\partial^{2}\ell}{\partial\theta\partial\theta^{\top }}= \b... ...} & \frac{n\sigma^{2}-2(y-X\beta)^{\top }(y-X\beta)}{2\sigma^{6}} \end{pmatrix}$

$\displaystyle = \begin{pmatrix}-\frac{X^{\top }X}{\sigma^{2}} & -\frac{X^{\top ... ...\top }X}{\sigma^{4}}& \frac{n\sigma^{2}-2u^{\top }u}{2\sigma^{6}} \end{pmatrix}$ (2.61)

Thus, the sample information matrix is:

$\displaystyle I_{n}(\theta)= \begin{pmatrix}\frac{X^{\top }X}{\sigma^{2}} & 0 \\ 0 & \frac{n}{2\sigma^{4}} \end{pmatrix}$ (2.62)

and its inverse,

$\displaystyle [I_{n}(\theta)]^{-1}= \begin{pmatrix}\sigma^{2}(X^{\top }X)^{-1} ... ...ma^{4}}{n} \end{pmatrix} = \begin{pmatrix}I^{11}& 0 \\ 0 & I^{22} \end{pmatrix}$ (2.63)

Following the Cramer-Rao inequality, $I^{11}$ constitutes the lower bound for the variance-covariance matrix of any unbiased estimator vector of the parameter vector $\beta$ , while $I^{22}$ is the corresponding bound for the variance of an unbiased estimator of $\sigma ^{2}$ .
According to (2.56), we can conclude that $\hat{\beta}$ (or $\tilde{\beta}$ ), satisfies the efficiency property, given that their variance-covariance matrix coincides with $I^{11}$ .

A property which is less strict than efficiency, is the so called best, linear unbiased estimator (BLUE) property, which also uses the variance of the estimators.

BLUE. A vector of estimators is BLUE if it is the minimum variance linear unbiased estimator. To show this property, we use the Gauss-Markov Theorem. In the MLRM framework, this theorem provides a general expression for the variance-covariance matrix of a linear unbiased vector of estimators. Then, the comparison of this matrix with the corresponding matrix of $\hat{\beta}$ allows us to conclude that $\hat{\beta}$ (or $\tilde{\beta}$ ) is BLUE.
With this aim, we define $\hat{\hat{\beta}}$ as a family of linear vectors of estimates of the parameter vector $\beta$ :

$\displaystyle \hat{\hat{\beta}}=C^{\top }y=C^{\top }X\beta+C^{\top }u$ (2.64)

with being a matrix of constant elements, where:

$\displaystyle C^{\top }=A^{\top }+D^{\top }$ (2.65)

In order to assure the unbiasedness of $\hat{\hat{\beta}}$ , we suppose $C^{\top }X=I_{k}$ , and then (2.64) can be written as:

$\displaystyle \hat{\hat{\beta}}=\beta+C^{\top }u$ (2.66)

From this last expression we can derive the variance-covariance matrix of $\hat{\hat{\beta}}$ :

$\displaystyle V(\hat{\hat{\beta}})=\textrm{E}[(\hat{\hat{\beta}}-\textrm{E} \hat{\hat{\beta}})(\hat{\hat{\beta}}-\textrm{E} \hat{\hat{\beta}})^{\top }]$

$\displaystyle =\textrm{E}[(\hat{\hat{\beta}}-\beta)(\hat{\hat{\beta}}-\beta)^{\top }]= \textrm{E}[C^{\top }uu^{\top }C]=\sigma^{2}C^{\top }C$ (2.67)

Taking into account (2.65) we have:

$\displaystyle C^{\top }C=(A^{\top }+D^{\top })(A+D)=A^{\top }A+A^{\top }D+D^{\top }A+D^{\top }D$ (2.68)

and the unbiasedness condition $C^{\top }X=I_{k}$ allows us to show that $D^{\top }A=A^{\top }D=0$ :

$\displaystyle C^{\top }X=I_{k}\Longrightarrow(A^{\top }+D^{\top })X=I_{k}\Longrightarrow A^{\top }X+D^{\top }X=I_{k}$ (2.69)

and given that $A^{\top }=(X^{\top }X)^{-1}X^{\top }$ , as was established in (2.28), we derive that $A^{\top }X=I_{k}$ . By substituting this result into the last term of (2.69), it must hold that $D^{\top }X=0$ , which implies that:

$\displaystyle D^{\top }A=D^{\top }X(X^{\top }X)^{-1}=0$
and obviously, $A^{\top }D=0$ . We now take expression (2.67), which we can write as:

$\displaystyle V(\hat{\hat{\beta}})=\sigma^{2}(A^{\top }A+D^{\top }D)$ (2.70)

and given that $A^{\top }A=(X^{\top }X)^{-1}$ , according to (2.56), we have:

$\displaystyle V(\hat{\hat{\beta}})=V(\hat{\beta})+\sigma^{2}D^{\top }D$ (2.71)

or

$\displaystyle V(\hat{\hat{\beta}})-V(\hat{\beta})=\sigma^{2}D^{\top }D$ (2.72)

A general result matrix establishes that given any matrix P, then $P^{\top }P$ is a positive semidefinite matrix, so we can conclude that $D^{\top }D$ is positive semidefinite. This property means that the elements of its diagonal are non negative, so we deduce for every $\beta_{j}$ coefficient:

$\begin{displaymath}\begin{array}{cc} var(\hat{\hat{\beta}}_{j}) \geq var(\hat{\beta_{j}}) & j=1,\ldots,k. \end{array}\end{displaymath}$ (2.73)

that is to say, we conclude that the OLS or ML estimator vector of $\beta$ satisfies the Gauss-Markov theorem, and this implies that $\hat{\beta}$ (or $\tilde{\beta}$ ) is BLUE.

The set of results we have previously obtained, allows us to know the probability distribution for $\hat{\beta}$ (or $\tilde{\beta}$ ). Given that these estimator vectors are linear with respect to the vector, and having a normal distribution, then:

$\displaystyle \hat{\beta} \sim N(\beta,\sigma^{2}(X^{\top }X)^{-1})$

(2.74)

2.4.2 Finite Sample Properties of the OLS and ML Estimates of $\sigma ^{2}$

According to expressions (2.34) and (2.53), the OLS and ML estimators of $\sigma ^{2}$ are different, despite both being constructed through $\hat{u}^{\top }\hat{u}$ . In order to obtain their properties, it is convenient to express $\hat{u}^{\top }\hat{u}$ as a function of the disturbance of the model. From the definition of $\hat{u}$ in (2.26) we obtain:

$\displaystyle \hat{u}=y-X\hat{\beta}=y-X(X^{\top }X)^{-1}X^{\top }y=[I_{n}-X(X^{\top }X)^{-1}X^{\top }]y=My$

(2.75)

with $M=I_{n}-X(X^{\top }X)^{-1}X^{\top }$ a non-stochastic square

matrix, which is symmetric, idempotent and whose rank and trace are

. In addition,

fulfils

Result (2.75), which means that $\hat{u}$ is linear with respect to , can be extended in the following way:

$\displaystyle \hat{u}=My=M(X\beta+u)=Mu$

(2.76)

that is to say, there is also a linear relation between $\hat{u}$ and

From (2.76), and under the earlier mentioned properties of , the sum of squared residuals can be written as a quadratic form of the disturbance vector,

$\displaystyle \hat{u}^{\top }\hat{u}= u^{\top }M^{\top }Mu=u^{\top }Mu$

(2.77)

Since every element of

has a N(0, $\sigma ^{2}$ ) distribution, and

is an idempotent matrix, then $\frac{u^{\top }Mu}{\sigma^{2}}$ follows a chi-squared distribution with degrees of freedom equal to the rank of

, that is to say:

$\displaystyle \frac{u^{\top }Mu}{\sigma^{2}}\sim\chi^{2}_{n-k}$

(2.78)

Note that from (2.75), it is also possible to write $\hat{u}^{\top }\hat{u}$ as a quadratic form of , yielding:

$\displaystyle \hat{u}^{\top }\hat{u}=y^{\top }My$

(2.79)

This expression for $\hat{u}^{\top }\hat{u}$ allows us to obtain a very simple way to calculate the OLS or ML estimator of $\sigma ^{2}$ . For example, for $\hat{\sigma}^{2}$ :

$\displaystyle \hat{\sigma}^{2}=\frac{y^{\top }My}{n-k}=\frac{y^{\top }y-y^{\top... ... }X)^{-1}X^{\top }y}{n-k}= \frac{y^{\top }y-\hat{\beta}^{\top }X^{\top }y}{n-k}$

(2.80)

Having established these relations of interest, we now define the properties of $\hat{\sigma}^{2}$ and $\tilde{\sigma}^{2}$ :

Linearity. According to (2.79) the OLS and ML estimators of $\sigma ^{2}$ are expressed as:

$\displaystyle \hat{\sigma}^{2}=\frac{\hat{u}^{\top }\hat{u}}{n-k}=\frac{y^{\top }My}{n-k}$
and

$\displaystyle \tilde{\sigma}^{2}=\frac{\tilde{u}^{\top }\tilde{u}}{n}=\frac{y^{\top }My}{n}$
so both are non linear with respect to , given that their numerators are quadratic forms of .
Unbiasedness. In order to show this property, we use (2.77), to obtain:

$\displaystyle \hat{\sigma}^{2}=\frac{\hat{u}^{\top }\hat{u}}{n-k}=\frac{u^{\top }Mu}{n-k}$ (2.81)

$\displaystyle \tilde{\sigma}^{2}=\frac{\tilde{u}^{\top }\tilde{u}}{n}=\frac{u^{\top }Mu}{n}$ (2.82)

If we first consider $\hat{\sigma}^{2}$ , we must calculate:

$\displaystyle \textrm{E}(\hat{\sigma}^{2})=\frac{1}{n-k}\textrm{E}(u^{\top }Mu)$ (2.83)

The calculation of $\textrm{E}(u^{\top }Mu)$ requires using the distribution (2.78), in such a way that, given that a chi-square variable has expected value equal to the corresponding degree of freedom, we have:

$\displaystyle \textrm{E}(\frac{u^{\top }Mu}{\sigma^{2}})=\frac{1}{\sigma^{2}}\textrm{E}(u^{\top }Mu)=n-k$ (2.84)

and then,

$\displaystyle \textrm{E}(u^{\top }Mu)=\sigma^{2}(n-k)$ (2.85)

which allows us to obtain:

$\displaystyle \textrm{E}(\hat{\sigma}^{2})=\frac{1}{n-k}\textrm{E}(u^{\top }Mu)=\frac{1}{n-k}\sigma^{2}(n-k)=\sigma^{2}$ (2.86)

In a similar way, we obtain $\textrm{E}(\tilde{\sigma}^{2})$ :

$\displaystyle \textrm{E}(\tilde{\sigma}^{2})=\frac{1}{n}\textrm{E}(u^{\top }Mu)=\frac{1}{n}\sigma^{2}(n-k)=\sigma^{2}\frac{n-k}{n}$ (2.87)

so we conclude that $\hat{\sigma}^{2}$ is an unbiased estimator for $\sigma ^{2}$ , while $\tilde{\sigma}^{2}$ is biased.

In order to analyze efficiency and BLUE properties, we must know the variance of $\hat{\sigma}^{2}$ and $\tilde{\sigma}^{2}$ . From (2.78), we have $var(\frac{u^{\top }Mu}{\sigma^{2}})=2(n-k)$ , because the variance of a chi-square variable is two times its degrees of freedom. This result leads to the following expressions for the variances:

$\displaystyle var(\hat{\sigma}^{2})=\frac{1}{(n-k)^{2}}var(u^{\top }Mu)=\frac{1}{(n-k)^{2}}2(n-k)\sigma^{4}= \frac{2\sigma^{4}}{n-k}$

(2.88)

$\displaystyle var(\tilde{\sigma}^{2})=\frac{1}{n^{2}}var(u^{\top }Mu)=\frac{2\sigma^{4}(n-k)}{n^{2}}$

(2.89)

Nevertheless, given that $\tilde{\sigma}^{2}$ is biased, this estimator can not be efficient, so we focus on the study of such a property for $\hat{\sigma}^{2}$ . With respect to the BLUE property, neither $\hat{\sigma}^{2}$ nor $\tilde{\sigma}^{2}$ are linear, so they can not be BLUE.

Efficiency. The comparison of the variance of $\hat{\sigma}^{2}$ (expression (2.88)) with element $I^{22}$ of the matrix $(I_{n}(\theta))^{-1}$ (expression (2.63)) allows us to deduce that this estimator does not satisfy the Cramer-Rao inequality, given that $I^{22} \neq var(\hat{\sigma}^{2})$ . Nevertheless, as Schmidt (1976) shows, there is no unbiased estimator of $\sigma ^{2}$ with a smaller variance, so it can be said that $\hat{\sigma}^{2}$ is an efficient estimator.

The variance-covariance matrix of an estimator vector could tell us how accurate it is. However, this matrix, which was obtained in (2.56), depends on the unknown $\sigma ^{2}$ parameter, so we can obtain an unbiased estimation of it by substituting $\sigma ^{2}$ for its unbiased estimator $\hat{\sigma}^{2}$ :

$\displaystyle \hat{V}(\hat{\beta})=\hat{V}(\tilde{\beta})=\hat{\sigma}^{2}(X^{\top }X)^{-1}$

(2.90)

The meaning of every element of this matrix is analogous to that presented in (2.57) and (2.58).

2.4.3 Asymptotic Properties of the OLS and ML Estimators of $\beta$

Finite sample properties try to study the behavior of an estimator under the assumption of having many samples, and consequently many estimators of the parameter of interest. Thus, the average of these estimators should approach the parameter value (unbiasedness) or the average distance to the parameter value should be the smallest possible (efficiency). However, in practice we have only one sample, and the asymptotic properties are established by keeping this fact in mind but assuming that the sample is large enough.

Specifically, the asymptotic properties study the behavior of the estimators as increases; in this sense, an estimator which is calculated for different sample sizes can be understood as a sequence of random variables indexed by the sample sizes (for example, $z_{n}$ ). Two relevant aspects to analyze in this sequence are $\textsl{convergence in probability}$ and $\textsl{convergence in distribution}$ .

A sequence of random variables $z_{n}$ is said $\textsl{to converge in probability}$ to a constant or to another random variable , if

$\displaystyle \lim_{n\rightarrow\infty}Pr[\vert z_{n}-c\vert<\epsilon]=1$

(2.91)

$\displaystyle \lim_{n\rightarrow\infty}Pr[\vert z_{n}-z\vert<\epsilon]=1$

(2.92)

where

denotes probability and $\epsilon>0$ is an arbitrary constant. Equivalently, we can express this convergence as:

$\begin{displaymath} \begin{array}{ccc} z_{n}\rightarrow_{p}c & and & z_{n}\rightarrow_{p}z \end{array}\end{displaymath}$

$\begin{displaymath}\begin{array}{ccc} plimz_{n}=c & and & plimz_{n}=z \end{array}\end{displaymath}$

(2.93)

Result (2.91) implies that all the probability of the distribution becomes concentrated at points close to . Result (2.92) implies that the values that the variable may take that are not far from z become more probable as increases, and moreover, this probability tends to one.

A second form of convergence is convergence in distribution. If $z_{n}$ is a sequence of random variables with cumulative distribution function () $F_{n}(z)$ , then the sequence $\textsl{ converges in distribution}$ to a variable with if

$\displaystyle \lim_{n\rightarrow\infty}F_{n}(z)=F(z)$

(2.94)

which can be denoted by:

$\displaystyle z_{n}\rightarrow_{d}z$

(2.95)

and

is said to be the $\textsl{limit distribution}$ of

Having established these preliminary concepts, we now consider the following desirable asymptotic properties : asymptotic unbiasedness, consistency and asymptotic efficiency.

Asymptotic unbiasedness. There are two alternative definitions of this concept. The first states that an estimator $\hat{\hat{\theta}}$ is asymptotically unbiased if as n increases, the sequence of its first moments converges to the parameter $\theta$ . It can be expressed as:

$\displaystyle \lim_{n\rightarrow\infty}\textrm{E}(\hat{\hat{\theta}}_{n})=\theta\Rightarrow\lim_{n\rightarrow\infty} \textrm{E}(\hat{\hat{\theta}}_{n})-\theta=0$ (2.96)

Note that the second part of (2.96) also means that the possible bias of $\hat{\hat{\theta}}$ disappears as increases, so we can deduce that an unbiased estimator is also an asymptotic unbiased estimator.
The second definition is based on the convergence in distribution of a sequence of random variables. According to this definition, an estimator $\hat{\hat{\theta}}$ is asymptotically unbiased if its asymptotic expectation, or expectation of its limit distribution, is the parameter $\theta$ . It is expressed as follows:

$\displaystyle \textrm{E}_{as}(\hat{\hat{\theta}})=\theta$ (2.97)

Since this second definition requires knowing the limit distribution of the sequence of random variables, and this is not always easy to know, the first definition is very often used.
In our case, since $\hat{\beta}$ and $\tilde{\beta}$ are unbiased, it follows that they are asymptotically unbiased:

$\displaystyle \lim_{n\rightarrow\infty}\textrm{E}(\hat{\beta}_{n})=\beta \Rightarrow\lim_{n\rightarrow\infty}\textrm{E}(\hat{\beta}_{n})-\beta=0$ (2.98)

In order to simplify notation, in what follows we will use $\hat{\beta}$ , instead of $\hat{\beta_{n}}$ . Nevertheless, we must continue considering it as a sequence of random variables indexed by the sample size.

Consistency. An estimator $\hat{\hat{\theta}}$ is said to be consistent if it converges in probability to the unknown parameter, that is to say:

$\displaystyle plim\hat{\hat{\theta}}_{n}=\theta$ (2.99)

which, in view of (2.91), means that a consistent estimator satisfies the convergence in probability to a constant, with the unknown parameter $\theta$ being such a constant.
The simplest way of showing consistency consists of proving two sufficient conditions: i) the estimator must be asymptotically unbiased, and ii) its variance must converge to zero as n increases. These conditions are derived from the convergence in quadratic mean (or convergence in second moments), given that this concept of convergence implies convergence in probability (for a detailed study of the several modes of convergence and their relations, see Amemiya (1985), Spanos (1986) and White (1984)).
In our case, since the asymptotic unbiasedness of $\hat{\beta}$ and $\tilde{\beta}$ has been shown earlier, we only have to prove the second condition. In this sense, we calculate:

$\displaystyle \lim_{n\rightarrow\infty}V(\hat{\beta})=\lim_{n\rightarrow\infty}\sigma^{2}(X^{\top }X)^{-1}$ (2.100)

Multiplying and dividing (2.100) by , we obtain:

$\displaystyle \lim_{n\rightarrow\infty}V(\hat{\beta})=\lim_{n\rightarrow\infty}... ...-1}= \lim_{n\rightarrow\infty}\frac{\sigma^{2}}{n}(\frac{X^{\top }X}{n})^{-1}=$

$\displaystyle \lim_{n\rightarrow\infty}\frac{\sigma^{2}}{n}\lim_{n\rightarrow\infty}(\frac{X^{\top }X}{n})^{-1}= 0 \times Q^{-1}=0$ (2.101)

where we have used the condition (2.6) included in assumption 1. Thus, result (2.101) proves the consistency of the OLS and ML estimators of the coefficient vector. As we mentioned before, this means that all the probability of the distribution of $\hat{\beta}$ (or $\beta$ ) becomes concentrated at points close to $\beta$ , as increases.

Consistency might be thought of as the minimum requirement for a useful estimator. However, given that there can be many consistent estimators of a parameter, it is convenient to consider another property such as asymptotic efficiency. This property focuses on the asymptotic variance of the estimators or asymptotic variance-covariance matrix of an estimator vector. Similar to asymptotic unbiasedness, two definitions of this concept can be found. The first of them defines it as the variance of the limit distribution of the estimator. Obviously, it is necessary to know this limit distribution. However, according to the meaning of consistency, the limit distribution of a consistent estimator is degenerated at a point, so its variance is zero. In order to obtain an approach to the limit distribution, we can use a $\textsl{Central Limit Theorem}$ (CLT), which establishes the conditions to guaranty that the limit distribution is a normal distribution.

Suppose we have applied a CLT, and we have:

$\displaystyle \sqrt{n}(\hat{\hat{\theta}}-\theta)\rightarrow_{d}N(0,\gamma)$

(2.102)

with $\gamma=V_{as}[\sqrt{n}(\hat{\hat{\theta}}-\theta)]$ , that is to say, $\gamma$ is the asymptotic variance of $\sqrt{n}(\hat{\hat{\theta}}-\theta)$ . This result allows us to approach the limit distribution of $\hat{\hat{\theta}}$ as:

$\displaystyle \hat{\hat{\theta}}\rightarrow_{as}N(\theta,\frac{\gamma}{n})$

(2.103)

where $\rightarrow_{as}$ denotes "asymptotically distributed as", and consequently the asymptotic variance of the estimator is approached by $\frac{\gamma}{n}$ .

The second definition of asymptotic variance, which does not require using any limit distribution, is obtained as:

$\displaystyle V_{as}(\hat{\hat{\theta}})=\frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E}[\sqrt{n}(\hat{\hat{\theta}}-\textrm{E}(\hat{\hat{\theta}}))]^{2}$

(2.104)

In our framework, this second definition leads us to express the asymptotic variance of vector $\hat{\beta}$ as:

$\displaystyle V_{as}(\hat{\beta})=\frac{1}{n}\lim_{n\rightarrow\infty}\textrm{E... ...trm{E}\hat{\beta}))((\hat{\beta}-\textrm{E}\hat{\beta})^{\top }\sqrt{n})]= \\$

$\displaystyle \frac{1}{n}\lim_{n\rightarrow\infty}n\sigma^{2}(X^{\top }X)^{-1}=... ...}{n}\lim_{n\rightarrow\infty}\frac{n}{n}\sigma^{2}(\frac{X^{\top }X}{n})^{-1}=$

$\displaystyle \frac{\sigma^{2}}{n}\lim_{n\rightarrow\infty}(\frac{X^{\top }X}{n})^{-1}=\frac{\sigma^{2}Q^{-1}}{n}$

(2.105)

If we consider the first approach of the asymptotic variance, the use of a CLT (see Judge, Carter, Griffiths, Lutkepohl and Lee (1988)) yields:

$\displaystyle \sqrt{n}(\hat{\beta}-\beta)\rightarrow_{d}N(0,\sigma^{2}Q^{-1})$

(2.106)

which leads to:

$\displaystyle \hat{\beta}\rightarrow_{as}N(\beta,\frac{\sigma^{2}Q^{-1}}{n})$

(2.107)

so $V_{as}(\hat{\beta})$ is approached as $\frac{\sigma^{2}Q^{-1}}{n}$ .

Asymptotic efficiency A sufficient condition for a consistent asymptotically normal estimator vector to be asymptotically efficient is that its asymptotic variance-covariance matrix equals the asymptotic Cramer-Rao lower bound (see Theil (1971)), which can be expressed as:

$\displaystyle \frac{1}{n}(I_{\infty})^{-1}=\frac{1}{n}\left[\lim_{n\rightarrow\infty}(\frac{I_{n}(\theta)}{n})\right]^{-1}$ (2.108)

where $I_{\infty}$ denotes the so-called asymptotic information matrix, while $I_{n}$ is the previously described sample information matrix (or simply, information matrix). The elements of $I_{\infty}$ are:

$\displaystyle I_{\infty}=\lim_{n\rightarrow\infty}(\frac{I_{n}(\theta)}{n})= \b... ...gin{pmatrix}\frac{Q}{\sigma^{2}} & 0 \\ 0 & \frac{1}{2\sigma^{4}} \end{pmatrix}$ (2.109)

and so,

$\displaystyle \frac{1}{n}(I_{\infty})^{-1}= \begin{pmatrix}\frac{\sigma^{2}Q^{-1}}{n} & 0 \\ 0 & \frac{2\sigma^{4}}{n} \end{pmatrix}$ (2.110)

From the last expression we deduce that the variance-covariance matrix of $\hat{\beta}$ (or $\tilde{\beta}$ ) equals the asymptotic Cramer Rao lower bound (element (1,1) of (2.110)), so we conclude that $\hat{\beta}$ (or $\tilde{\beta}$ ) is an asymptotically efficient estimator vector for the parameter vector $\beta$ .
Finally, we should note that the finite sample efficiency implies asymptotic efficiency, and we could have used this fact to conclude the asymptotic efficiency of $\hat{\beta}$ (or $\tilde{\beta}$ ), given the results of subsection about their finite sample properties.

2.4.4 Asymptotic Properties of the OLS and ML Estimators of $\sigma ^{2}$

Asymptotic unbiasedness. The OLS estimator of $\sigma ^{2}$ satisfies the finite sample unbiasedness property, according to result (2.86), so we deduce that it is asymptotically unbiased.
With respect to the ML estimator of $\sigma ^{2}$ , which does not satisfy the finite sample unbiasedness (result (2.87)), we must calculate its asymptotic expectation. On the basis of the first definition of asymptotic unbiasedness, presented in (2.96), we have:

$\displaystyle \lim_{n\rightarrow\infty}\textrm{E}(\tilde{\sigma}^{2})=\lim_{n\r... ...rrow\infty}\sigma^{2}-\lim_{n\rightarrow\infty}\frac{\sigma^{2}k}{n}=\sigma^{2}$ (2.111)

so we conclude that $\tilde{\sigma}^{2}$ is asymptotically unbiased.
Consistency. In order to show that $\hat{\sigma}^{2}$ and $\tilde{\sigma}^{2}$ are consistent, and given that both are asymptotically unbiased, the only sufficient condition that we have to prove is that the limit of their variances is null. From (2.88) and (2.89) we have:

$\displaystyle \lim_{n\rightarrow\infty}\frac{2\sigma^{4}}{n-k}=0$ (2.112)

and

$\displaystyle \lim_{n\rightarrow\infty}\frac{2\sigma^{4}(n-k)}{n^{2}}=0$ (2.113)

so both estimators satisfy the requirements of consistency.

Finally, the study of the asymptotic efficiency property requires approaching the asymptotic variance-covariance of the estimators. Following Fomby, Carter, and Johnson (1984) we have,

$\displaystyle \sqrt{n}(\hat{\sigma}^{2}-\sigma^{2})\rightarrow_{d}N(0,2\sigma^{4})$

(2.114)

so the limit distribution of $\hat{\sigma}^{2}$ can be approached as

$\displaystyle \hat{\sigma}^{2}\rightarrow_{as}N(\sigma^{2},\frac{2\sigma^{4}}{n})$

(2.115)

and then we conclude that

$\displaystyle var_{as}(\hat{\sigma}^{2})=\frac{2\sigma^{4}}{n}$

(2.116)

Analogously, following Dhrymes (1974), the ML estimator $\tilde{\sigma}^{2}$ satisfies

$\displaystyle \sqrt{n}(\tilde{\sigma}^{2}-\sigma^{2})\rightarrow_{d}N(0,2\sigma^{4})$

(2.117)

so $var_{as}(\tilde{\sigma}^{2})$ has the same form as that given in (2.116).

The second way to approach the asymptotic variance (see (2.104) ), leads to the following expressions:

$\displaystyle var_{as}(\hat{\sigma}^{2})=\frac{1}{n}\lim_{n\rightarrow\infty}\t... ...igma}^{2}))^{2}= \frac{1}{n}\lim_{n\rightarrow\infty}n\frac{2\sigma^{4}}{n-k}=$

$\displaystyle \frac{1}{n}\lim_{n\rightarrow\infty}\frac{2\sigma^{4}}{\frac{n-k}... ...a^{4}}{\lim_{n\rightarrow\infty}(1-\frac{k}{n})}\right] =\frac{1}{n}2\sigma^{4}$

(2.118)

$\displaystyle var_{as}(\tilde{\sigma}^{2})= \frac{1}{n}\lim_{n\rightarrow\infty... ...ma^{4}-\lim_{n\rightarrow\infty}\frac{2\sigma^{4}k}{n}]= \frac{1}{n}2\sigma^{4}$

(2.119)

Asymptotic efficiency. On the basis of the asymptotic Cramer-Rao lower bound expressed in (2.108) and calculated in (2.110), we conclude that both $\hat{\sigma}^{2}$ and $\tilde{\sigma}^{2}$ are asymptotically efficient estimators of $\sigma ^{2}$ , so their asymptotic variances equal the asymptotic Cramer-Rao lower bound.

2.4.5 Example

As we have seen in the previous section, the quantlet 8785 gls allows us to estimate all the parameters of the MLRM. In addition, if we want to estimate the variance-covariance matrix of $\hat{\beta}$ , which is given by $\hat{\sigma}^{2}(X^{\top }X)^{-1}$ , we can use the following quantlet

XEGmlrm03.xpl