# 2.4 Properties of the Estimators

When we study the properties of the estimators obtained, it is convenient to distinguish between two categories of properties: i) the small (or finite) sample properties, which are valid whatever the sample size, and ii) the asymptotic properties, which are associated with large samples, i.e., when the sample size $n$ tends to $\infty$.

## 2.4.1 Finite Sample Properties of the OLS and ML Estimates of $\beta$

Given that, as we obtained in the previous section, the OLS and ML estimates of $\beta$ lead to the same result, the following properties refer to both. To derive these properties, and on the basis of the classical assumptions, the vector of estimated coefficients can be written in the following alternative form:

$$\hat\beta = (X^\top X)^{-1}X^\top y = (X^\top X)^{-1}X^\top (X\beta + u) = \beta + (X^\top X)^{-1}X^\top u \tag{2.54}$$

• Unbiasedness. According to the concept of unbiasedness, the vector $\hat\beta$ is an unbiased estimator vector of $\beta$ since:

$$E(\hat\beta) = E\left[\beta + (X^\top X)^{-1}X^\top u\right] = \beta + (X^\top X)^{-1}X^\top E(u) = \beta \tag{2.55}$$

The unbiasedness property of the estimators means that, if we have many samples for the random variable and we calculate the estimated value corresponding to each sample, the average of these estimated values approaches the unknown parameter. Nevertheless, we usually have only one sample (i.e., one realization of the random variable), so we cannot say anything about the distance between $\hat\beta$ and $\beta$. This fact leads us to employ the concept of variance, or the variance-covariance matrix if we have a vector of estimates. This concept measures the average squared distance between the estimated value obtained from the only sample we have and its expected value.

From the previous argument we can deduce that, although the unbiasedness property is not sufficient in itself, it is the minimum requirement to be satisfied by an estimator.
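The repeated-sampling reading of unbiasedness can be illustrated with a small simulation. The sketch below is not part of the original text: it assumes a model with an intercept and a single regressor, fixed regressor values and normal disturbances, and the helper names `ols_simple` and `simulate_estimates` are hypothetical. It uses only the Python standard library.

```python
import random
import statistics


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS fit y = b0 + b1 * x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def simulate_estimates(beta0, beta1, sigma, x, n_samples, seed=0):
    """Draw n_samples samples with the same fixed x and estimate each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        # Assumed data generating process: y = beta0 + beta1 * x + u, u ~ N(0, sigma^2)
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        estimates.append(ols_simple(x, y))
    return estimates


x = [float(i) for i in range(1, 21)]  # illustrative fixed regressor, n = 20
estimates = simulate_estimates(1.0, 2.0, 1.0, x, 5000)
mean_b0 = statistics.fmean(b0 for b0, _ in estimates)
mean_b1 = statistics.fmean(b1 for _, b1 in estimates)
print(mean_b0, mean_b1)
```

Across the simulated samples the averages of the estimates should lie close to the assumed true values 1 and 2, although any single estimate may be some distance from them, which is why the variance is studied next.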

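The variance-covariance matrix of $\hat\beta$ has the following expression:

$$V(\hat\beta) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)^\top\right] = (X^\top X)^{-1}X^\top E(uu^\top) X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1} \tag{2.56}$$

with the elements of this matrix meaning:

$$V(\hat\beta_j) = \sigma^2 \left[(X^\top X)^{-1}\right]_{jj}, \quad j = 1, \ldots, k \tag{2.57}$$

$$Cov(\hat\beta_j, \hat\beta_h) = \sigma^2 \left[(X^\top X)^{-1}\right]_{jh}, \quad j \neq h \tag{2.58}$$

Since $X$ has full column rank, (2.56) is a symmetric positive definite matrix.

The consideration of $V(\hat\beta)$ allows us to define efficiency as a second finite sample property.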

• Efficiency. An estimator is efficient if it is the minimum variance unbiased estimator. The Cramér-Rao inequality provides a way to verify efficiency, since it establishes the lower bound for the variance-covariance matrix of any unbiased estimator. This lower bound is given by the inverse of the information matrix (or sample information matrix) $I_n(\theta)$, with the bound for each individual parameter being the corresponding diagonal element of that inverse. The information matrix is defined as:

$$I_n(\theta) = -E\left[H(\theta)\right] \tag{2.59}$$

where $H(\theta)$ denotes the hessian matrix, i.e., the matrix of the second partial derivatives of the log-likelihood function $\ell(\theta)$.

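In order to study the efficiency property for the OLS and ML estimates of $\beta$, we begin by defining $\theta^\top = (\beta^\top, \sigma^2)$, and the hessian matrix is expressed as a partitioned matrix of the form:

$$H(\theta) = \begin{pmatrix} \dfrac{\partial^2 \ell}{\partial\beta\,\partial\beta^\top} & \dfrac{\partial^2 \ell}{\partial\beta\,\partial\sigma^2} \\[2ex] \dfrac{\partial^2 \ell}{\partial\sigma^2\,\partial\beta^\top} & \dfrac{\partial^2 \ell}{\partial(\sigma^2)^2} \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} \tag{2.60}$$

where $H_{11}$ is a square $k \times k$ matrix, $H_{12}$ and $H_{21}$ are $k$-element vectors (a column and a row, respectively), and $H_{22}$ is a $1 \times 1$ element.

From (2.50) and (2.51), we have:

$$H(\theta) = \begin{pmatrix} -\dfrac{X^\top X}{\sigma^2} & -\dfrac{X^\top (y - X\beta)}{\sigma^4} \\[2ex] -\dfrac{(y - X\beta)^\top X}{\sigma^4} & \dfrac{n}{2\sigma^4} - \dfrac{(y - X\beta)^\top (y - X\beta)}{\sigma^6} \end{pmatrix} \tag{2.61}$$

Thus, since $u = y - X\beta$ satisfies $E(u) = 0$ and $E(u^\top u) = n\sigma^2$, the sample information matrix is:

$$I_n(\theta) = \begin{pmatrix} \dfrac{X^\top X}{\sigma^2} & 0 \\[2ex] 0 & \dfrac{n}{2\sigma^4} \end{pmatrix} \tag{2.62}$$

and its inverse,

$$[I_n(\theta)]^{-1} = \begin{pmatrix} \sigma^2 (X^\top X)^{-1} & 0 \\[1ex] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix} = \begin{pmatrix} I^{11} & 0 \\ 0 & I^{22} \end{pmatrix} \tag{2.63}$$

Following the Cramér-Rao inequality, $I^{11}$ constitutes the lower bound for the variance-covariance matrix of any unbiased estimator vector of the parameter vector $\beta$, while $I^{22}$ is the corresponding bound for the variance of an unbiased estimator of $\sigma^2$.

According to (2.56), we can conclude that $\hat\beta$ (or $\tilde\beta$) satisfies the efficiency property, given that its variance-covariance matrix coincides with $I^{11}$.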

A property which is less strict than efficiency is the so-called best linear unbiased estimator (BLUE) property, which also uses the variance of the estimators.

• BLUE. A vector of estimators is BLUE if it is the minimum variance linear unbiased estimator. To show this property, we use the Gauss-Markov Theorem. In the MLRM framework, this theorem provides a general expression for the variance-covariance matrix of a linear unbiased vector of estimators. Then, the comparison of this matrix with the corresponding matrix of $\hat\beta$ allows us to conclude that $\hat\beta$ (or $\tilde\beta$) is BLUE.

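With this aim, we define $\hat{\hat\beta}$ as a family of linear estimator vectors of the parameter vector $\beta$:

$$\hat{\hat\beta} = C^\top y = C^\top X\beta + C^\top u \tag{2.64}$$

with $C$ being an $n \times k$ matrix of constant elements, where:

$$C^\top = A^\top + D^\top \tag{2.65}$$

In order to assure the unbiasedness of $\hat{\hat\beta}$, we suppose $C^\top X = I_k$, and then (2.64) can be written as:

$$\hat{\hat\beta} = \beta + C^\top u \tag{2.66}$$

From this last expression we can derive the variance-covariance matrix of $\hat{\hat\beta}$:

$$V(\hat{\hat\beta}) = E\left[(\hat{\hat\beta} - \beta)(\hat{\hat\beta} - \beta)^\top\right] = E(C^\top u u^\top C) = \sigma^2 C^\top C \tag{2.67}$$

Taking into account (2.65) we have:

$$C^\top C = (A^\top + D^\top)(A + D) = A^\top A + A^\top D + D^\top A + D^\top D \tag{2.68}$$

and the unbiasedness condition $C^\top X = I_k$ allows us to show that $D^\top X = 0$:

$$C^\top X = (A^\top + D^\top) X = A^\top X + D^\top X = I_k \tag{2.69}$$

and given that $A^\top = (X^\top X)^{-1}X^\top$, as was established in (2.28), we derive that $A^\top X = I_k$. By substituting this result into the last term of (2.69), it must hold that $D^\top X = 0$, which implies that:

$$D^\top A = D^\top X (X^\top X)^{-1} = 0$$

and obviously, $A^\top D = (D^\top A)^\top = 0$. We now take expression (2.67), which we can write as:

$$V(\hat{\hat\beta}) = \sigma^2 (A^\top A + D^\top D) \tag{2.70}$$

and given that $A^\top A = (X^\top X)^{-1}$, so that $\sigma^2 A^\top A = V(\hat\beta)$ according to (2.56), we have:

$$V(\hat{\hat\beta}) = V(\hat\beta) + \sigma^2 D^\top D \tag{2.71}$$

or

$$V(\hat{\hat\beta}) - V(\hat\beta) = \sigma^2 D^\top D \tag{2.72}$$

A general matrix result establishes that, given any matrix $P$, $P^\top P$ is a positive semidefinite matrix, so we can conclude that $\sigma^2 D^\top D$ is positive semidefinite. This property means that the elements of its diagonal are non-negative, so we deduce for every coefficient:

$$V(\hat{\hat\beta}_j) \geq V(\hat\beta_j), \quad j = 1, \ldots, k \tag{2.73}$$

that is to say, we conclude that the OLS or ML estimator vector of $\beta$ satisfies the Gauss-Markov theorem, and this implies that $\hat\beta$ (or $\tilde\beta$) is BLUE.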
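A concrete comparison may help to fix ideas. The sketch below is an illustration added here, not the authors' derivation: it assumes a model with an intercept and one regressor and compares the OLS slope with another linear unbiased estimator of the slope, the "endpoint" estimator $(y_n - y_1)/(x_n - x_1)$. Both are written as weighted sums of the $y_i$, so their variances are $\sigma^2$ times the sum of squared weights. All function names are hypothetical.

```python
def ols_slope_weights(x):
    """Weights w_i such that the OLS slope equals sum(w_i * y_i)."""
    x_bar = sum(x) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x]


def endpoint_slope_weights(x):
    """Weights of the alternative linear estimator (y_n - y_1) / (x_n - x_1)."""
    weights = [0.0] * len(x)
    weights[0] = -1.0 / (x[-1] - x[0])
    weights[-1] = 1.0 / (x[-1] - x[0])
    return weights


def is_unbiased_for_slope(weights, x, tol=1e-12):
    # Unbiasedness of a linear slope estimator requires
    # sum(w_i) = 0 and sum(w_i * x_i) = 1.
    sum_w = sum(weights)
    sum_wx = sum(w * xi for w, xi in zip(weights, x))
    return abs(sum_w) < tol and abs(sum_wx - 1.0) < tol


def variance_of_linear_estimator(weights, sigma2):
    # With homoskedastic, uncorrelated disturbances: V = sigma^2 * sum(w_i^2)
    return sigma2 * sum(w * w for w in weights)


x = [float(i) for i in range(1, 11)]  # illustrative regressor values
w_ols = ols_slope_weights(x)
w_end = endpoint_slope_weights(x)
v_ols = variance_of_linear_estimator(w_ols, 1.0)
v_end = variance_of_linear_estimator(w_end, 1.0)
print(is_unbiased_for_slope(w_ols, x), is_unbiased_for_slope(w_end, x))
print(v_ols, v_end)
```

Both sets of weights satisfy the unbiasedness conditions, but the endpoint estimator discards the interior observations and has the larger variance, in line with (2.73).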

The set of results we have previously obtained allows us to know the probability distribution of $\hat\beta$ (or $\tilde\beta$). Given that these estimator vectors are linear with respect to the $y$ vector, and $y$ has a normal distribution, then:

$$\hat\beta \sim N\left(\beta, \sigma^2 (X^\top X)^{-1}\right) \tag{2.74}$$

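## 2.4.2 Finite Sample Properties of the OLS and ML Estimates of $\sigma^2$

According to expressions (2.34) and (2.53), the OLS and ML estimators of $\sigma^2$ are different, despite both being constructed through $\hat u^\top \hat u$. In order to obtain their properties, it is convenient to express $\hat u^\top \hat u$ as a function of the disturbance of the model. From the definition of $\hat u$ in (2.26) we obtain:

$$\hat u = y - X\hat\beta = y - X(X^\top X)^{-1}X^\top y = My \tag{2.75}$$

with $M = I_n - X(X^\top X)^{-1}X^\top$ a non-stochastic square $n \times n$ matrix, which is symmetric, idempotent and whose rank and trace are $n - k$. In addition, $M$ fulfils $MX = 0$.

Result (2.75), which means that $\hat u$ is linear with respect to $y$, can be extended in the following way:

$$\hat u = My = M(X\beta + u) = Mu \tag{2.76}$$

that is to say, there is also a linear relation between $\hat u$ and $u$.

From (2.76), and under the earlier mentioned properties of $M$, the sum of squared residuals can be written as a quadratic form of the disturbance vector,

$$\hat u^\top \hat u = u^\top M^\top M u = u^\top M u \tag{2.77}$$

Since the elements of $u$ are independent with a $N(0, \sigma^2)$ distribution, and $M$ is an idempotent matrix, then $u^\top M u / \sigma^2$ follows a chi-squared distribution with degrees of freedom equal to the rank of $M$, that is to say:

$$\frac{u^\top M u}{\sigma^2} \sim \chi^2_{n-k} \tag{2.78}$$

Note that from (2.75), it is also possible to write $\hat u^\top \hat u$ as a quadratic form of $y$, yielding:

$$\hat u^\top \hat u = y^\top M y \tag{2.79}$$

This expression for $\hat u^\top \hat u$ gives a very simple way to calculate the OLS or ML estimator of $\sigma^2$. For example, for $\hat\sigma^2$:

$$\hat\sigma^2 = \frac{y^\top M y}{n-k} = \frac{y^\top y - y^\top X (X^\top X)^{-1} X^\top y}{n-k} = \frac{y^\top y - \hat\beta^\top X^\top y}{n-k} \tag{2.80}$$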
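The shortcut in (2.80) can be checked against the direct computation from the residuals. The following sketch is an added illustration under stated assumptions: a model with an intercept and one regressor ($k = 2$), a small made-up data set, and hypothetical function names. For this model $X^\top y$ has the two elements $\sum y_i$ and $\sum x_i y_i$.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the OLS fit y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sigma2_from_residuals(x, y):
    """OLS estimator of sigma^2 as (sum of squared residuals) / (n - k), k = 2."""
    b0, b1 = ols_simple(x, y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / (len(x) - 2)


def sigma2_from_formula(x, y):
    """Same estimator through (y'y - b'X'y) / (n - k), without residuals."""
    b0, b1 = ols_simple(x, y)
    yty = sum(yi * yi for yi in y)
    # X'y = (sum(y), sum(x * y)) when X holds a constant and one regressor
    b_xty = b0 * sum(y) + b1 * sum(xi * yi for xi, yi in zip(x, y))
    return (yty - b_xty) / (len(x) - 2)


# Made-up illustrative data, not taken from the text
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s2_residuals = sigma2_from_residuals(x, y)
s2_formula = sigma2_from_formula(x, y)
print(s2_residuals, s2_formula)
```

The two routes agree up to rounding error. The formula route subtracts two numbers of similar size, so with badly scaled data the residual route is numerically safer.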

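Having established these relations of interest, we now define the properties of $\hat\sigma^2$ and $\tilde\sigma^2$:

• Linearity. According to (2.79) the OLS and ML estimators of $\sigma^2$ are expressed as:

$$\hat\sigma^2 = \frac{y^\top M y}{n-k}$$

and

$$\tilde\sigma^2 = \frac{y^\top M y}{n}$$

so both are non-linear with respect to $y$, given that their numerators are quadratic forms of $y$.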

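• Unbiasedness. In order to show this property, we use (2.77), to obtain:

$$\hat\sigma^2 = \frac{\hat u^\top \hat u}{n-k} = \frac{u^\top M u}{n-k} \tag{2.81}$$

$$\tilde\sigma^2 = \frac{\hat u^\top \hat u}{n} = \frac{u^\top M u}{n} \tag{2.82}$$

If we first consider $\hat\sigma^2$, we must calculate:

$$E(\hat\sigma^2) = \frac{E(u^\top M u)}{n-k} \tag{2.83}$$

The calculation of $E(u^\top M u)$ requires using the distribution (2.78), in such a way that, given that a chi-square variable has expected value equal to its degrees of freedom, we have:

$$E\left(\frac{u^\top M u}{\sigma^2}\right) = n - k \tag{2.84}$$

and then,

$$E(u^\top M u) = \sigma^2 (n-k) \tag{2.85}$$

which allows us to obtain:

$$E(\hat\sigma^2) = \frac{\sigma^2 (n-k)}{n-k} = \sigma^2 \tag{2.86}$$

In a similar way, we obtain $E(\tilde\sigma^2)$:

$$E(\tilde\sigma^2) = \frac{\sigma^2 (n-k)}{n} \tag{2.87}$$

so we conclude that $\hat\sigma^2$ is an unbiased estimator for $\sigma^2$, while $\tilde\sigma^2$ is biased.

In order to analyze efficiency and BLUE properties, we must know the variance of $\hat\sigma^2$ and $\tilde\sigma^2$. From (2.78), we have $V(u^\top M u / \sigma^2) = 2(n-k)$, and hence $V(u^\top M u) = 2\sigma^4 (n-k)$, because the variance of a chi-square variable is two times its degrees of freedom. This result leads to the following expressions for the variances:

$$V(\hat\sigma^2) = \frac{V(u^\top M u)}{(n-k)^2} = \frac{2\sigma^4}{n-k} \tag{2.88}$$

$$V(\tilde\sigma^2) = \frac{V(u^\top M u)}{n^2} = \frac{2\sigma^4 (n-k)}{n^2} \tag{2.89}$$

Nevertheless, given that $\tilde\sigma^2$ is biased, this estimator cannot be efficient, so we focus on the study of such a property for $\hat\sigma^2$. With respect to the BLUE property, neither $\hat\sigma^2$ nor $\tilde\sigma^2$ is linear, so they cannot be BLUE.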
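To get a feel for the magnitudes involved, the moments in (2.86) to (2.89) can be evaluated for particular values of $n$ and $k$, together with the Cramér-Rao bound $2\sigma^4/n$ for unbiased estimators of $\sigma^2$ given in (2.63). The sketch below is an added illustration; the function name and the values $n = 20$, $k = 3$, $\sigma^2 = 1$ are arbitrary assumptions.

```python
def sigma2_estimator_moments(n, k, sigma2):
    """Finite sample means and variances of the OLS and ML estimators of sigma^2."""
    sigma4 = sigma2 ** 2
    return {
        "ols_mean": sigma2,                        # (2.86)
        "ml_mean": sigma2 * (n - k) / n,           # (2.87)
        "ols_var": 2 * sigma4 / (n - k),           # (2.88)
        "ml_var": 2 * sigma4 * (n - k) / n ** 2,   # (2.89)
        "cr_bound": 2 * sigma4 / n,                # bound for unbiased estimators
    }


moments = sigma2_estimator_moments(n=20, k=3, sigma2=1.0)
for name, value in moments.items():
    print(name, value)
```

With these values the ML estimator has the smaller variance, and its variance is even below the Cramér-Rao bound. This is not a contradiction, because the bound used here applies only to unbiased estimators and the ML estimator is biased downwards.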

• Efficiency. The comparison of the variance of $\hat\sigma^2$ (expression (2.88)) with element $I^{22}$ of the matrix $[I_n(\theta)]^{-1}$ (expression (2.63)) allows us to deduce that this estimator does not attain the Cramér-Rao lower bound, given that $V(\hat\sigma^2) = 2\sigma^4/(n-k) > 2\sigma^4/n = I^{22}$. Nevertheless, as Schmidt (1976) shows, there is no unbiased estimator of $\sigma^2$ with a smaller variance, so it can be said that $\hat\sigma^2$ is an efficient estimator.

The variance-covariance matrix of an estimator vector tells us how accurate it is. However, this matrix, which was obtained in (2.56), depends on the unknown parameter $\sigma^2$, so we can obtain an unbiased estimate of it by substituting $\sigma^2$ with its unbiased estimator $\hat\sigma^2$:

$$\hat V(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1} \tag{2.90}$$

The meaning of every element of this matrix is analogous to that presented in (2.57) and (2.58).

## 2.4.3 Asymptotic Properties of the OLS and ML Estimators of $\beta$

Finite sample properties try to study the behavior of an estimator under the assumption of having many samples, and consequently many estimators of the parameter of interest. Thus, the average of these estimators should approach the parameter value (unbiasedness) or the average distance to the parameter value should be the smallest possible (efficiency). However, in practice we have only one sample, and the asymptotic properties are established by keeping this fact in mind but assuming that the sample is large enough.

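Specifically, the asymptotic properties study the behavior of the estimators as $n$ increases; in this sense, an estimator which is calculated for different sample sizes can be understood as a sequence of random variables indexed by the sample sizes (for example, $\hat\theta_n$, $n = 1, 2, \ldots$). Two relevant aspects to analyze in this sequence are convergence in probability and convergence in distribution.

A sequence of random variables $z_n$ is said to converge in probability to a constant $c$ or to another random variable $z$, if

$$\lim_{n\to\infty} Pr\left[\,|z_n - c| < \epsilon\,\right] = 1 \tag{2.91}$$

or

$$\lim_{n\to\infty} Pr\left[\,|z_n - z| < \epsilon\,\right] = 1 \tag{2.92}$$

where $Pr$ denotes probability and $\epsilon > 0$ is an arbitrary constant. Equivalently, we can express this convergence as:

$$z_n \rightarrow_p c \quad \text{and} \quad z_n \rightarrow_p z$$

or

$$\operatorname{plim} z_n = c \quad \text{and} \quad \operatorname{plim} z_n = z \tag{2.93}$$

Result (2.91) implies that all the probability of the distribution becomes concentrated at points close to $c$. Result (2.92) implies that the values that the variable may take that are not far from $z$ become more probable as $n$ increases, and moreover, this probability tends to one.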
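Definition (2.91) can be made tangible by estimating the probability inside the limit for several values of $n$. The sketch below is an added illustration under assumptions that are not in the text: the sequence is the sample mean of $n$ independent Uniform(0, 1) draws, whose probability limit is $c = 0.5$, the tolerance is $\epsilon = 0.05$, and the function name is hypothetical.

```python
import random


def prob_within(n, eps, reps, rng):
    """Monte Carlo estimate of Pr(|z_n - 0.5| < eps) for the mean of n uniforms."""
    hits = 0
    for _ in range(reps):
        z_n = sum(rng.random() for _ in range(n)) / n
        if abs(z_n - 0.5) < eps:
            hits += 1
    return hits / reps


rng = random.Random(12345)  # fixed seed so the run is reproducible
estimates = {n: prob_within(n, 0.05, 1000, rng) for n in (10, 100, 1000)}
for n, p in estimates.items():
    print(n, p)
```

The estimated probabilities should increase with $n$ and approach one, which is what (2.91) requires for any fixed $\epsilon > 0$.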

A second form of convergence is convergence in distribution. If $z_n$ is a sequence of random variables with cumulative distribution function (cdf) $F_n(z)$, then the sequence converges in distribution to a variable $z$ with cdf $F(z)$ if

$$\lim_{n\to\infty} F_n(z) = F(z) \tag{2.94}$$

at every continuity point of $F(z)$, which can be denoted by:

$$z_n \rightarrow_d z \tag{2.95}$$

and $F(z)$ is said to be the limit distribution of $z_n$.

Having established these preliminary concepts, we now consider the following desirable asymptotic properties: asymptotic unbiasedness, consistency and asymptotic efficiency.

• Asymptotic unbiasedness. There are two alternative definitions of this concept. The first states that an estimator $\hat\theta_n$ is asymptotically unbiased if, as $n$ increases, the sequence of its first moments converges to the parameter $\theta$. It can be expressed as:

$$\lim_{n\to\infty} E(\hat\theta_n) = \theta \quad \Longleftrightarrow \quad \lim_{n\to\infty} \left[E(\hat\theta_n) - \theta\right] = 0 \tag{2.96}$$

Note that the second part of (2.96) also means that the possible bias of $\hat\theta_n$ disappears as $n$ increases, so we can deduce that an unbiased estimator is also an asymptotically unbiased estimator.

The second definition is based on the convergence in distribution of a sequence of random variables. According to this definition, an estimator $\hat\theta_n$ is asymptotically unbiased if its asymptotic expectation, or expectation of its limit distribution, is the parameter $\theta$. It is expressed as follows:

$$E_{as}(\hat\theta_n) = \theta \tag{2.97}$$

Since this second definition requires knowing the limit distribution of the sequence of random variables, and this is not always easy to know, the first definition is very often used.

In our case, since $\hat\beta$ and $\tilde\beta$ are unbiased, it follows that they are asymptotically unbiased:

$$\lim_{n\to\infty} E(\hat\beta_n) = \lim_{n\to\infty} E(\tilde\beta_n) = \beta \tag{2.98}$$

In order to simplify notation, in what follows we will write $\hat\theta$ (and likewise $\hat\beta$ and $\tilde\beta$) instead of $\hat\theta_n$. Nevertheless, we must continue considering it as a sequence of random variables indexed by the sample size.

• Consistency. An estimator $\hat\theta$ is said to be consistent if it converges in probability to the unknown parameter, that is to say:

$$\operatorname{plim} \hat\theta = \theta \tag{2.99}$$

which, in view of (2.91), means that a consistent estimator satisfies the convergence in probability to a constant, with the unknown parameter $\theta$ being such a constant.

The simplest way of showing consistency consists of proving two sufficient conditions: i) the estimator must be asymptotically unbiased, and ii) its variance must converge to zero as $n$ increases. These conditions are derived from the convergence in quadratic mean (or convergence in second moments), given that this concept of convergence implies convergence in probability (for a detailed study of the several modes of convergence and their relations, see Amemiya (1985), Spanos (1986) and White (1984)).

In our case, since the asymptotic unbiasedness of $\hat\beta$ and $\tilde\beta$ has been shown earlier, we only have to prove the second condition. In this sense, we calculate:

$$\lim_{n\to\infty} V(\hat\beta) = \lim_{n\to\infty} \sigma^2 (X^\top X)^{-1} \tag{2.100}$$

Multiplying and dividing (2.100) by $n$, we obtain:

$$\lim_{n\to\infty} V(\hat\beta) = \lim_{n\to\infty} \frac{\sigma^2}{n} \left(\frac{X^\top X}{n}\right)^{-1} = \lim_{n\to\infty} \frac{\sigma^2}{n} \cdot Q^{-1} = 0 \tag{2.101}$$

where we have used the condition (2.6) included in assumption 1, namely that $X^\top X / n$ converges to a finite nonsingular matrix $Q$. Thus, result (2.101) proves the consistency of the OLS and ML estimators of the coefficient vector. As we mentioned before, this means that all the probability of the distribution of $\hat\beta$ (or $\tilde\beta$) becomes concentrated at points close to $\beta$, as $n$ increases.
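The role of the condition on $X^\top X / n$ can be seen in a design where the limit is easy to compute. The sketch below is an added illustration, not taken from the text: it assumes an intercept plus one regressor that alternates between 0 and 1, so that for even $n$ the slope variance is exactly $4\sigma^2/n$. The function names are hypothetical.

```python
def alternating_design(n):
    """Regressor values 0, 1, 0, 1, ... (assumed design, chosen for simplicity)."""
    return [float(i % 2) for i in range(n)]


def ols_slope_variance(x, sigma2):
    """Variance of the OLS slope in a model with intercept: sigma^2 / S_xx."""
    x_bar = sum(x) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 / s_xx


sample_sizes = (10, 100, 1000, 10000)
variances = {n: ols_slope_variance(alternating_design(n), 1.0) for n in sample_sizes}
for n, v in variances.items():
    # n * v stays constant at 4, the slope element of Q^{-1} for this design
    print(n, v, n * v)
```

The variance itself shrinks towards zero at rate $1/n$, while $n$ times the variance stays fixed, which mirrors the factorization used in (2.101).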

Consistency might be thought of as the minimum requirement for a useful estimator. However, given that there can be many consistent estimators of a parameter, it is convenient to consider another property such as asymptotic efficiency. This property focuses on the asymptotic variance of the estimators or asymptotic variance-covariance matrix of an estimator vector. Similar to asymptotic unbiasedness, two definitions of this concept can be found. The first of them defines it as the variance of the limit distribution of the estimator. Obviously, it is necessary to know this limit distribution. However, according to the meaning of consistency, the limit distribution of a consistent estimator is degenerate at a point, so its variance is zero. In order to obtain an approximation to the limit distribution, we can use a Central Limit Theorem (CLT), which establishes the conditions that guarantee that the limit distribution is a normal distribution.

Suppose we have applied a CLT, and we have:

$$\sqrt{n}\,(\hat\theta - \theta) \rightarrow_d N(0, \gamma) \tag{2.102}$$

with $\gamma = V_{as}\left[\sqrt{n}\,(\hat\theta - \theta)\right]$, that is to say, $\gamma$ is the asymptotic variance of $\sqrt{n}\,(\hat\theta - \theta)$. This result allows us to approximate the limit distribution of $\hat\theta$ as:

$$\hat\theta \rightarrow_{as} N\left(\theta, \frac{\gamma}{n}\right) \tag{2.103}$$

where $\rightarrow_{as}$ denotes "asymptotically distributed as", and consequently the asymptotic variance of the estimator is approximated by $\gamma / n$.

The second definition of asymptotic variance, which does not require using any limit distribution, is obtained as:

$$V_{as}(\hat\theta) = \frac{1}{n} \lim_{n\to\infty} E\left[\sqrt{n}\,\left(\hat\theta - E(\hat\theta)\right)\right]^2 \tag{2.104}$$

In our framework, this second definition leads us to express the asymptotic variance of vector $\hat\beta$ as:

$$V_{as}(\hat\beta) = \frac{1}{n} \lim_{n\to\infty} n\,\sigma^2 (X^\top X)^{-1} = \frac{\sigma^2}{n} \lim_{n\to\infty} \left(\frac{X^\top X}{n}\right)^{-1} = \frac{\sigma^2 Q^{-1}}{n} \tag{2.105}$$

where $Q = \lim_{n\to\infty} X^\top X / n$.

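If we consider the first approach to the asymptotic variance, the use of a CLT (see Judge, Carter, Griffiths, Lutkepohl and Lee (1988)) yields:

$$\sqrt{n}\,(\hat\beta - \beta) \rightarrow_d N(0, \sigma^2 Q^{-1}) \tag{2.106}$$

$$\sqrt{n}\,(\tilde\beta - \beta) \rightarrow_d N(0, \sigma^2 Q^{-1}) \tag{2.107}$$

so $V_{as}(\hat\beta)$ is approximated by $\sigma^2 Q^{-1} / n$, where $Q = \lim_{n\to\infty} X^\top X / n$.

• Asymptotic efficiency. A sufficient condition for a consistent asymptotically normal estimator vector to be asymptotically efficient is that its asymptotic variance-covariance matrix equals the asymptotic Cramér-Rao lower bound (see Theil (1971)), which can be expressed as:

$$\frac{1}{n} (I_\infty)^{-1} = \frac{1}{n} \left[\lim_{n\to\infty} \frac{I_n(\theta)}{n}\right]^{-1} \tag{2.108}$$

where $I_\infty$ denotes the so-called asymptotic information matrix, while $I_n(\theta)$ is the previously described sample information matrix (or simply, information matrix). The elements of $I_\infty$ are:

$$I_\infty = \lim_{n\to\infty} \frac{I_n(\theta)}{n} = \begin{pmatrix} \dfrac{1}{\sigma^2} \lim\limits_{n\to\infty} \dfrac{X^\top X}{n} & 0 \\[2ex] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix} = \begin{pmatrix} \dfrac{Q}{\sigma^2} & 0 \\[2ex] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix} \tag{2.109}$$

and so,

$$\frac{1}{n} (I_\infty)^{-1} = \begin{pmatrix} \dfrac{\sigma^2 Q^{-1}}{n} & 0 \\[2ex] 0 & \dfrac{2\sigma^4}{n} \end{pmatrix} \tag{2.110}$$

From the last expression we deduce that the asymptotic variance-covariance matrix of $\hat\beta$ (or $\tilde\beta$) equals the asymptotic Cramér-Rao lower bound (element (1,1) of (2.110)), so we conclude that $\hat\beta$ (or $\tilde\beta$) is an asymptotically efficient estimator vector for the parameter vector $\beta$.

Finally, we should note that finite sample efficiency implies asymptotic efficiency, and we could have used this fact to conclude the asymptotic efficiency of $\hat\beta$ (or $\tilde\beta$), given the results of subsection 2.4.1 about their finite sample properties.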

## 2.4.4 Asymptotic Properties of the OLS and ML Estimators of $\sigma^2$

• Asymptotic unbiasedness. The OLS estimator of $\sigma^2$ satisfies the finite sample unbiasedness property, according to result (2.86), so we deduce that it is asymptotically unbiased.

With respect to the ML estimator of $\sigma^2$, which does not satisfy the finite sample unbiasedness (result (2.87)), we must calculate its asymptotic expectation. On the basis of the first definition of asymptotic unbiasedness, presented in (2.96), we have:

$$\lim_{n\to\infty} E(\tilde\sigma^2) = \lim_{n\to\infty} \frac{\sigma^2 (n-k)}{n} = \sigma^2 \tag{2.111}$$

so we conclude that $\tilde\sigma^2$ is asymptotically unbiased.

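• Consistency. In order to show that $\hat\sigma^2$ and $\tilde\sigma^2$ are consistent, and given that both are asymptotically unbiased, the only sufficient condition that remains to be proved is that their variances tend to zero. From (2.88) and (2.89) we have:

$$\lim_{n\to\infty} V(\hat\sigma^2) = \lim_{n\to\infty} \frac{2\sigma^4}{n-k} = 0 \tag{2.112}$$

and

$$\lim_{n\to\infty} V(\tilde\sigma^2) = \lim_{n\to\infty} \frac{2\sigma^4 (n-k)}{n^2} = 0 \tag{2.113}$$

so both estimators satisfy the requirements of consistency.

Finally, the study of the asymptotic efficiency property requires approximating the asymptotic variance of the estimators. Following Fomby, Carter, and Johnson (1984) we have,

$$\sqrt{n}\,(\hat\sigma^2 - \sigma^2) \rightarrow_d N(0, 2\sigma^4) \tag{2.114}$$

so the limit distribution of $\hat\sigma^2$ can be approximated as

$$\hat\sigma^2 \rightarrow_{as} N\left(\sigma^2, \frac{2\sigma^4}{n}\right) \tag{2.115}$$

and then we conclude that

$$V_{as}(\hat\sigma^2) = \frac{2\sigma^4}{n} \tag{2.116}$$

Analogously, following Dhrymes (1974), the ML estimator $\tilde\sigma^2$ satisfies

$$\sqrt{n}\,(\tilde\sigma^2 - \sigma^2) \rightarrow_d N(0, 2\sigma^4) \tag{2.117}$$

so $V_{as}(\tilde\sigma^2)$ has the same form as that given in (2.116).

The second way to approximate the asymptotic variance (see (2.104)) leads to the following expressions:

$$V_{as}(\hat\sigma^2) = \frac{1}{n} \lim_{n\to\infty} n\,\frac{2\sigma^4}{n-k} = \frac{2\sigma^4}{n} \tag{2.118}$$

$$V_{as}(\tilde\sigma^2) = \frac{1}{n} \lim_{n\to\infty} n\,\frac{2\sigma^4 (n-k)}{n^2} = \frac{2\sigma^4}{n} \tag{2.119}$$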
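The limits inside (2.118) and (2.119) can be checked numerically by evaluating $n$ times each finite sample variance for growing $n$. This is an added sketch with a hypothetical function name and arbitrary illustrative values $k = 3$ and $\sigma^2 = 1$, for which both scaled variances should approach $2\sigma^4 = 2$.

```python
def scaled_variances(n, k, sigma2):
    """Return n * V(OLS estimator) and n * V(ML estimator) of sigma^2."""
    sigma4 = sigma2 ** 2
    ols_scaled = n * 2 * sigma4 / (n - k)            # n times (2.88)
    ml_scaled = n * 2 * sigma4 * (n - k) / n ** 2    # n times (2.89)
    return ols_scaled, ml_scaled


results = {n: scaled_variances(n, 3, 1.0) for n in (10, 100, 10**4, 10**6)}
for n, (ols_scaled, ml_scaled) in results.items():
    print(n, ols_scaled, ml_scaled)
```

The OLS value approaches the limit from above and the ML value from below, so the finite sample difference between the two estimators vanishes asymptotically.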

• Asymptotic efficiency. On the basis of the asymptotic Cramér-Rao lower bound expressed in (2.108) and calculated in (2.110), we conclude that both $\hat\sigma^2$ and $\tilde\sigma^2$ are asymptotically efficient estimators of $\sigma^2$, since their asymptotic variances equal the asymptotic Cramér-Rao lower bound $2\sigma^4/n$ (element (2,2) of (2.110)).

## 2.4.5 Example

As we have seen in the previous section, the quantlet gls allows us to estimate all the parameters of the MLRM. In addition, if we want to estimate the variance-covariance matrix of $\hat\beta$, which is given by $\hat\sigma^2 (X^\top X)^{-1}$, we can use the following quantlet