# 2.3 Estimation Procedures

Having specified the MLRM given in (2.1) or (2.2), the next econometric stage is estimation, which consists of quantifying the parameters of the model using the observations of $y$ and $X$ collected in the sample of size $n$. The set of parameters to estimate is $k+1$: the $k$ coefficients of the vector $\beta$, and the dispersion parameter $\sigma^{2}$, about which we have no a priori information.

Following the same scheme as the previous chapter, we describe the two common estimation procedures: the Least Squares (LS) and the Maximum Likelihood (ML) methods.

## 2.3.1 The Least Squares Estimation

The LS procedure selects those values of $\beta$ that minimize the sum of squares of the distances between the actual values of the endogenous variable and its adjusted (or fitted) values. Let $\hat{\beta}$ be a possible estimate (some function of the sample observations) of $\beta$. Then, the adjusted value of the endogenous variable is given by:

$$\hat{y}_{i}=x_{i}\hat{\beta} \qquad (2.17)$$

where $x_{i}$ is the $1\times k$ row vector of the values of the regressors for the $i$th observation. From (2.17), the distance defined earlier, or residual, is given by:

$$e_{i}=y_{i}-\hat{y}_{i}=y_{i}-x_{i}\hat{\beta} \qquad (2.18)$$

Consequently, the function to minimize is $S(\hat{\beta})$:

$$S(\hat{\beta})=\sum_{i=1}^{n}e_{i}^{2}=e'e \qquad (2.19)$$

and then what we call the Ordinary Least Squares (OLS) estimator of $\beta$, denoted by $\hat{\beta}$, is the value which satisfies:

$$\hat{\beta}=\arg\min_{\hat{\beta}}S(\hat{\beta}) \qquad (2.20)$$

To solve this optimization problem, the first-order conditions set the first derivatives of $S(\hat{\beta})$ with respect to $\hat{\beta}_{1},\hat{\beta}_{2},\ldots,\hat{\beta}_{k}$ equal to zero. In order to obtain such conditions in matrix form, we express (2.19) as follows:

$$S(\hat{\beta})=(y-X\hat{\beta})'(y-X\hat{\beta})=y'y-\hat{\beta}'X'y-y'X\hat{\beta}+\hat{\beta}'X'X\hat{\beta} \qquad (2.21)$$

Given that $\hat{\beta}'X'y=(y'X\hat{\beta})'$, and both elements are scalars ($1\times1$), we can group the terms, and $S(\hat{\beta})$ is written as follows:

$$S(\hat{\beta})=y'y-2\hat{\beta}'X'y+\hat{\beta}'X'X\hat{\beta} \qquad (2.22)$$

The vector which contains the first partial derivatives of $S(\hat{\beta})$ (gradient vector) is expressed as:

$$\frac{\partial S(\hat{\beta})}{\partial\hat{\beta}}=-2X'y+2X'X\hat{\beta} \qquad (2.23)$$

Setting (2.23) to zero results in:

$$X'X\hat{\beta}=X'y \qquad (2.24)$$

The system of linear equations (2.24) is called the system of normal equations.

From assumption 2 of the last section, we know that $X$ has full rank, and so we can state that the inverse of $X'X$ exists, in such a way that we can obtain $\hat{\beta}$ by premultiplying (2.24) by $(X'X)^{-1}$:

$$\hat{\beta}=(X'X)^{-1}X'y \qquad (2.25)$$

According to (2.25), the OLS residuals vector is given by:

$$e=y-X\hat{\beta} \qquad (2.26)$$

with a typical element $e_{i}=y_{i}-x_{i}\hat{\beta}$. From (2.2), the residual vector can be understood as the sample counterpart of the disturbance vector $u$.

The second-order condition of minimization establishes that the matrix of second partial derivatives (Hessian matrix) has to be positive definite. In our case, such a matrix is given by:

$$\frac{\partial^{2}S(\hat{\beta})}{\partial\hat{\beta}\,\partial\hat{\beta}'}=2X'X \qquad (2.27)$$

and given the full rank of $X$, it follows that $X'X$ is positive definite.
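The derivation above can be checked numerically. The following is a minimal sketch using NumPy with synthetic data (not the book's quantlets or data; all variable names are ours): it solves the normal equations (2.24) and verifies that the first-order conditions hold for the resulting residuals.

```python
import numpy as np

# Illustration with synthetic data: solve the normal equations
# X'X beta_hat = X'y from (2.24) and form the residuals (2.26).
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([2.0, 1.5, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y, computed via a linear solve
# rather than an explicit inverse (numerically preferable)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual vector e = y - X beta_hat
e = y - X @ beta_hat

# The first-order conditions (2.24) are equivalent to X'e = 0
print(np.abs(X.T @ e).max())  # numerically zero
```

Solving the linear system directly, instead of forming $(X'X)^{-1}$ explicitly, is the standard numerically stable way to evaluate (2.25).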

From (2.25), and given that the regressors are fixed, it follows that $\hat{\beta}$ is a linear function of the vector $y$, that is to say:

$$\hat{\beta}=Ay \qquad (2.28)$$

where $A=(X'X)^{-1}X'$ is a $k\times n$ matrix of constant elements. The set of normal equations written in (2.24), with the first column of $X$ being the constant term ($x_{1i}=1$ for all $i$), can be expressed element by element, resulting in:

$$\begin{pmatrix} n & \sum x_{2i} & \cdots & \sum x_{ki}\\ \sum x_{2i} & \sum x_{2i}^{2} & \cdots & \sum x_{2i}x_{ki}\\ \vdots & \vdots & \ddots & \vdots\\ \sum x_{ki} & \sum x_{ki}x_{2i} & \cdots & \sum x_{ki}^{2} \end{pmatrix} \begin{pmatrix} \hat{\beta}_{1}\\ \hat{\beta}_{2}\\ \vdots\\ \hat{\beta}_{k} \end{pmatrix} = \begin{pmatrix} \sum y_{i}\\ \sum x_{2i}y_{i}\\ \vdots\\ \sum x_{ki}y_{i} \end{pmatrix} \qquad (2.29)$$

where all the sums are calculated from $i=1$ to $n$.

Thus, the equations which allow us to obtain the unknown coefficients are the following:

$$\begin{aligned} n\hat{\beta}_{1}+\hat{\beta}_{2}\sum x_{2i}+\cdots+\hat{\beta}_{k}\sum x_{ki} &= \sum y_{i}\\ \hat{\beta}_{1}\sum x_{2i}+\hat{\beta}_{2}\sum x_{2i}^{2}+\cdots+\hat{\beta}_{k}\sum x_{2i}x_{ki} &= \sum x_{2i}y_{i}\\ &\;\;\vdots\\ \hat{\beta}_{1}\sum x_{ki}+\hat{\beta}_{2}\sum x_{ki}x_{2i}+\cdots+\hat{\beta}_{k}\sum x_{ki}^{2} &= \sum x_{ki}y_{i} \end{aligned} \qquad (2.30)$$

From (2.30) we derive some algebraic properties of the OLS method:

a.
The sum of the residuals is null. To show this, if we evaluate the general expression (2.17) at the OLS estimate $\hat{\beta}$ and we calculate $\sum_{i=1}^{n}\hat{y}_{i}$, we obtain:

$$\sum_{i=1}^{n}\hat{y}_{i}=n\hat{\beta}_{1}+\hat{\beta}_{2}\sum x_{2i}+\cdots+\hat{\beta}_{k}\sum x_{ki}$$

The right-hand side of the last expression is equal to the right-hand side of the first equation of the system (2.30), so we can write:

$$\sum_{i=1}^{n}\hat{y}_{i}=\sum_{i=1}^{n}y_{i} \qquad (2.31)$$

Using (2.31) and (2.18), it is proved that the residuals satisfy:

$$\sum_{i=1}^{n}e_{i}=0 \qquad (2.32)$$

b.
The regression hyperplane passes through the point of means of the data. From (2.17), the expression of this hyperplane is:

$$\hat{y}_{i}=\hat{\beta}_{1}+\hat{\beta}_{2}x_{2i}+\cdots+\hat{\beta}_{k}x_{ki} \qquad (2.33)$$

Adding up the terms of (2.33) and dividing by $n$, we obtain:

$$\bar{\hat{y}}=\hat{\beta}_{1}+\hat{\beta}_{2}\bar{x}_{2}+\cdots+\hat{\beta}_{k}\bar{x}_{k}$$

and given (2.31), it is obvious that $\bar{\hat{y}}=\bar{y}$, and then we have the earlier stated property, since

$$\bar{y}=\hat{\beta}_{1}+\hat{\beta}_{2}\bar{x}_{2}+\cdots+\hat{\beta}_{k}\bar{x}_{k}$$

c.
The residuals and the regressors are not correlated; this fact mimics the population property of independence between every regressor $x_{j}$ and the disturbance $u$. To show this property, we calculate the sample covariance between the residuals and the $j$th regressor:

$$\operatorname{cov}(x_{j},e)=\frac{1}{n}\sum_{i=1}^{n}(x_{ji}-\bar{x}_{j})(e_{i}-\bar{e})=\frac{1}{n}\sum_{i=1}^{n}x_{ji}e_{i}-\bar{x}_{j}\bar{e}=\frac{1}{n}\sum_{i=1}^{n}x_{ji}e_{i}$$

with $\bar{e}=0$ from (2.32). The last expression can be written in matrix form as:

$$X'e=X'(y-X\hat{\beta})=X'y-X'X\hat{\beta}=0$$

where the last equality uses the result (2.24).

Note that the algebraic property c is always satisfied, while properties a and b might not be maintained if the model has no intercept. This exception can easily be shown, because the first equation in (2.30) disappears when there is no constant term.
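The three algebraic properties can be verified numerically. The sketch below uses synthetic data with an intercept (illustrative only; variable names are ours) and checks that each property holds up to floating-point error.

```python
import numpy as np

# Numerical check of properties a-c for an OLS fit with an intercept
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

sum_resid = e.sum()                                   # (a): zero, eq. (2.32)
through_means = y.mean() - X.mean(axis=0) @ beta_hat  # (b): zero, eq. (2.33)
cov_xe = X[:, 1:].T @ e / n                           # (c): zero covariances

print(sum_resid, through_means, cov_xe)  # all numerically zero
```

Dropping the column of ones from `X` makes (a) and (b) fail while (c), in the form $X'e=0$, still holds, matching the remark above.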

With respect to the OLS estimation of $\sigma^{2}$, we must note that it is not obtained as a result of the minimization problem, but is derived to satisfy two requirements: to use the OLS residuals ($e$), and to be unbiased. Generalizing the result of the previous chapter, we have:

$$\hat{\sigma}^{2}=\frac{e'e}{n-k}=\frac{\sum_{i=1}^{n}e_{i}^{2}}{n-k} \qquad (2.34)$$
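The role of the $n-k$ denominator can be illustrated with a small Monte Carlo sketch (synthetic data, assumed parameter values of our own choosing): averaging over many simulated samples, $e'e/(n-k)$ centers on the true $\sigma^{2}$, while dividing by $n$ gives a downward-biased estimate.

```python
import numpy as np

# Monte Carlo illustration: e'e/(n-k) is unbiased for sigma^2,
# e'e/n (the ML estimator, see (2.53)) is biased downwards.
rng = np.random.default_rng(2)
n, k, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -1.0])

unbiased, biased = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    sse = e @ e
    unbiased.append(sse / (n - k))  # eq. (2.34)
    biased.append(sse / n)          # ML denominator

print(np.mean(unbiased))  # close to 4.0
print(np.mean(biased))    # close to 4.0 * (n - k) / n = 3.76
```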

An alternative way of obtaining the OLS estimates of the coefficients consists of expressing the variables in deviations with respect to their means; in this case, it can be proved that the values of the estimators and the residuals are the same as those of the previous results. Suppose we have estimated the model, so that we can write it as:

$$y_{i}=\hat{\beta}_{1}+\hat{\beta}_{2}x_{2i}+\cdots+\hat{\beta}_{k}x_{ki}+e_{i} \qquad (2.35)$$

with $i=1,\ldots,n$. Adding up both sides of (2.35) and dividing by $n$, we have:

$$\bar{y}=\hat{\beta}_{1}+\hat{\beta}_{2}\bar{x}_{2}+\cdots+\hat{\beta}_{k}\bar{x}_{k} \qquad (2.36)$$

To obtain the last result, we have employed result (2.32). Then, subtracting (2.36) from (2.35) leads to the following result:

$$y_{i}-\bar{y}=\hat{\beta}_{2}(x_{2i}-\bar{x}_{2})+\cdots+\hat{\beta}_{k}(x_{ki}-\bar{x}_{k})+e_{i} \qquad (2.37)$$

This model, called the model in deviations, differs from (2.35) in two aspects: the intercept does not explicitly appear in the equation, and all variables are expressed in deviations from their means.

Nevertheless, researchers are usually interested in evaluating the effect of the explanatory variables on the endogenous variable, so the intercept value is not of main interest, and specification (2.37) contains the relevant elements. In spite of this, we can evaluate the intercept afterwards from (2.36), in the following terms:

$$\hat{\beta}_{1}=\bar{y}-\hat{\beta}_{2}\bar{x}_{2}-\cdots-\hat{\beta}_{k}\bar{x}_{k} \qquad (2.38)$$

This approach can be formalized in matrix form, writing (2.35) as:

$$y=X\hat{\beta}+e=\iota\hat{\beta}_{1}+X_{2}\hat{\beta}_{2}+e \qquad (2.39)$$

Consider the partitioned matrix $X=(\iota\;\;X_{2})$ and the partitioned vector $\hat{\beta}=(\hat{\beta}_{1}\;\;\hat{\beta}_{2}')'$, where $X_{2}$ denotes the $n\times(k-1)$ matrix whose columns are the observations of each regressor except the constant term, $\iota$ is an $n\times1$ vector of ones, and $\hat{\beta}_{2}$ is the $(k-1)\times1$ vector of all the estimated coefficients except the intercept.

Let $G$ be an $n\times n$ square matrix of the form:

$$G=I_{n}-\frac{1}{n}\iota\iota' \qquad (2.40)$$

with $I_{n}$ the $n\times n$ identity matrix. If we premultiply a given matrix (or vector) by $G$, the elements of such a matrix (or vector) are transformed into deviations with respect to their means. Moreover, we have $G\iota=0$. If we premultiply the model (2.39) by $G$, and since $Ge=e$ (from result (2.32)), we have:

$$Gy=GX_{2}\hat{\beta}_{2}+e \qquad (2.41)$$

This last expression is the matrix form of (2.37). Now, we premultiply (2.41) by $X_{2}'$ and use $X_{2}'e=0$ (property c), obtaining:

$$X_{2}'Gy=X_{2}'GX_{2}\hat{\beta}_{2} \qquad (2.42)$$

Given that $G$ is an idempotent matrix (i.e., $GG=G$), such a property allows us to write (2.42) as:

$$X_{2}'GGy=X_{2}'GGX_{2}\hat{\beta}_{2}$$

and taking advantage of the fact that $G$ is also a symmetric matrix (i.e., $G'=G$), we can rewrite the last expression as follows:

$$(GX_{2})'Gy=(GX_{2})'GX_{2}\hat{\beta}_{2} \qquad (2.43)$$

or equivalently,

$$X_{2}^{*\prime}X_{2}^{*}\hat{\beta}_{2}=X_{2}^{*\prime}y^{*} \qquad (2.44)$$

with $X_{2}^{*}=GX_{2}$, that is to say, $X_{2}^{*}$ is the matrix whose columns are the observations of each regressor, evaluated in deviations. In a similar way, $y^{*}=Gy$ is the observed endogenous variable in deviations with respect to its mean.

The system of equations given by (2.44) leads to the same value of $\hat{\beta}_{2}$ as that obtained from (2.24). The only difference between the two systems is due to the intercept, which is estimated from (2.24), but not from (2.44). Nevertheless, as we have mentioned earlier, once we have the value of $\hat{\beta}_{2}$ from (2.44), we can calculate $\hat{\beta}_{1}$ through (2.38). Furthermore, according to (2.41), the residuals vector is the same as that obtained from (2.24), so the estimate of $\sigma^{2}$ is that established in (2.34).
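This equivalence can be demonstrated numerically. The sketch below (synthetic data, NumPy; variable names are ours) builds the matrix $G$ from (2.40), solves the demeaned system (2.44) for the slopes, and recovers the intercept via (2.38).

```python
import numpy as np

# Illustration: slopes from the demeaned system (2.44) coincide with
# those from the full normal equations (2.24); the intercept follows
# from (2.38).
rng = np.random.default_rng(3)
n = 60
X2 = rng.normal(size=(n, 2))           # regressors without the constant
X = np.column_stack([np.ones(n), X2])  # full design matrix
y = X @ np.array([3.0, 1.0, -2.0]) + rng.normal(size=n)

# Full-system OLS
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# G = I - (1/n) iota iota' transforms data into deviations from means
iota = np.ones((n, 1))
G = np.eye(n) - (iota @ iota.T) / n
X2s, ys = G @ X2, G @ y                # X2* = G X2, y* = G y

beta2 = np.linalg.solve(X2s.T @ X2s, X2s.T @ ys)  # slopes from (2.44)
beta1 = y.mean() - X2.mean(axis=0) @ beta2        # intercept from (2.38)

print(beta_full)
print(beta1, beta2)  # same values
```

In practice one would demean with `X2 - X2.mean(axis=0)` rather than form the $n\times n$ matrix $G$ explicitly; building $G$ here simply mirrors the algebra in the text.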

## 2.3.2 The Maximum Likelihood Estimation

Assumption 6 about the normality of the disturbances allows us to apply the maximum likelihood (ML) criterion to obtain the values of the unknown parameters of the MLRM. This method consists of the maximization of the likelihood function, and the values of $\beta$ and $\sigma^{2}$ which maximize such a function are the ML estimates. To obtain the likelihood function, we start by considering the joint density function of the sample, which establishes the probability of a sample being realized when the parameters are known. Firstly, we consider a general framework. Let $y$ be a random vector which is distributed as an $n$-multivariate normal $N(\mu,\Sigma)$, with expectations vector $\mu$ and variance-covariance matrix $\Sigma$. The probability density function of $y$ is given by:

$$f(y;\mu,\Sigma)=(2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\right\} \qquad (2.45)$$

Usually, we observe only one sample, so if we substitute $y$ by an observed value $y^{0}$, the function $f(y^{0};\mu,\Sigma)$ gives, for every value of the parameters, the probability of obtaining such a sample value ($y^{0}$). Therefore, if the roles of $y$ and the parameters are interchanged, in such a way that $y$ is fixed and the parameters vary, we obtain the so-called likelihood function, which can be written as:

$$L(\mu,\Sigma;y)=f(y;\mu,\Sigma) \qquad (2.46)$$

In the framework of the MLRM, the set of classical assumptions stated for the random component allowed us to conclude that the vector $y$ is distributed as an $n$-multivariate normal, with $X\beta$ being the vector of means and $\sigma^{2}I_{n}$ the variance-covariance matrix (results (2.15) and (2.16)). From (2.45) and (2.46), the likelihood function is:

$$L(\beta,\sigma^{2};y)=(2\pi)^{-n/2}(\sigma^{2})^{-n/2}\exp\left\{-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^{2}}\right\} \qquad (2.47)$$

The ML method maximizes (2.47) in order to obtain the ML estimators of $\beta$ and $\sigma^{2}$.

In general, the way of deriving the likelihood function (2.47) is based on the relationship between the probability distribution of $u$ and that of $y$. Suppose $u$ and $y$ are two random vectors, where $u=g(y)$ with $g$ being a monotonic function. If we know the probability density function of $u$, denoted by $f_{u}(u)$, we can obtain the probability density function of $y$ as follows:

$$f_{y}(y)=f_{u}\big(g(y)\big)\,|J| \qquad (2.48)$$

with $|J|$ being the Jacobian, which is defined as the absolute value of the determinant of the matrix of partial derivatives:

$$J=\det\left(\frac{\partial u}{\partial y'}\right)$$

In our case, we identify $u$ with $y-X\beta$, and $f_{u}$ with the normal density of the disturbances, in such a way that it is easy to show that the matrix of partial derivatives $\partial u/\partial y'$ is the identity matrix, the Jacobian equals one, and expression (2.48) leads to the same result as (2.47).

Although the ML method maximizes the likelihood function, it is usually simpler to work with the log of this function. Since the logarithm is a monotonic transformation, the parameter values that maximize $L$ are the same as those that maximize the log-likelihood ($\ell=\ln L$). In our case, $\ell$ has the following form:

$$\ell(\beta,\sigma^{2})=\ln L=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln\sigma^{2}-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^{2}} \qquad (2.49)$$

The ML estimators are the solution to the first-order conditions:

$$\frac{\partial\ell}{\partial\beta}=\frac{1}{\sigma^{2}}\left(X'y-X'X\beta\right)=0 \qquad (2.50)$$

$$\frac{\partial\ell}{\partial\sigma^{2}}=-\frac{n}{2\sigma^{2}}+\frac{(y-X\beta)'(y-X\beta)}{2\sigma^{4}}=0 \qquad (2.51)$$

Thus, the ML estimators, denoted by $\tilde{\beta}$ and $\tilde{\sigma}^{2}$, are:

$$\tilde{\beta}=(X'X)^{-1}X'y \qquad (2.52)$$

$$\tilde{\sigma}^{2}=\frac{(y-X\tilde{\beta})'(y-X\tilde{\beta})}{n}=\frac{e'e}{n} \qquad (2.53)$$

As we can see, similarly to the results in the univariate linear regression model of the previous chapter, under the assumption of normality of the disturbances both the ML and LS methods give the same estimated value for the coefficients ($\tilde{\beta}=\hat{\beta}$), and thus the numerator of the expression of $\tilde{\sigma}^{2}$ is the same as that of $\hat{\sigma}^{2}$.
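A short numerical sketch (synthetic data; illustrative, not the book's example) confirms that the closed forms (2.52) and (2.53) do maximize the log-likelihood (2.49): evaluating $\ell$ at the ML solution gives a higher value than at nearby perturbed parameter values.

```python
import numpy as np

# Sketch: the closed-form ML estimators maximize the log-likelihood (2.49);
# the coefficient estimator coincides with OLS.
rng = np.random.default_rng(4)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

def loglik(beta, sig2):
    """Log-likelihood (2.49) of the normal linear model."""
    r = y - X @ beta
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sig2) - (r @ r) / (2 * sig2)

beta_ml = np.linalg.solve(X.T @ X, X.T @ y)  # (2.52), identical to OLS
e = y - X @ beta_ml
sig2_ml = (e @ e) / n                        # (2.53); OLS divides by n - k

ll_max = loglik(beta_ml, sig2_ml)
print(ll_max >= loglik(beta_ml + 0.05, sig2_ml))  # True
print(ll_max >= loglik(beta_ml, sig2_ml * 1.2))   # True
```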

## 2.3.3 Example

All estimation quantlets in the stats quantlib have as input parameters:

x
An $n\times k$ matrix containing the observations of the explanatory variables,
y
An $n\times1$ vector containing the observed responses.
Neither the matrix x, nor the vector y, should contain missing values (NaN) or infinite values (Inf, -Inf).

In the following example, we will use Spanish economic data to illustrate the MLRM estimation. The file data.dat contains quarterly data from 1980 to 1997 (sample size $n=72$) for the variables consumption, exports and M1 (money supply). All variables are expressed in constant prices of 1995.

Descriptive statistics of the three variables which are included in the consumption function can be found in Table 2.1.

Table 2.1: Descriptive statistics for consumption data.
|             | Min     | Max      | Mean    | S.D.    |
|-------------|---------|----------|---------|---------|
| consumption | 7558200 | 12103000 | 9524600 | 1328800 |
| exports     | 1439000 | 5590700  | 2778500 | 1017700 |
| M1          | 9203.9  | 18811    | 13512   | 3140.8  |

On the basis of the information in the data file, we estimate the consumption function; the endogenous variable we want to explain is consumption, while exports and M1 are the explanatory variables, or regressors.

The quantlet XEGmlrm01.xpl produces some summary statistics of the data.

#### 2.3.3.0.1 Computing MLRM Estimates

The quantlet in the stats quantlib which can be employed to obtain only the OLS (or ML) estimation of the coefficients $\beta$ and of $\sigma^{2}$ is gls.

 b = gls ( x, y ) estimates the parameters of a MLRM

In XEGmlrm02.xpl, we have used the quantlet gls to compute the OLS estimates of $\beta$ (b), and both the OLS and ML estimates of $\sigma^{2}$ (sigls and sigml).
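Since the XploRe environment is not widely available today, the same quantities can be reproduced with NumPy. The function below is a rough modern analog of how gls is used here, not its actual implementation; the name `gls_like` and the toy input data are our own.

```python
import numpy as np

def gls_like(x, y):
    """Illustrative NumPy analog of the book's use of gls: returns the OLS
    coefficients plus the OLS and ML estimates of sigma^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mirror the input requirement stated above: no NaN or Inf values
    if not (np.isfinite(x).all() and np.isfinite(y).all()):
        raise ValueError("x and y must not contain NaN or Inf values")
    n, k = x.shape
    b = np.linalg.solve(x.T @ x, x.T @ y)  # (2.25)
    e = y - x @ b
    sigls = (e @ e) / (n - k)              # OLS estimate (2.34)
    sigml = (e @ e) / n                    # ML estimate (2.53)
    return b, sigls, sigml

# Toy usage with made-up numbers (not the Spanish consumption data)
x = np.column_stack([np.ones(8), np.arange(8.0)])
y = 1.0 + 2.0 * np.arange(8.0)
b, sigls, sigml = gls_like(x, y)
print(b)  # approximately [1., 2.]
```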