2.8 Restricted and Unrestricted Regression

In previous sections we used the LS and ML principles to derive estimators of the unknown parameters of the MLRM. In applying these principles, we assumed that our only information was the sample information, that is, that there was no a priori information on the parameters of the model. However, in some situations it is possible to have non-sample information (a priori information on the parameters), which can be of several kinds. Here we focus only on exact a priori information about the $ \beta $ coefficients (useful references for this topic are Fomby, Carter, and Johnson (1984) and Judge, Griffiths, Carter, Lutkepohl and Lee (1985)).

In general, this previous information on the coefficients can be expressed as follows:

$\displaystyle R\beta=r$ (2.153)

where $ R$ and $ r$ are the matrix and the vector which were defined to establish the test given in (2.134). Now (2.153) can be thought of as the way of expressing the a priori information about the elements of the $ \beta $ vector.
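For instance (purely as an illustration, with restrictions chosen arbitrarily rather than taken from this chapter's example), in a model with $ k=4$ coefficients, the two restrictions $ \beta_{2}=1$ and $ \beta_{3}+\beta_{4}=0$ correspond to $ q=2$ and

$\displaystyle R=\begin{pmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 1 \end{pmatrix},\qquad r=\begin{pmatrix} 1\\ 0 \end{pmatrix},\qquad R\beta=\begin{pmatrix}\beta_{2}\\ \beta_{3}+\beta_{4}\end{pmatrix}=\begin{pmatrix} 1\\ 0 \end{pmatrix}
$

so that each row of $ R$ selects the linear combination of coefficients which the corresponding element of $ r$ restricts.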

In this section, our objective is to estimate the parameters of the MLRM taking the a priori information into account. There are basically two equivalent ways of carrying out such an estimation. One consists of incorporating the a priori information into the specified model, in such a way that a transformed model is obtained whose unknown parameters are estimated by OLS or ML. The other consists of applying either what we call the restricted least squares (RLS) method or what we call the restricted maximum likelihood (RML) method.


2.8.1 Restricted Least Squares and Restricted Maximum Likelihood Estimators

Given the MLRM

$\displaystyle y=X\beta+u$

and the a priori information about $ \beta $ expressed as $ R\beta=r$, we try to find the vector $ \hat{\hat{\beta}}_{R}$ which minimizes the sum of squared residuals (if we use the LS method) or maximizes the likelihood function (in the case of the ML method), subject to $ R\hat{\hat{\beta}}_{R}=r$. The estimator obtained by combining all the information is called the $ \textsl{Restricted Least Squares}$ or the $ \textsl{Restricted Maximum Likelihood}$ estimator, respectively.

The constrained optimization problem can be solved through the classical Lagrangian procedure. If we first consider the LS method, the corresponding Lagrange function is:

$\displaystyle \Im=(y-X\hat{\hat{\beta}}_{R})^{\top }(y-X\hat{\hat{\beta}}_{R})-2\lambda^{\top }(R\hat{\hat{\beta}}_{R}-r)=
$

$\displaystyle y^{\top }y-2\hat{\hat{\beta}}_{R}^{\top }X^{\top }y+\hat{\hat{\beta}}_{R}^{\top }X^{\top }X\hat{\hat{\beta}}_{R}-2\lambda^{\top }(R\hat{\hat{\beta}}_{R}-r)$ (2.154)

where $ \lambda$ is the $ q \times 1$ vector of Lagrange multipliers. The factor 2 in the last term is included only to simplify the derivation and does not affect the result.

To determine the optimum values, we set the partial derivatives of $ \Im$ with respect to $ \hat{\hat{\beta}}_{R}$ and $ \lambda$ equal to zero:

$\displaystyle \frac{\partial\Im}{\partial\hat{\hat{\beta}}_{R}}=-2X^{\top }y+2X^{\top }X\hat{\hat{\beta}}_{R}-2R^{\top }\lambda=0$ (2.155)

$\displaystyle \frac{\partial\Im}{\partial\lambda}=-2(R\hat{\hat{\beta}}_{R}-r)=0$ (2.156)

Let $ \hat{\beta}_{R}$ and $ \hat{\lambda}$ denote the values of $ \hat{\hat{\beta}}_{R}$ and $ \lambda$ which satisfy these first-order conditions. Then, from (2.155) we have:

$\displaystyle X^{\top }X\hat{\beta}_{R}=X^{\top }y+R^{\top }\hat{\lambda}$ (2.157)

Premultiplying the last expression by $ (X^{\top }X)^{-1}$ we get:

$\displaystyle (X^{\top }X)^{-1}X^{\top }X\hat{\beta}_{R}=(X^{\top }X)^{-1}(X^{\top }y+R^{\top }\hat{\lambda})
$

$\displaystyle \Rightarrow \hat{\beta}_{R}=\hat{\beta}+(X^{\top }X)^{-1}R^{\top }\hat{\lambda}$ (2.158)

where $ \hat{\beta}$ is the unrestricted least squares estimator which was obtained in (2.25).

Premultiplying expression (2.158) by $ R$, we get:

$\displaystyle R\hat{\beta}_{R}=R\hat{\beta}+R(X^{\top }X)^{-1}R^{\top }\hat{\lambda}$ (2.159)

Since $ (X^{\top }X)^{-1}$ is a positive definite matrix and $ R$ has full row rank $ q \leq k$, the matrix $ R(X^{\top }X)^{-1}R^{\top }$ is also positive definite, of rank $ q$, and hence nonsingular. Then, from (2.159) we may obtain:

$\displaystyle \hat{\lambda}=[R(X^{\top }X)^{-1}R^{\top }]^{-1}(R\hat{\beta}_{R}-R\hat{\beta})
$

or

$\displaystyle \hat{\lambda}=[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\hat{\beta})$ (2.160)

because from (2.156), the restricted minimization problem must satisfy the side condition $ R\hat{\beta}_{R}=r$. Substituting the value (2.160) for the vector $ \hat{\lambda}$ into (2.158), we get the estimator:

$\displaystyle \hat{\beta}_{R}=\hat{\beta}+(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\hat{\beta})$ (2.161)

which is denoted as the restricted least squares (RLS) estimator.

Given that $ (X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}$ is a matrix of constant elements, from (2.161) we can see that the difference between $ \hat{\beta}_{R}$ and $ \hat{\beta}$ is a linear function of the vector $ (r-R\hat{\beta})$. Moreover, we deduce that this difference grows the further the unrestricted LS estimator $ \hat{\beta}$ is from satisfying the restrictions.

According to the RLS estimator, the residual vector can be defined as:

$\displaystyle \hat{u}_{R}=y-X\hat{\beta}_{R}$ (2.162)

and, analogously to the procedure followed to obtain $ \hat{\sigma}^{2}$, the RLS estimator of $ \sigma ^{2}$ is given by:

$\displaystyle \hat{\sigma}^{2}_{R}=\frac{\hat{u}_{R}^{\top }\hat{u}_{R}}{n-k+q}$ (2.163)

which is an unbiased estimator of $ \sigma ^{2}$, given that $ \textrm{E}(\hat{u}_{R}^{\top }\hat{u}_{R})=\sigma^{2}(n-(k-q))$.
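As a computational illustration of (2.161)-(2.163), the following minimal numpy sketch computes the restricted estimators from a regressor matrix X, a response vector y, and restriction matrices R and r. The function name and arguments are assumptions chosen for this illustration; it is not the quantlet used later in this chapter.

import numpy as np

def restricted_ls(X, y, R, r):
    # X: (n, k) regressor matrix, y: (n,) response,
    # R: (q, k) restriction matrix, r: (q,) restriction vector
    n, k = X.shape
    q = R.shape[0]

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y                 # unrestricted LS estimator (2.25)

    # RLS estimator (2.161)
    adjust = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
    beta_rls = beta_ols + adjust @ (r - R @ beta_ols)

    # restricted residuals (2.162) and unbiased variance estimator (2.163)
    u_r = y - X @ beta_rls
    sigma2_r = (u_r @ u_r) / (n - k + q)
    return beta_rls, sigma2_r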

Having obtained the expressions of the RLS estimators of the parameters in the MLRM, we now have the required information in order to prove the equivalence between (2.139) and (2.140), established in the previous section. In order to show such equivalence, we begin by adding and subtracting $ X\hat{\beta}$ to and from (2.162):

$\displaystyle \hat{u}_{R}=y-X\hat{\beta}_{R}+X\hat{\beta}-X\hat{\beta}=(y-X\hat{\beta})-X(\hat{\beta}_{R}-\hat{\beta})= \hat{u}-X(\hat{\beta}_{R}-\hat{\beta})$ (2.164)

and then, given that $ X^{\top }\hat{u}=\hat{u}^{\top }X=0$ (an algebraic property of the LS method, described in the estimation section), we have:

$\displaystyle \hat{u}_{R}^{\top }\hat{u}_{R}=[\hat{u}-X(\hat{\beta}_{R}-\hat{\beta})]^{\top }[\hat{u}-X(\hat{\beta}_{R}-\hat{\beta})]=\hat{u}^{\top }\hat{u}+(\hat{\beta}_{R}-\hat{\beta})^{\top }X^{\top }X(\hat{\beta}_{R}-\hat{\beta})$ (2.165)

From (2.165), we can write:

$\displaystyle \hat{u}_{R}^{\top }\hat{u}_{R}-\hat{u}^{\top }\hat{u}=(\hat{\beta}_{R}-\hat{\beta})^{\top }X^{\top }X(\hat{\beta}_{R}-\hat{\beta})$ (2.166)

and if we substitute $ (\hat{\beta}_{R}-\hat{\beta})$ according to (2.161), we have

$\displaystyle \hat{u}_{R}^{\top }\hat{u}_{R}-\hat{u}^{\top }\hat{u}=(r-R\hat{\beta})^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\hat{\beta})$ (2.167)

This last expression allows us to conclude that (2.139) and (2.140) are equivalent. Additionally, from (2.167) it follows that $ \hat{u}_{R}^{\top }\hat{u}_{R}\geq\hat{u}^{\top }\hat{u}$, given that $ [R(X^{\top }X)^{-1}R^{\top }]^{-1}$ is a positive definite matrix, with equality only when $ \hat{\beta}$ already satisfies the restrictions exactly.
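Equality (2.167) can also be checked numerically. The following sketch uses simulated data; the data-generating process and the single restriction $ \beta_{2}=1$ are chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([2.0, 1.0, 0.5, -0.3]) + rng.normal(size=n)

R = np.array([[0.0, 1.0, 0.0, 0.0]])      # restriction: beta_2 = 1
r = np.array([1.0])

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                                    # unrestricted estimator
beta_r = beta + XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (r - R @ beta)

# left- and right-hand sides of (2.167) should agree up to rounding error
lhs = np.sum((y - X @ beta_r) ** 2) - np.sum((y - X @ beta) ** 2)
rhs = (r - R @ beta) @ np.linalg.inv(R @ XtX_inv @ R.T) @ (r - R @ beta)
print(lhs, rhs)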

In order to derive the RML estimators, the Lagrange function according to the ML principle is written as:

$\displaystyle \Im =-\frac{n}{2}\ln2\pi-\frac{n}{2}\ln\sigma^{2}_{R}- \frac{(y-X\beta_{R})^{\top }(y-X\beta_{R})}{2\sigma^{2}_{R}}+2\lambda^{\top }(R\beta_{R}-r)$ (2.168)

and the first-order conditions are:

$\displaystyle \frac{\partial\Im}{\partial\beta_{R}}=-\frac{1}{2\sigma^{2}_{R}}(-2X^{\top }y+2X^{\top }X\beta_{R})+2R^{\top }\lambda=0$ (2.169)

$\displaystyle \frac{\partial\Im}{\partial\sigma^{2}_{R}}=-\frac{n}{2\sigma^{2}_{R}}+\frac{(y-X\beta_{R})^{\top }(y-X\beta_{R})}{2\sigma^{4}_{R}}=0$ (2.170)

$\displaystyle \frac{\partial\Im}{\partial\lambda}=2(R\beta_{R}-r)=0$ (2.171)

From (2.169)-(2.171), denoting the solutions with a tilde ($ \sim$) over the parameters, in a similar way to the RLS procedure, we deduce:

$\displaystyle \tilde{\beta}_{R}= \tilde{\beta}+(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\tilde{\beta})$ (2.172)

$\displaystyle \tilde{\sigma}^{2}_{R}=\frac{(y-X\tilde{\beta}_{R})^{\top }(y-X\tilde{\beta}_{R})}{n}=\frac{\hat{u}^{\top }_{R}\hat{u}_{R}}{n}$ (2.173)

$\displaystyle \tilde{\lambda}=\frac{[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\tilde{\beta})}{\tilde{\sigma}^{2}_{R}}$ (2.174)

so we conclude that, in an MLRM which satisfies the classical assumptions, the RLS estimators of the coefficients are the same as the RML estimators. This allows us to write the last equality in (2.173).
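In contrast to (2.163), the RML variance estimator in (2.173) divides by $ n$ rather than by $ n-k+q$; using the expectation of $ \hat{u}_{R}^{\top }\hat{u}_{R}$ quoted after (2.163), this implies

$\displaystyle \textrm{E}(\tilde{\sigma}^{2}_{R})=\frac{\textrm{E}(\hat{u}_{R}^{\top }\hat{u}_{R})}{n}=\frac{n-k+q}{n}\sigma^{2}
$

so, as in the unrestricted case, the RML estimator of $ \sigma^{2}$ is biased, although the bias vanishes as $ n$ grows.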


2.8.2 Finite Sample Properties of the Restricted Estimator Vector

Given the equality between $ \hat{\beta}_{R}$ and $ \tilde{\beta}_{R}$, the following proofs are valid for both procedures.

Before deriving some properties, it is convenient to obtain the expectation vector and the variance-covariance matrix of the restricted estimator vector. Using (2.161), the expected value of $ \hat{\beta}_{R}$ is:

$\displaystyle \textrm{E}(\hat{\beta}_{R})=\textrm{E}(\hat{\beta})+(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\textrm{E}(\hat{\beta}))=
$

$\displaystyle \beta+(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(r-R\beta)$ (2.175)

and the variance-covariance matrix:

$\displaystyle V(\hat{\beta}_{R})=\textrm{E}[(\hat{\beta}_{R}-\textrm{E}\hat{\beta}_{R})(\hat{\beta}_{R}-\textrm{E}\hat{\beta}_{R})^{\top }]
$

If we substitute expression (2.54) into (2.161), we can write:

$\displaystyle \hat{\beta}_{R}-\textrm{E}\hat{\beta}_{R}=(X^{\top }X)^{-1}X^{\top }u-(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}R(X^{\top }X)^{-1}X^{\top }u
$

$\displaystyle =[I_{k}-(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}R](X^{\top }X)^{-1}X^{\top }u
$

$\displaystyle =\Phi(X^{\top }X)^{-1}X^{\top }u$ (2.176)

The matrix $ \Phi$ (which premultiplies $ (X^{\top }X)^{-1}X^{\top }u$ in (2.176)) is a $ k \times k$ idempotent matrix of constant elements. From this last expression we obtain:

$\displaystyle V(\hat{\beta}_{R})=\textrm{E}[\Phi(X^{\top }X)^{-1}X^{\top }uu^{\top }X(X^{\top }X)^{-1}\Phi^{\top }]
$

$\displaystyle =\sigma^{2}\Phi(X^{\top }X)^{-1}\Phi^{\top }=\sigma^{2}\Phi(X^{\top }X)^{-1}$ (2.177)

The last equality in (2.177) follows from the proof presented in Judge, Carter, Griffiths, Lutkepohl and Lee (1988).

From (2.177), replacing $ \Phi$ by its expression, it is possible to deduce the relationship between $ V(\hat{\beta}_{R})$ and $ V(\hat{\beta})$:

$\displaystyle V(\hat{\beta}_{R})=\sigma^{2}(X^{\top }X)^{-1}-\sigma^{2}(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}R(X^{\top }X)^{-1}\Rightarrow
$

$\displaystyle V(\hat{\beta})-V(\hat{\beta}_{R})=\sigma^{2}(X^{\top }X)^{-1}R^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}R(X^{\top }X)^{-1}=\sigma^{2}C
$

with $ C$ being a positive semidefinite matrix, as Fomby, Carter, and Johnson (1984) show. Consequently, the diagonal elements of $ V(\hat{\beta}_{R})$ (the variances of each $ \hat{\beta}_{Rj}$) are equal to or less than the corresponding elements of $ V(\hat{\beta})$ (the variances of each $ \hat{\beta}_{j}$). This means that, if the a priori information is correct (in this case, as we will show later, $ \textrm{E}(\hat{\beta}_{R})=\beta$), the estimator vector $ \hat{\beta}_{R}$ is more efficient than the OLS estimator.

On the basis of these previous results, we can establish the finite sample properties of the RLS estimator.

From (2.177) it can be deduced that the expression for $ V(\hat{\beta}_{R})$ is the same whether or not the a priori information is correct, which means that $ V(\hat{\beta}_{R}) \leq V(\hat{\beta})$ (in the sense that the difference is positive semidefinite) holds in either situation.
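This efficiency comparison can be verified numerically: the matrix $ \sigma^{2}C=V(\hat{\beta})-V(\hat{\beta}_{R})$ has nonnegative eigenvalues for any full-rank design. In the following sketch the design matrix, the restrictions and $ \sigma^{2}$ are arbitrary choices made only for the illustration.

import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])       # q = 2 illustrative restrictions
sigma2 = 1.0

XtX_inv = np.linalg.inv(X.T @ X)
C = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ R @ XtX_inv

V_ols = sigma2 * XtX_inv          # V(beta_hat)
V_rls = V_ols - sigma2 * C        # V(beta_hat_R), cf. (2.177)

print(np.linalg.eigvalsh(sigma2 * C).min())              # >= 0 up to rounding
print(np.all(np.diag(V_rls) <= np.diag(V_ols) + 1e-12))  # restricted variances no larger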

Finally, given the linearity of $ \hat{\beta}_{R}$ with respect to $ y$, and given that this vector is normally distributed, if the a priori information is correct, we have:

$\displaystyle \hat{\beta}_{R} \sim N(\beta, \sigma^{2}\Phi(X^{\top }X)^{-1})$ (2.179)


2.8.3 Example

We now consider that we have the following a priori information on the coefficients: $ \beta_{2}=1$, and we calculate the restricted estimators of the coefficients. Jointly with these estimators, the quantlet XEGmlrm06.xpl computes the F statistic as a function of the restricted and unrestricted sums of squared residuals, which allows us to test $ H_{0}:\beta_{2}=1$.

XEGmlrm06.xpl

Note that the RML estimator satisfies the formulated restriction, and the value of the F statistic is the same as the one obtained in Section 2.7.4.
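The computation behind the quantlet can be sketched as follows. The F statistic is obtained from the restricted and unrestricted sums of squared residuals as $ F=\frac{(\hat{u}_{R}^{\top }\hat{u}_{R}-\hat{u}^{\top }\hat{u})/q}{\hat{u}^{\top }\hat{u}/(n-k)}$, which under $ H_{0}$ follows an F distribution with $ q$ and $ n-k$ degrees of freedom. The Python sketch below is an illustration of this computation, not the actual XEGmlrm06.xpl code; X, y, R and r are assumed to hold the example data and the restriction $ \beta_{2}=1$.

import numpy as np
from scipy import stats

def f_test_restrictions(X, y, R, r):
    # F statistic for H0: R beta = r, based on restricted vs. unrestricted SSR
    n, k = X.shape
    q = R.shape[0]

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    beta_r = beta + XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T) @ (r - R @ beta)

    ssr_u = np.sum((y - X @ beta) ** 2)     # unrestricted sum of squared residuals
    ssr_r = np.sum((y - X @ beta_r) ** 2)   # restricted sum of squared residuals

    F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
    p_value = stats.f.sf(F, q, n - k)       # upper-tail probability
    return F, p_value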