If we have available a sample of $n$ observations from the population, represented by $(x_i, y_i)$, $i = 1, \ldots, n$, and we assume the Population Regression Function (PRF) is linear in both variables and parameters,

$$E(y_i \mid x_i) = \alpha + \beta x_i, \qquad i = 1, \ldots, n, \qquad (1.25)$$
we can now face the task of estimating the unknown parameters $\alpha$ and $\beta$. Unfortunately, the sampling design and the linearity assumption in the PRF are not sufficient conditions to ensure that there exists a precise statistical relationship between the estimators and their true corresponding values (see Section 1.2.6 for more details). In order to do so, we need to know some additional features of the PRF. Since we do not know them, we establish some assumptions, making clear that, in any case, the statistical properties of the estimators depend crucially on these assumptions. The basic set of assumptions that comprises the classical linear regression model is as follows:
Finally, an additional assumption, (A.7), that is usually employed to ease inference is the normality of the error term, $u_i \sim N(0, \sigma^2)$.
For a more detailed explanation of and comments on the different assumptions, see Gujarati (1995). Assumption (A.1) is quite strong, and it is in fact very difficult to accept when dealing with economic data. However, most of the statistical results obtained under this hypothesis also hold under weaker conditions, such as $x$ random but independent of the error term (see Amemiya (1985) for the fixed design case, and Newey and McFadden (1994) for the random design).
In the univariate linear regression setting that was introduced in the previous section, the following parameters need to be estimated: the intercept $\alpha$, the slope $\beta$, and the error variance $\sigma^2$.
Regression Estimation
From a given population, described by a linear PRF with fixed parameter values, we generate a sample of observations. We show the corresponding scatter plot in Figure 1.4.
Following the same reasoning as in the previous sections, the PRF is unknown to the researcher, who has available only the data and some information about the PRF. For example, he may know that the relationship between $x$ and $y$ is linear, but he does not know the exact parameter values. In Figure 1.5 we represent the sample and several possible regression functions corresponding to different values of $\alpha$ and $\beta$.
In order to estimate $\alpha$ and $\beta$, many estimation procedures are available. One of the most famous criteria is the one that chooses $\hat{\alpha}$ and $\hat{\beta}$ such that they minimize the sum of the squared deviations of the regression values from their corresponding observed values. This is the so-called least squares method. Applying this procedure to the previous sample, Figure 1.6 shows, for the sake of comparison, the least squares regression line together with the other sample regression lines.
We now describe more precisely how the least squares method is implemented and, under a Population Regression Function that incorporates assumptions (A.1) to (A.6), what its statistical properties are.
We begin by establishing a formal estimation criterion. Let $\hat{\alpha}$ and $\hat{\beta}$ be possible estimators (some functions of the sample observations) of $\alpha$ and $\beta$. Then, the fitted value of the endogenous variable is

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i.$$

The residual, the difference between the observed and the fitted value, is given by

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\alpha} - \hat{\beta} x_i.$$

The least squares method minimizes the sum of squared deviations of the regression values $\hat{y}_i$ from the observed values $y_i$, that is, the residual sum of squares (RSS):

$$RSS = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2.$$
This criterion function has two arguments with respect to which we minimize: $\hat{\alpha}$ and $\hat{\beta}$. We then define as Ordinary Least Squares (OLS) estimators, denoted by $\hat{\alpha}$ and $\hat{\beta}$, the values of $\alpha$ and $\beta$ that solve the following optimization problem:

$$\min_{\hat{\alpha}, \hat{\beta}} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2. \qquad (1.32)$$

In order to solve it, that is, to find the minimum, the first order conditions require the first partial derivatives to be equal to zero. To verify that the solution is really a minimum, the matrix of second order derivatives of (1.32), the Hessian matrix, must be positive definite; it is easy to show that this condition holds here. Setting the first derivatives equal to zero leads to the so-called (least squares) normal equations, from which the estimated regression parameters can be computed:

$$\sum_{i=1}^{n} y_i = n\hat{\alpha} + \hat{\beta} \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = \hat{\alpha} \sum_{i=1}^{n} x_i + \hat{\beta} \sum_{i=1}^{n} x_i^2.$$
Dividing the original equations by $n$, we get a simplified system suitable for the computation of the regression parameters:

$$\bar{y} = \hat{\alpha} + \hat{\beta} \bar{x}, \qquad \frac{1}{n}\sum_{i=1}^{n} x_i y_i = \hat{\alpha} \bar{x} + \hat{\beta} \frac{1}{n}\sum_{i=1}^{n} x_i^2.$$

For the estimated intercept $\hat{\alpha}$, we get:

$$\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}. \qquad (1.36)$$

For the estimated linear slope coefficient $\hat{\beta}$, we get:

$$\hat{\beta} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}. \qquad (1.37)$$
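To make these closed-form expressions concrete, the following short sketch, written in Python rather than as an XploRe quantlet, computes $\hat{\alpha}$ and $\hat{\beta}$ from a simulated sample; the true parameter values, the range of the regressor and the sample size are illustrative assumptions, not values taken from the text.

```python
import numpy as np

# Illustrative data generating process: y = alpha + beta * x + u
# (all numerical values below are assumptions made for this example)
rng = np.random.default_rng(0)
n = 100
alpha_true, beta_true = 2.0, 0.5
x = rng.uniform(0, 10, size=n)        # regressor
u = rng.normal(0, 1, size=n)          # error term
y = alpha_true + beta_true * x + u

# OLS estimates from the closed-form solutions (1.36) and (1.37)
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

print("alpha_hat =", alpha_hat, "beta_hat =", beta_hat)
```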
The ordinary least squares estimator of the parameter $\sigma^2$ is based on the following idea: since $\sigma^2$ is the expected value of $u_i^2$ and $\hat{u}_i$ is an estimate of $u_i$, our initial estimator

$$\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2$$

would seem to be a natural estimator of $\sigma^2$. However, due to the fact that $E\left(\sum_{i=1}^{n} \hat{u}_i^2\right) = (n-2)\sigma^2$, this implies $E(\tilde{\sigma}^2) = \frac{n-2}{n}\sigma^2 \neq \sigma^2$. Therefore, the unbiased estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2.$$

Now, with this expression, we obtain that $E(\hat{\sigma}^2) = \sigma^2$.
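As a minimal numerical sketch (in Python, with assumed data rather than the data used in the text), the difference between the natural and the unbiased estimator of $\sigma^2$ is simply the divisor $n$ versus $n - 2$:

```python
import numpy as np

# Assumed sample; true values chosen only for illustration
rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=n)

# OLS fit and residuals
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x

sigma2_natural = np.sum(resid ** 2) / n         # biased downwards
sigma2_unbiased = np.sum(resid ** 2) / (n - 2)  # unbiased estimator of sigma^2
print(sigma2_natural, sigma2_unbiased)
```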
In the next section we will introduce an example of the least squares estimation criterion.
We can obtain a graphical representation of the ordinary least squares estimation by using the following quantlet. The regression line computed by the least squares method using the data generated in (1.49) is shown in Figure 1.7 jointly with the data set.
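Since the quantlet itself is not reproduced here, the following Python/matplotlib sketch produces an analogous graphical representation from assumed data; it is not the original XploRe code, and the parameter values are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed data for illustration
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=50)
y = 1.5 + 0.7 * x + rng.normal(0, 1, size=50)

# OLS fit
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Scatter plot of the data together with the fitted regression line
plt.scatter(x, y, label="data")
grid = np.linspace(x.min(), x.max(), 100)
plt.plot(grid, alpha_hat + beta_hat * grid, color="red", label="OLS regression line")
plt.legend()
plt.show()
```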
Once the regression line is estimated, it is useful to know how well the regression line approximates the data from the sample. A measure that describes this quality of representation is the coefficient of determination (R-squared or $R^2$). Its computation is based on a decomposition of the variance of the values of the dependent variable $y$. The smaller the sum of squared estimated residuals, the better the quality of the regression line. Since the least squares method minimizes the variance of the estimated residuals, it also maximizes the $R^2$ by construction.
The sample variance of the values of $y$ is:

$$s_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2.$$

The element $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is known as the Total Sum of Squares (TSS); it is the total variation of the values of $y$ around $\bar{y}$. The deviation of the observed values, $y_i$, from the arithmetic mean, $\bar{y}$, can be decomposed into two parts: the deviation of the observed values of $y$ from the estimated regression values, and the deviation of the estimated regression values from the sample mean, i.e.

$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y}) = \hat{u}_i + (\hat{y}_i - \bar{y}), \qquad (1.43)$$

where $\hat{u}_i$ is the error term (residual) in this estimate. Note also that, considering the properties of the OLS estimators, it can be proved that $\sum_{i=1}^{n} \hat{u}_i = 0$. Taking the square of the residuals and summing over all the observations, we obtain the Residual Sum of Squares, $RSS = \sum_{i=1}^{n} \hat{u}_i^2$. As a goodness of fit criterion the RSS is not satisfactory because it is very sensitive to the units in which $y$ is measured. In order to propose a criterion that is not sensitive to the measurement units, let us decompose the sum of the squared deviations of equation (1.43) as

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + 2\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}). \qquad (1.44)$$
Now, noting that by the properties of the OLS estimators we have that $\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0$, expression (1.44) can be written as

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad (1.45)$$

where $ESS = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the so-called Explained Sum of Squares. Now, dividing both sides of equation (1.45) by $n$, we obtain

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i^2 + \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad (1.46)$$
and then,

$$s_y^2 = s_{\hat{u}}^2 + s_{\hat{y}}^2,$$

where $s_{\hat{u}}^2$ and $s_{\hat{y}}^2$ denote the sample variances of the residuals and of the fitted values, respectively. The total variance of $y$ is equal to the sum of the sample variance of the estimated residuals (the unexplained part of the sampling variance of $y$) and the part of the sampling variance of $y$ that is explained by the regression function (the sampling variance of the regression function). The larger the portion of the sampling variance of the values of $y$ explained by the model, the better is the fit of the regression function.
The Coefficient of Determination
The coefficient of determination is defined as the ratio between the sampling variance of the values of $y$ explained by the regression function and the sampling variance of the values of $y$. That is, it represents the proportion of the sampling variance in the values of $y$ "explained" by the estimated regression function:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.$$

This expression is unit-free because both the numerator and the denominator have the same units. The higher the coefficient of determination is, the better the regression function explains the observed values. Other expressions for the coefficient are

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{s_{\hat{y}}^2}{s_y^2}.$$
One special feature of this coefficient is that the R-squared can only take values in the range $0 \leq R^2 \leq 1$. This is always true if the model includes a constant term in the population regression function. A small value of $R^2$ implies that a lot of the variation in the values of $y$ has not been explained by the variation of the values of $x$.
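The decomposition above translates directly into code. The following Python sketch, using assumed data rather than the sample from the text, computes TSS, RSS, ESS and the resulting $R^2$:

```python
import numpy as np

# Assumed data generating process (values chosen only for illustration)
rng = np.random.default_rng(3)
n = 80
x = rng.uniform(0, 5, size=n)
y = 2.0 + 1.2 * x + rng.normal(0, 1, size=n)

# OLS fit
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
y_hat = alpha_hat + beta_hat * x

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares

print("TSS = ESS + RSS:", np.isclose(tss, ess + rss))
print("R^2 =", ess / tss, "= 1 - RSS/TSS =", 1 - rss / tss)
```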
Ordinary least squares estimates of the parameters of interest are given by executing the following quantlet. As an example, we use the original data source that was already shown in Figure 1.4.
Once the econometric model has been both specified and estimated, we are interested in analyzing the relationship between the estimators (sample) and their respective parameter values (population). This relationship is of great interest when trying to extend propositions, based on econometric models that have been estimated with a single sample, to the whole population. One way to do so is to obtain the sampling distribution of the different estimators. A sampling distribution describes the behavior of the estimators in repeated applications of the estimating formulae. A given sample yields a specific numerical estimate; another sample from the same population will yield another numerical estimate. A sampling distribution describes the results that will be obtained for the estimators over the potentially infinite set of samples that may be drawn from the population.
Properties of $\hat{\alpha}$ and $\hat{\beta}$
We start by computing the finite sample distribution of the estimated parameter vector $(\hat{\alpha}, \hat{\beta})^\top$. In order to do so, note that taking the expression for $\hat{\alpha}$ in (1.36) and $\hat{\beta}$ in (1.37) we can write

$$\hat{\beta} = \beta + \sum_{i=1}^{n} w_i u_i, \qquad \hat{\alpha} = \alpha + \sum_{i=1}^{n} \left(\frac{1}{n} - \bar{x} w_i\right) u_i, \qquad \text{where } w_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2},$$

so that both estimators are linear functions of the error terms.
To fully characterize the whole sampling distribution we need to determine both the mean vector and the variance-covariance matrix of the OLS estimators. Assumptions (A.1), (A.2) and (A.3) immediately imply that

$$E(\hat{\alpha}) = \alpha, \qquad E(\hat{\beta}) = \beta,$$

that is, the OLS estimators are unbiased. Now, assumptions (A.1), (A.5) and (A.6) allow us to simplify expression (1.56); finally, substituting its definition from equation (1.50), we obtain the following expressions for the variance-covariance matrix:

$$Var(\hat{\beta}) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad Var(\hat{\alpha}) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right), \qquad Cov(\hat{\alpha}, \hat{\beta}) = \frac{-\sigma^2 \bar{x}}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
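In practice $\sigma^2$ is unknown and is replaced by $\hat{\sigma}^2$. A brief Python sketch of the resulting variance estimates, again under an assumed data generating process:

```python
import numpy as np

# Assumed sample (illustrative values only)
rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=n)

# OLS estimates and unbiased residual variance (divisor n - 2)
sxx = np.sum((x - x.mean()) ** 2)
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x
sigma2_hat = np.sum(resid ** 2) / (n - 2)

# Estimated variance-covariance elements of (alpha_hat, beta_hat)
var_beta_hat = sigma2_hat / sxx
var_alpha_hat = sigma2_hat * (1.0 / n + x.mean() ** 2 / sxx)
cov_alpha_beta = -sigma2_hat * x.mean() / sxx
print(var_alpha_hat, var_beta_hat, cov_alpha_beta)
```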
We can say that the OLS method produces BLUE (Best Linear Unbiased Estimators) in the following sense: among the set of all linear unbiased estimators, the OLS estimators are those with minimum variance. This is the content of the Gauss-Markov theorem, whose simplest version we now give; a proof can be found in Johnston and DiNardo (1997), p. 36.
Gauss-Markov Theorem: Consider the regression model (1.22). Under assumptions (A.1) to (A.6), the OLS estimators of $\alpha$ and $\beta$ are those that have minimum variance among the set of all linear and unbiased estimators of the parameters.
We remark that for the Gauss-Markov theorem to hold we do not need to include assumption (A.7) on the distribution of the error term. Furthermore, the properties of the OLS estimators mentioned above are established for finite samples; that is, the divergence between the estimator and the parameter value is analyzed for a fixed sample size. Other properties of the estimators that are also of interest are the asymptotic properties. In this case, the behavior of the estimators with respect to their true parameter values is analyzed as the sample size increases. Among the asymptotic properties of the estimators we will study the so-called consistency property.
We will say that the OLS estimators, $\hat{\alpha}$ and $\hat{\beta}$, are consistent if they converge weakly in probability (see Serfling (1984) for a definition) to their respective parameter values, $\alpha$ and $\beta$. A sufficient condition for weak convergence in probability is that both the bias and the variance of the estimator vanish as the sample size increases, that is,

$$\lim_{n \to \infty} E(\hat{\beta}) = \beta \quad \text{and} \quad \lim_{n \to \infty} Var(\hat{\beta}) = 0,$$

and analogously for $\hat{\alpha}$.
Properties of $\hat{\sigma}^2$
For the statistical properties of $\hat{\sigma}^2$, we will just enumerate the different statistical results that will be proved in a more general setting in Chapter 2, Section 2.4.2 of this monograph.
Under assumptions (A.1) to (A.7), the finite sample distribution of this estimator is given by

$$\frac{(n-2)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}.$$

Then, by the properties of the $\chi^2$ distribution it is easy to show that

$$E(\hat{\sigma}^2) = \sigma^2 \quad \text{and} \quad Var\left(\frac{(n-2)\hat{\sigma}^2}{\sigma^2}\right) = 2(n-2).$$

This result allows us to calculate the variance of $\hat{\sigma}^2$ as

$$Var(\hat{\sigma}^2) = \frac{2\sigma^4}{n-2}.$$
Note that to calculate this variance, the normality assumption, (A.7), plays a crucial role. In fact, by assuming that $u_i \sim N(0, \sigma^2)$, the third order moment is zero, $E(u_i^3) = 0$, and the fourth order moment is already known and related to $\sigma^2$, since $E(u_i^4) = 3\sigma^4$. These two properties are of great help to simplify the third and fourth order terms in equation (1.62).
Under assumptions (A.1) to (A.7) in Section 1.2 it is possible to show (see Chapter 2, Section 2.4.2 for a proof) that

$$\sqrt{n}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} N\left(0, 2\sigma^4\right).$$

From the last result, note finally that although $\hat{\sigma}^2$ is not efficient for finite sample sizes, this estimator achieves asymptotically the Cramér-Rao lower bound.
To illustrate the different statistical properties given in the previous section, we develop three different simulations. The first Monte Carlo experiment analyzes the finite sample distribution of $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\sigma}^2$. The second study performs a simulation to explain consistency, and finally the third study compares the finite sample and asymptotic distributions of the OLS estimator of $\sigma^2$.
Example 1
The following program illustrates the statistical properties of the OLS estimators of $\alpha$ and $\beta$. We implement the following Monte Carlo experiment. We have generated 500 replications of sample size n = 20 of the model $y_i = \alpha + \beta x_i + u_i$. The values of $x$ have been generated according to a uniform distribution, and the values for the error term have been generated following a normal distribution with zero mean and variance one, $u_i \sim N(0, 1)$. To fulfil assumption (A.1), the values of $x$ are kept fixed across the different replications. For each sample (replication) we have estimated the parameters $\alpha$ and $\beta$ and their respective variances (note that $\sigma^2$ has been replaced by $\hat{\sigma}^2$). With the 500 values of the estimators of these parameters, we generate four different histograms.
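Since the original program is not reproduced here, the following Python sketch implements the same Monte Carlo design; the true values of $\alpha$ and $\beta$ and the range of the uniform distribution are illustrative assumptions, not the values used in the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
n, n_rep = 20, 500
alpha, beta = 2.0, 0.5                 # assumed true values for illustration
x = rng.uniform(0, 1, size=n)          # fixed design: same x in every replication
sxx = np.sum((x - x.mean()) ** 2)

alpha_hats, beta_hats, var_a, var_b = [], [], [], []
for _ in range(n_rep):
    u = rng.normal(0, 1, size=n)       # u ~ N(0, 1)
    y = alpha + beta * x + u
    b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    a = y.mean() - b * x.mean()
    s2 = np.sum((y - a - b * x) ** 2) / (n - 2)   # sigma^2 replaced by its estimate
    alpha_hats.append(a)
    beta_hats.append(b)
    var_a.append(s2 * (1 / n + x.mean() ** 2 / sxx))
    var_b.append(s2 / sxx)

# Four histograms: estimates of alpha, beta and their estimated variances
fig, axes = plt.subplots(2, 2)
results = [alpha_hats, beta_hats, var_a, var_b]
titles = ["alpha_hat", "beta_hat", "var(alpha_hat)", "var(beta_hat)"]
for ax, vals, title in zip(axes.ravel(), results, titles):
    ax.hist(vals, bins=25)
    ax.set_title(title)
plt.show()
```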
The result of this procedure is presented in Figure 1.8. With a sample size of n = 20, the histograms that contain the estimates of $\alpha$ and $\beta$ in the different replications approximate a Gaussian distribution. On the other hand, the histograms for the variance estimates approximate a $\chi^2$ distribution, as expected.
Example 2
This program analyzes by simulation the asymptotic behavior of both $\hat{\alpha}$ and $\hat{\beta}$ when the sample size increases. We generate observations using the model $y_i = \alpha + \beta x_i + u_i$. For different sample sizes, we have generated 50 replications for each sample size. For each sample size we obtain 50 estimates of $\alpha$ and $\beta$; then, we calculate the mean and the variance of these estimates conditioning on the sample size.
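A Python sketch of this consistency experiment is given below; the true parameter values and the grid of sample sizes are assumptions made for the illustration, since the original values are not shown here.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta = 1.0, 0.5                    # assumed true values
sample_sizes = [10, 50, 100, 500, 1000]   # assumed grid of sample sizes
n_rep = 50

for n in sample_sizes:
    x = rng.uniform(0, 1, size=n)         # fixed design within each sample size
    betas = []
    for _ in range(n_rep):
        y = alpha + beta * x + rng.normal(0, 1, size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        betas.append(b)
    # Mean of the estimates should approach beta, their variance should approach 0
    print(n, np.mean(betas), np.var(betas))
```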
The code gives the output presented in Figure 1.9. As expected, when we increase the sample size, the mean of the estimates tends to the true parameter value, and the variance of the estimates tends to zero.
Example 3
Consider again the model $y_i = \alpha + \beta x_i + u_i$, with $u_i \sim N(0, \sigma^2)$. We implement the following Monte Carlo experiment. For two different sample sizes we have generated 500 replications for each sample size. The first 500 replications have a sample size of n = 10, the second of n = 1000. For both sample sizes we compute 500 estimates of $\sigma^2$. Then, we calculate two histograms for the estimates of $\sigma^2$, one for n = 10 and the other for n = 1000.
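A Python sketch of this experiment follows; the true parameter values are assumptions, and the statistic shown in each histogram is one possible standardization (the small-sample panel uses $(n-2)\hat{\sigma}^2/\sigma^2$, which follows a $\chi^2_{n-2}$ law, while the large-sample panel uses the standardized estimator, which is approximately standard normal), so the exact scaling in Figure 1.10 may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
alpha, beta, sigma2 = 1.0, 0.5, 1.0   # assumed true values for illustration
n_rep = 500

def sigma2_hats(n):
    """500 OLS-based estimates of sigma^2 for sample size n (fixed design)."""
    x = rng.uniform(0, 1, size=n)
    sxx = np.sum((x - x.mean()) ** 2)
    estimates = []
    for _ in range(n_rep):
        y = alpha + beta * x + rng.normal(0, np.sqrt(sigma2), size=n)
        b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
        a = y.mean() - b * x.mean()
        estimates.append(np.sum((y - a - b * x) ** 2) / (n - 2))
    return np.array(estimates)

s2_small, s2_large = sigma2_hats(10), sigma2_hats(1000)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist((10 - 2) * s2_small / sigma2, bins=25)    # approximately chi^2 with 8 df
ax1.set_title("n = 10")
ax2.hist(np.sqrt((1000 - 2) / 2) * (s2_large - sigma2) / sigma2, bins=25)  # approx N(0,1)
ax2.set_title("n = 1000 (standardized)")
plt.show()
```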
The output of the code is presented in Figure 1.10. As expected, the histogram for n = 10 approximates a $\chi^2$ density, whereas for n = 1000, the approximated density is the standard normal.