2.2 Least Trimmed Squares

In this section, we discuss the least trimmed squares estimator, its robustness and asymptotic properties, and its computational aspects.


2.2.1 Definition

First of all, we will make precise the verbal description of the estimator given in the previous section. Let us consider a linear regression model for a sample $(y_i, x_i)$ with a response variable $y_i$ and a vector of $p$ explanatory variables $x_i$:

$$y_i = \beta^T x_i + \varepsilon_i, \qquad i = 1,\ldots,n.$$

The least trimmed squares estimator $ \hat{\beta}^{(LTS)}$ is defined as

$$\hat{\beta}^{(LTS)} = \mathop{\rm argmin}_{\beta \in \mathbb{R}^p} \sum_{i=1}^{h} r_{[i]}^2(\beta), \qquad (2.2)$$

where $r_{[i]}^2(\beta)$ represents the $i$-th order statistic among $r_1^2(\beta),\ldots,r_n^2(\beta)$ and $r_i(\beta) = y_i - \beta^T x_i$ (we believe that the notation is self-explanatory). The so-called trimming constant $h$ has to satisfy $\frac{n}{2} < h \leq n$. This constant determines the breakdown point of the LTS estimator, since definition (2.2) implies that the $n-h$ observations with the largest residuals do not affect the estimator (except for the fact that the squared residuals of the excluded points have to be larger than the $h$-th order statistic of the squared residuals). The maximum breakdown point is attained for $h = [n/2] + [(p+1)/2]$ (see Rousseeuw and Leroy; 1987, Theorem 6), whereas for $h = n$, which corresponds to the least squares estimator, the breakdown point equals 0. More on the choice of the trimming constant can be found in Subsection 2.3.1.
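
To make the definition concrete in code: for any candidate $\beta$ one computes all squared residuals, sorts them, and sums the $h$ smallest. The following short sketch (written in Python purely for illustration, not in XploRe; the function names are ours) evaluates the objective function in (2.2) together with the maximal-breakdown choice of $h$:

  import numpy as np

  def lts_objective(beta, X, y, h):
      # sum of the h smallest squared residuals, i.e. the objective in (2.2)
      r2 = (y - X @ beta) ** 2
      return np.sort(r2)[:h].sum()

  def max_breakdown_h(n, p):
      # trimming constant h = [n/2] + [(p+1)/2] giving the maximal breakdown point
      return n // 2 + (p + 1) // 2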

Before proceeding to the description of how such an estimate can be evaluated in XploRe, several issues have to be discussed, namely, the existence of this estimator and its statistical properties (a discussion of its computational aspects is postponed to Subsection 2.2.2). First, the existence of the optimum in (2.2) under some reasonable assumptions can be justified in the following way: the minimization of the objective function in (2.2) can be viewed as a process in which we choose, one at a time, a subsample of $h$ observations and find the $\beta$ minimizing the sum of squared residuals for the selected subsample. Doing this for every subsample (there are $\binom{n}{h}$ of them), we obtain $\binom{n}{h}$ candidates for the LTS estimate, and the one that yields the smallest value of the objective function is the final estimate. Therefore, the existence of the LTS estimator is basically equivalent to the existence of the least squares estimator for subsamples of size $h$.
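
This existence argument is constructive and at the same time describes the exact (though combinatorially expensive) way of computing LTS: fit least squares to every subsample of size $h$ and keep the candidate with the smallest trimmed sum of squares. A minimal Python sketch of this full search (again purely illustrative and not the XploRe implementation; the names are hypothetical) could read:

  from itertools import combinations
  import numpy as np

  def lts_exact(X, y, h):
      # exact LTS: least squares on every h-subsample, feasible only for small n
      n = len(y)
      best_beta, best_obj = None, np.inf
      for subset in combinations(range(n), h):
          idx = list(subset)
          beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)  # LS fit on the subsample
          obj = np.sort((y - X @ beta) ** 2)[:h].sum()            # objective (2.2) on the full sample
          if obj < best_obj:
              best_beta, best_obj = beta, obj
      return best_beta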

Let us now briefly discuss various statistical properties of LTS. First, the least trimmed squares estimator is regression, scale, and affine equivariant (see, for example, Rousseeuw and Leroy; 1987, Lemma 3, Chapter 3). We have also already remarked that the breakdown point of LTS reaches the upper bound $([(n-p)/2] + 1)/n$ for regression equivariant estimators if the trimming constant $h$ equals $[n/2] + [(p+1)/2]$. Furthermore, the $\sqrt{n}$-consistency and asymptotic normality of LTS can be proved for a general linear regression model with continuously distributed disturbances (Víšek; 1999b). Besides these important statistical properties, there are also some less practical aspects. The main one follows directly from the discontinuity of the LTS objective function. Because of this, the sensitivity of the least trimmed squares estimator to a change of one or several observations can sometimes be rather high (Víšek; 1999a). This property, often referred to as high subsample sensitivity, is closely connected with the possibility that a change or omission of some observations may considerably change the subset of the sample that is treated as the set of ``correct'' data points. This does not necessarily have to be seen as a disadvantage; the point of view merely depends on the purpose for which we use LTS. See Víšek (1999b) and Section 2.3 for further information.


2.2.2 Computation


b = lts(x, y{, h, all, mult})
computes the least trimmed squares estimate of a linear regression model

The quantlet of the quantlib metrics that serves for least trimmed squares estimation is lts. To understand the function of its parameters, the algorithm used for the evaluation of LTS has to be described first; the description of the quantlet itself follows afterwards.

There are two possible strategies for determining the least trimmed squares estimate. The first one relies on a full search through all subsamples of size $h$ and the subsequent LS estimation as described in the previous subsection, and thus yields the exact solution (neglecting the ubiquitous numerical errors). Unfortunately, it is hardly possible to examine all $\binom{n}{h}$ subsamples unless a very small sample is analyzed. Therefore, in most cases (when the number of observations is larger) only an approximation can be computed (note that in the examples presented here we compute the exact LTS estimates as described above, and thus the computation is relatively slow). The present algorithm computes the approximation in the following way: having randomly selected a $(p+1)$-tuple of observations, we apply the least squares method to it, and for the estimated regression coefficients we evaluate the residuals of all $n$ observations. Then the $h$-tuple of data points with the smallest squared residuals is selected and the LS estimation takes place again. This step is repeated as long as a decrease of the sum of the $h$ smallest squared residuals is obtained. When no further improvement can be found this way, a new subsample of $h$ observations is randomly generated and the whole process is repeated. The search is stopped either when we find the same estimate of the model $s$ times (where $s$ is an a priori given positive integer) or when an a priori given number of randomly generated subsamples has been examined. A more refined version of this algorithm, suitable also for large data sets, was proposed and described by Rousseeuw and Van Driessen (1999).
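
To fix ideas, the approximation scheme just described can be summarized in a short sketch. The Python code below is only an illustrative rendering of the idea (random $(p+1)$-tuples as starting points, repeated refitting on the currently best $h$ points, and a fixed number of random restarts); the names, the restart strategy, and the stopping rule are our simplifications, not the algorithm actually implemented in the lts quantlet:

  import numpy as np

  def lts_approx(X, y, h, n_starts=500, seed=0):
      # approximate LTS: random elemental starts, then refit on the h best points
      # until the trimmed sum of squares no longer decreases
      rng = np.random.default_rng(seed)
      n, p = X.shape
      best_beta, best_obj = None, np.inf
      for _ in range(n_starts):
          idx = rng.choice(n, size=p + 1, replace=False)        # random (p+1)-tuple
          beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
          obj = np.inf
          while True:
              keep = np.argsort((y - X @ beta) ** 2)[:h]        # h smallest squared residuals
              beta_new, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
              obj_new = np.sort((y - X @ beta_new) ** 2)[:h].sum()
              if obj_new >= obj:                                # no further improvement
                  break
              beta, obj = beta_new, obj_new
          if obj < best_obj:
              best_beta, best_obj = beta, obj
      return best_beta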

From now on, the noninteractive quantlet lts is going to be described. The quantlet expects at least two input parameters: an $n \times p$ matrix x that contains $n$ observations of each of the $p$ explanatory variables and an $n \times 1$ vector y of $n$ observed responses. If an intercept is to be included in the regression model, an $n \times 1$ vector of ones can be concatenated to the matrix x in the following way:

 
  x = matrix(rows(x))~x
Neither the matrix x nor the vector y should contain missing (NaN) or infinite values (Inf, -Inf). Their presence can be detected by isNaN or isNumber, and the invalid observations should be dealt with before running lts, e.g., omitted using paf. These two parameters are enough for the most basic use of the quantlet. Typing
 
  b = lts(x,y)
results in an approximation of the LTS estimate for the most robust choice $h = [n/2] + [(p+1)/2]$ using the default number of iterations. Though this might suffice for some purposes, in most cases we would like to specify the third parameter, the trimming constant $h$, as well. So probably the most common use takes the form
 
  b = lts(x,y,h)
The last two parameters of the quantlet, namely all and mult, provide a way to influence how the estimate is actually computed. The parameter all allows one to switch from the approximation algorithm, which corresponds to all equal to 0 and is used by default, to the exact computation of LTS, which takes place if all is nonzero. As the exact calculation can take quite a long time unless a given sample is really small, a warning, together with a possibility to cancel the evaluation, is issued whenever the total number of iterations is too high. Finally, the last parameter mult, which equals 1 by default, offers the possibility to adjust the maximum number of randomly generated subsamples used by the approximation algorithm: this maximum is calculated from the size of a given sample and the trimming constant, and is subsequently multiplied by mult.

To give a real example, let us show how the time trend in the phonecal data set was estimated in Section 2.1. The data set is two-dimensional, having only one explanatory variable x, the year, in the first column and the response variable y, the number of international phone calls, in the second column. In order to obtain the LTS estimate for the linear regression of y on a constant term and x, you have to type at the command line or in the editor window

 
  z = read("phonecal") 
  x = matrix(rows(z)) ~ z[,2] 
  y = z[,3] 
  b = lts(x,y) 
  b
XAGlts02.xpl

The result of the above example should appear in the XploRe output window as follows:
 
  Contents of coefs 
  [1,]  -5.6522 
  [2,]  0.11649