4. Regression Methods

Jörg Aßmus
28 July 2004

Simply put, a regression problem is the task of determining a function $ \widehat m(\bullet)$ that estimates the functional relation $ m(\bullet)$ between a $ p$-dimensional variable $ X=(X_1,\ldots,X_p)$ and an output variable $ Y$:

$\displaystyle Y=m(X)+\varepsilon\,,$

where $ \varepsilon$ is a random error term.

There are two different approaches to fitting the function $ m$. In the first approach, we describe $ m$ by a finite-dimensional parameter $ \beta=(\beta_0,\ldots,\beta_p)$:

$\displaystyle Y=m_{\beta}(X)+\varepsilon\,,$

so that it is sufficient to estimate the parameter $ \beta$. In this case, we can approximate the function at each point $ x=(x_1,\ldots,x_p)$ using only the estimate $ \widehat\beta$ of $ \beta$ by

$\displaystyle \widehat Y(x)=m_{\widehat\beta}(x)\,.$

This is called a parametric model. A very useful example is linear regression:

$\displaystyle Y=\beta_0+\beta_1 X_1+\ldots+\beta_p X_p+\varepsilon\,,$

$\displaystyle \widehat Y(x)=\widehat\beta_0+\widehat\beta_1 x_1+\ldots+\widehat\beta_p x_p\,.$
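
As an illustration outside XploRe, the linear model above can be fitted in a few lines of Python with NumPy. This is only a sketch under stated assumptions: the function names `fit_linear` and `predict_linear`, the synthetic data, and the choice of ordinary least squares as the estimation method are all introduced here for the example and are not taken from the text.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares estimate of (beta_0, ..., beta_p)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that beta_0 acts as the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat

def predict_linear(beta_hat, x):
    """Evaluate Y_hat(x) = beta_0 + beta_1 x_1 + ... + beta_p x_p."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return beta_hat[0] + x @ beta_hat[1:]

# Hypothetical noise-free data generated from Y = 1 + 2*X_1 - 3*X_2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

beta_hat = fit_linear(X, y)
y_hat = predict_linear(beta_hat, [1.0, 1.0])
```

Once `beta_hat` is known, the data are no longer needed for prediction: `predict_linear` uses only the estimated parameter, which is the defining feature of a parametric model.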

On the other hand, a model is called nonparametric if the function $ m$ cannot be described by a finite-dimensional parameter, i.e. if the dimension of the parameter is infinite. This means that we do not assume any particular functional form for $ m$. In this case we have to use the data set itself to calculate the estimated function $ \widehat m$ at each point $ x$. We investigate this question in Smoothing Methods (6).
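
To make the contrast with the parametric case concrete, here is a minimal Python sketch of one common nonparametric estimator, the Nadaraya-Watson kernel smoother, for one-dimensional $ X$. It is not necessarily the method treated in the smoothing chapter; the function name `kernel_regression`, the Gaussian kernel, the bandwidth `h`, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def kernel_regression(x0, x, y, h):
    """Nadaraya-Watson estimate of m at the points x0 (one-dimensional X)."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Gaussian kernel weights: one row per evaluation point,
    # one column per observation.
    u = (x0[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    # Locally weighted average of the observed responses.
    return (w @ y) / w.sum(axis=1)

# Hypothetical data from m(x) = sin(2*pi*x) on an equidistant grid.
x = np.linspace(0.0, 1.0, 101)
y = np.sin(2 * np.pi * x)

m_hat = kernel_regression([0.25, 0.5], x, y, h=0.05)
```

Note that every evaluation of `kernel_regression` needs the full data set `x`, `y`: there is no finite parameter vector that summarizes the fit.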

Both kinds of model can be estimated with XploRe. A simple way of choosing the appropriate method is given here:

The nonlinear methods are more general than the linear methods. This means that we can use nonlinear regression to estimate linear models, although this is not recommended in general. In the same way, nonparametric models are more general than parametric ones.

In the following sections we use two libraries: stats, which contains the regression quantlets, and graphic, which contains the plot quantlets. We should load them before we continue:

  library("graphic")          ; reads the library graphic
  library("stats")            ; reads the library stats