
Library: metrics
See also: rrstest

Quantlet: rqfit
Description: Performs quantile regression of y on x using the original simplex approach of Barrodale-Roberts/Koenker-d'Orey.
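Quantile regression estimates b(tau) by minimizing the sum of check-function losses rho_tau(y_i - x_i b), where rho_tau(u) = u(tau - I(u < 0)). This is a linear program. When x has full column rank, an optimal solution can always be found at a vertex where the fit passes exactly through p observations, and the simplex method moves between such vertices. The Python sketch below is not rqfit's algorithm. It simply enumerates every p-point fit and keeps the best one. This is exponential in p and is meant only to show what the simplex approach optimizes. The helper names (check_loss, solve_square, rq_elemental) are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def solve_square(a, b):
    """Solve the p x p system a @ beta = b by Gaussian elimination with
    partial pivoting; return None if a is (numerically) singular."""
    p = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        s = m[r][p] - sum(m[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = s / m[r][r]
    return beta


def rq_elemental(x, y, tau=0.5):
    """Brute-force quantile regression (illustration only): try every fit
    that interpolates p observations (an LP vertex) and keep the one with
    the smallest check-function objective."""
    n, p = len(x), len(x[0])
    best, best_obj = None, float("inf")
    for idx in combinations(range(n), p):
        beta = solve_square([x[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue  # the chosen rows do not determine a unique fit
        obj = sum(check_loss(yi - sum(xij * bj for xij, bj in zip(xi, beta)), tau)
                  for xi, yi in zip(x, y))
        if obj < best_obj:
            best, best_obj = beta, obj
    return best
```

For example, rq_elemental([[1.0]] * 5, [5, 1, 4, 2, 3], 0.3) returns [2.0], the sample 0.3-quantile. With an intercept-only design the estimator reduces to an ordinary sample quantile.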

Usage: z = rqfit(x,y{,tau,ci,alpha,iid,interp,tcrit})
Input:
x n x p design matrix of explanatory variables.
y n x 1 vector, dependent variable.
tau desired quantile, default value = 0.5. If tau is inside <0,1>, a single quantile solution is computed and returned. If tau is outside <0,1>, solutions for all quantiles are sought, i.e., the program computes the whole quantile regression solution as a process in tau. The resulting arrays containing the primal and dual solutions and betahat(tau) are called sol and dsol. This form of the solution can be both memory- and CPU-intensive; on typical machines it is not recommended for problems with n > 10,000.
ci logical flag for confidence intervals (nonzero = TRUE), default value = 0. If ci = 0, only the estimated coefficients are returned. If ci != 0, confidence intervals for the parameters are computed using the rank inversion method of Koenker (1994). For large problems the option ci != 0 can be rather slow. Note also that rank inversion only works for p > 1; an error message is printed if ci != 0 and p = 1.
alpha the nominal noncoverage probability of the confidence intervals, i.e., 1 - alpha is the nominal coverage probability and alpha/2 is the significance level of each one-sided bound, default value = 0.1.
iid logical flag for iid errors (nonzero = TRUE), default value = 1. If iid != 0, the rank inversion (see parameter ci) is based on the assumption of an iid error model, and the original version of the rank inversion intervals is used (as in Koenker, 1994). If iid = 0, it is based on the assumption of heterogeneous (non-iid) errors. See Koenker and Machado (1999) for further details.
interp logical flag for smoothed confidence intervals (nonzero = TRUE), default value = 1. As with typical order-statistic confidence intervals, the test statistic is discrete, so it is reasonable to consider intervals that interpolate between values of the parameter just below and just above the specified cutoff. If interp != 0, a single interval based on linear interpolation of the two intervals is returned. If interp = 0, the two "exact" values above and below, on which the interpolation would be based, are returned. In this case the critical values and p-values of the upper and lower intervals are also returned as cval and pval.
tcrit logical flag for finite sample adjustment using t-statistics (nonzero = TRUE), default value = 1. If tcrit != 0, Student t critical values are used, while for tcrit = 0 normal ones are employed.
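The rank test statistic is compared against a cutoff that depends on alpha and tcrit. With tcrit = 0, the cutoff is the normal quantile z_{1-alpha/2}. With tcrit != 0, a Student t quantile is used instead; its degrees of freedom are not stated on this page (n - p is assumed below). Because the statistic is discrete, interp != 0 places each bound between the two candidate "exact" values by linear interpolation. The Python sketch below computes the normal cutoff with the standard library and interpolates a bound. It assumes the interpolation weight is linear in the test statistic. The function names are hypothetical, and this is not rqfit's code.

```python
from statistics import NormalDist


def normal_cutoff(alpha: float) -> float:
    """Two-sided normal critical value z_{1 - alpha/2} (the tcrit = 0 case).

    For tcrit != 0 a Student t quantile (assumed n - p degrees of freedom)
    would replace it; that case is not computed in this sketch.
    """
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def interpolate_bound(b_below: float, t_below: float,
                      b_above: float, t_above: float,
                      cutoff: float) -> float:
    """Linearly interpolate between two candidate ("exact") bounds.

    b_below and b_above are parameter values whose test statistics t_below
    and t_above bracket the cutoff (t_below < cutoff < t_above).  The weight
    is assumed to be linear in the test statistic.
    """
    w = (cutoff - t_below) / (t_above - t_below)
    return b_below + w * (b_above - b_below)


print(round(normal_cutoff(0.1), 4))                # default alpha = 0.1 -> 1.6449
print(interpolate_bound(1.0, 1.0, 3.0, 2.0, 1.5))  # cutoff halfway -> 2.0
```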
Output:
z.coefs p x 1 or p x J matrix. If tau is in <0,1>, its single column contains the estimated coefficients. If tau is outside <0,1>, the p x J matrix contains the estimated coefficients for all J breakpoints of the quantile process, i.e., sol[4:(p+3),]; see the description of sol.
z.intervals empty, p x 2, or p x 4 matrix containing confidence intervals. If ci = 0, no confidence intervals are computed. If ci != 0 and interp != 0, intervals has 2 columns: the interpolated "lower bound" and "upper bound". If ci != 0 and interp = 0, intervals has 4 columns: "lower bound", "Lower Bound", "upper bound", and "Upper Bound". See the descriptions of the ci and interp parameters for further information.
z.res n x 1 vector of residuals. Supplied only if tau is inside <0,1>.
z.sol the primal solution array. This is a (p+3) x J matrix. Its first row contains the breakpoints tau_1, tau_2, ..., tau_J of the quantile function, i.e., the values in [0,1] at which the solution changes. The second row contains the corresponding quantiles evaluated at the mean design point, i.e., the inner product of xbar and b(tau_j). The third row contains the value of the objective function evaluated at the corresponding tau_j, and the last p rows give b(tau_j). The solution b(tau_j) prevails from tau_j to tau_{j+1}. Portnoy (1991) shows that J = O_p(n log n).
z.dsol the dual solution array. This is an n x J matrix containing the dual solution corresponding to sol: the ij-th entry is 1 if y_i > x_i b(tau_j), 0 if y_i < x_i b(tau_j), and between 0 and 1 otherwise, i.e., if the residual is zero. See Gutenbrunner and Jureckova (1991) for a detailed discussion of the statistical interpretation of dsol. The use of dsol in inference is described in Gutenbrunner, Jureckova, Koenker, and Portnoy (1994).
z.cval critical values; see the description of the interp parameter for further information. Not supplied if tau is outside <0,1> or ci = 0.
z.pval p-values; see the description of the interp parameter for further information. Not supplied if tau is outside <0,1> or ci = 0.
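When tau is outside <0,1>, b(tau) for a particular tau can be read from sol. Locate the last breakpoint not exceeding tau and take the last p rows of that column. In XploRe's 1-based indexing these are rows 4 to p+3; with 0-based Python rows they are sol[3:]. Similarly, the sign of each residual fixes the corresponding dsol entry, except when the residual is zero. The Python sketch below assumes sol is given as a list of rows. It treats [tau_j, tau_{j+1}) as half-open, which is an assumption, since the solution is not unique exactly at a breakpoint. The function names and the toy array are hypothetical.

```python
from bisect import bisect_right


def coefs_at(sol, tau):
    """Return b(tau) from a (p+3) x J primal solution array given as rows.

    Row 0 holds the breakpoints tau_1 < ... < tau_J; rows 3..p+2 (0-based,
    i.e. XploRe rows 4..p+3) hold b(tau_j).  b(tau_j) is assumed to apply
    on the half-open interval [tau_j, tau_{j+1}).
    """
    j = bisect_right(sol[0], tau) - 1
    if j < 0:
        raise ValueError("tau lies below the first breakpoint")
    return [row[j] for row in sol[3:]]


def dual_entry(y_i, x_i, beta):
    """Dual value implied by the residual sign: 1 if positive, 0 if negative,
    None if zero (then it lies in [0, 1] and the sign does not determine it)."""
    r = y_i - sum(a * b for a, b in zip(x_i, beta))
    if r > 0:
        return 1
    if r < 0:
        return 0
    return None


# Hypothetical toy array with p = 2 (so p + 3 = 5 rows) and J = 3 breakpoints.
toy_sol = [
    [0.0, 0.25, 0.6],   # breakpoints tau_j
    [0.0, 0.0, 0.0],    # xbar' b(tau_j): placeholder, not used here
    [0.0, 0.0, 0.0],    # objective values: placeholder, not used here
    [1.0, 1.5, 2.0],    # first coefficient of b(tau_j)
    [0.5, 0.7, 0.9],    # second coefficient of b(tau_j)
]
print(coefs_at(toy_sol, 0.3))  # [1.5, 0.7]
```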

Example:
library("metrics")
;
; simulate data
;
randomize(101)
x = uniform(100,3)
y = x[,1] + 2*x[,2] - x[,3] + normal(100)
;
; fit the data ... median regression
;
z = rqfit(x,y)
z.coefs

Result:
Contents of coefs - estimates of b = (1,2,-1)' coefficient vector
[1,]   0.8774
[2,]   2.0738
[3,]  -1.3159
Example:
; procedure for graphical representation of the results estimated by the rqfit quantlet
; parameter obs1 = (x, y) is an outlier added to randomly generated data points
proc() = myquant(obs1)
  ;
  ; initialization
  ;
  n = 10                   ; number of observations
  randomize(17654321)      ; sets random seed
  beta = #(1, 2)           ; defines intercept and slope
  x = matrix(n)~sort(uniform(n))
  ;                        ; creates design matrix
  ;
  ; new x-observation is added
  ;
  x = x|(1~obs1[1])
  m = x*beta               ; defines regression line
  eps = 0.05*normal(n)     ; creates obs error
  ;
  ; new y-observation is added
  ;
  y = m[1:n] + eps         ; noisy line
  y = y|obs1[2]
  ;
  ; create graphical display and draw data points
  ;
  d = createdisplay(1,1)
  dat = x[,2]~y
  outl = obs1[1]~obs1[2]
  setmaskp(outl,1,12,15)    ; outlier is blue big star
  tdat = x[,2]~m
  setmaskl(tdat,(1:rows(tdat))', 1, 1, 1)
  ;                        ; thin blue line
  setmaskp(tdat, 0, 0, 0)  ; reduces point size to min
  ;
  ; estimation of the model using rqfit
  ;
  z = rqfit(x,y,0.5)
  beta1 = z.coefs
  ;
  ; draw estimated regression line
  ;
  yhat = x*beta1
  hdat = x[,2]~yhat
  setmaskp(hdat, 0, 0, 0)
  setmaskl(hdat,(1:rows(hdat))', 4, 1, 3)
  ;                        ; thick red line
  show(d, 1, 1, dat[1:n], outl, tdat, hdat)
  title="Quantile regression with outlier"
  setgopt(d,1,1,"title",title)
  ;                        ; sets title
endp                         ; end of myquant
;
; load metrics library
;
library("metrics")
;
; call estimation function with outlier #(0.9,4.5)
;
myquant(#(0.9,4.5))

Result:
As a result, you should see a graph in which the observations are
denoted by black circles and the outlier (observation #(0.9,4.5))
is represented by a large blue star in the upper right corner of
the graph.
The thin blue line depicts the true regression line (beta = #(1,2)),
while the thick red line shows the estimated regression line.
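The robustness shown in the plot can also be checked numerically. The Python sketch below uses hypothetical noise-free data, not the simulated data of the example: ten points on the line y = 1 + 2x plus the outlier (0.9, 4.5). The median regression line is found by brute force over lines through two observations, since minimizing the sum of absolute residuals is the tau = 0.5 case up to a factor of 2. It recovers intercept 1 and slope 2, while least squares is pulled toward the outlier (slope of about 2.69). The function names are hypothetical, and the brute-force search is for illustration only; rqfit uses the simplex method.

```python
from itertools import combinations


def lad_line(xs, ys):
    """Median regression line (intercept, slope) by brute force over all
    lines through two observations, minimizing the sum of |residuals|."""
    best, best_obj = None, float("inf")
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define y = a + b x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        icpt = ys[i] - slope * xs[i]
        obj = sum(abs(y - icpt - slope * x) for x, y in zip(xs, ys))
        if obj < best_obj:
            best, best_obj = (icpt, slope), obj
    return best


def ols_line(xs, ys):
    """Ordinary least squares line (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: exact line y = 1 + 2x at x = 0.0, 0.1, ..., 0.9 plus an outlier.
xs = [i / 10 for i in range(10)] + [0.9]
ys = [1 + 2 * x for x in xs[:10]] + [4.5]
print(lad_line(xs, ys))  # approximately (1.0, 2.0)
print(ols_line(xs, ys))  # slope approximately 2.69
```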



Author: P. Cizek, 19990920 license MD*Tech
(C) MD*TECH Method and Data Technologies, 05.02.2006