1.5 Description of Quantlets for QR
The complete description of the XploRe quantlets for quantile regression and the related test follows in the next two subsections. They also contain several remarks and notes that are important for the use of these quantlets. In both subsections, all input parameters are discussed first and the output values afterwards.
1.5.1 Quantlet rqfit
- z = rqfit(x, y{, tau, ci, alpha, iid, interp, tcrit})
- estimates noninteractively a quantile regression model
The main purpose of the quantlet is to estimate the quantile regression model for a quantile tau, given the regression observations (y,x). For the sake of simplicity, we assume throughout this section that the output of rqfit is stored in a list called z, as shown in the template.
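To make the single-quantile estimation problem concrete, here is a small Python sketch (not XploRe). It computes the tau-th regression quantile for tau in (0,1) by minimizing $\sum_i \rho_\tau(y_i - x_i^{\top}\beta)$ with the check function $\rho_\tau(u) = u(\tau - I(u < 0))$. It relies on the fact that some optimal solution passes through p observations, so it simply tries every p-element subset of rows. This brute-force search is chosen for readability, is feasible only for tiny data sets, and is not the algorithm used inside rqfit. The function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination.

    Returns None when the system is (numerically) singular."""
    k = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, k + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][k] / m[i][i] for i in range(k)]


def rq_elemental(x, y, tau):
    """Exact tau-th regression quantile for tiny problems (0 < tau < 1).

    Some minimizer of sum_i rho_tau(y_i - x_i' beta) passes through p
    observations, so trying every p-element subset of rows finds one.
    Returns (coefs, residuals), the analogues of z.coefs and z.res."""
    p = len(x[0])
    best, best_obj = None, float("inf")
    for idx in combinations(range(len(y)), p):
        beta = solve([x[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue
        obj = sum(check_loss(yi - sum(b * v for b, v in zip(beta, row)), tau)
                  for row, yi in zip(x, y))
        if obj < best_obj:
            best, best_obj = beta, obj
    res = [yi - sum(b * v for b, v in zip(best, row)) for row, yi in zip(x, y)]
    return best, res


# Median regression (tau = 0.5) of a line with one gross outlier.
x = [[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]]   # intercept + one regressor
y = [1, 3, 5, 7, 100]
coefs, res = rq_elemental(x, y, 0.5)
print(coefs)   # [1.0, 2.0]
print(res[4])  # 91.0
```

The outlier does not move the median regression line, which still passes through the four collinear points.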
- x
- An $n \times p$ matrix of explanatory variables.
It should not contain missing (NaN) or infinite values (Inf,-Inf).
See also Subsection 1.2.2.
- y
- An $n \times 1$ vector of observations for the dependent variable.
It should not contain missing (NaN) or infinite values (Inf,-Inf).
See also Subsection 1.2.2.
- tau
- A regression quantile to be estimated. If the parameter is omitted,
the predefined value is used. There are two different modes of operation,
depending on the value of this parameter:
- tau inside (0,1):
- A single quantile solution
for the given tau is computed and returned. The estimated parameters are
stored in z.coefs and the corresponding residuals are accessible via
z.res.
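- tau outside (0,1):
- Solutions for all possible quantiles are sought and an approximation of the quantile regression process $\{\hat\beta(\tau): \tau \in (0,1)\}$ is computed. In this case, z.coefs is a $p \times m$ matrix containing $\hat\beta(\tau_1), \dots, \hat\beta(\tau_m)$, the estimates at the $m$ breakpoints $\tau_1, \dots, \tau_m$ of the process. The array containing both $\{\tau_i\}_{i=1}^{m}$ and $\{\hat\beta(\tau_i)\}_{i=1}^{m}$ is to be found in z.sol.
It should be emphasized that this regime can be quite memory and CPU intensive. On typical machines it is not recommended for problems with a large number of observations n.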
- ci
- A logical flag for confidence intervals (nonzero values mean true) with the default value 0 (false). If ci is zero, only the regression coefficients and the corresponding residuals are calculated. Otherwise, confidence intervals for the parameters are computed using the rank inversion method of Koenker (1994) and returned in z.intervals.
Be aware that the computation of confidence intervals can be rather slow for large problems. Note also that rank inversion works only for $p > 1$, but this is rarely restrictive, since an intercept is included in the regression in most cases.
- alpha
- The nominal significance level of the confidence intervals (the nominal coverage probability is thus 1 - alpha); its default value is 0.1. The level is called nominal because the confidence intervals are computed from an approximation of the quantile regression process $\{\hat\beta(\tau): \tau \in (0,1)\}$. Therefore, the ``available'' significance levels are given by the breakpoints $\tau_1, \dots, \tau_m$ and, consequently, by the size of the data set used. Given a nominal significance level alpha, the breakpoints are chosen so that they most closely approximate the required coverage probability. Then either two confidence intervals are returned (the best ones with significance levels just above and below alpha), or interpolation takes place. See Subsection 1.4.3 and the description of parameter interp for more details.
- iid
- A logical flag indicating i.i.d. errors (nonzero values mean true); its default value is 1 (true). If iid is nonzero, the rank inversion method employs the assumption of i.i.d. errors and the original version of the rank inversion intervals is used (Koenker; 1994). Otherwise, possible heterogeneity of the errors is taken into account. See also Subsection 1.4.3.
- interp
- A logical flag for interpolated confidence intervals (again, nonzero values mean true); the default value is 1 (true). As confidence intervals (and any other test statistics) based on order statistics are discrete, it is reasonable to consider intervals that interpolate between the two intervals with significance levels just below and just above the specified alpha.
If interp is nonzero (and, of course, ci is nonzero, otherwise no confidence intervals are computed), rqfit returns for every parameter a single interval based on linear interpolation of the two intervals. Therefore, z.intervals is a $p \times 2$ matrix; each row contains a confidence interval for the corresponding parameter in z.coefs. On the other hand, if interp equals zero, the two ``exact'' intervals with significance levels above and below alpha (the two on which the interpolation would be based) are returned. Thus, z.intervals is a $p \times 4$ matrix; each row contains first the lower bounds, then the upper bounds of the confidence intervals, i.e., all four bounds are sorted in ascending order. Moreover, the matrices z.cval and z.pval, which contain the critical values and p-values of the upper and lower bounds of the intervals, are returned in this case. See also Subsections 1.2.2 and 1.4.3.
- tcrit
- A logical flag for a finite-sample adjustment using t-statistics; its default value is 1 (true). In the default case, Student critical values are used for the computation of confidence intervals; otherwise, normal ones are employed.
It might sometimes happen that the confidence interval for some parameter has the form (-Inf,Inf) or is otherwise unbounded. Setting this parameter to zero, i.e., decreasing the absolute values of the critical values, can help you obtain finite confidence intervals.
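The two output layouts controlled by interp can be mimicked in a few lines of Python. This is a sketch under an explicit assumption: the interpolation weight is taken to be linear in the significance level, whereas the text above only states that rqfit interpolates linearly between the two intervals, not in which variable. The function names and the numbers in the example are made up for illustration.

```python
def interpolate_interval(wide, narrow, level_wide, level_narrow, alpha):
    """Interpolate two confidence intervals for one parameter (interp nonzero).

    wide: (lower, upper) with significance level level_wide <= alpha.
    narrow: (lower, upper) with significance level level_narrow >= alpha.
    The weight is linear in the significance level (assumption of this sketch)."""
    if not level_wide <= alpha <= level_narrow or level_wide == level_narrow:
        raise ValueError("need level_wide <= alpha <= level_narrow, levels distinct")
    w = (alpha - level_wide) / (level_narrow - level_wide)
    return tuple(a + w * (b - a) for a, b in zip(wide, narrow))


def exact_row(wide, narrow):
    """Row of z.intervals for interp = 0: all four bounds in ascending order."""
    return sorted([*wide, *narrow])


wide, narrow = (1.0, 5.0), (2.0, 4.0)
print(interpolate_interval(wide, narrow, 0.08, 0.12, 0.10))  # approximately (1.5, 4.5)
print(exact_row(wide, narrow))                                # [1.0, 2.0, 4.0, 5.0]
```

A higher significance level gives a narrower interval, so the interpolated interval always lies between the two exact ones.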
The output values are described next.
- z.coefs
- A $p \times 1$ or $p \times m$ matrix. If parameter tau is inside the interval (0,1), the only column of z.coefs contains the estimated coefficients. If tau falls outside (0,1), z.coefs is a $p \times m$ matrix that contains the estimated coefficients for all breakpoints $\tau_1, \dots, \tau_m$. This matrix is actually composed of the last p rows of the z.sol array; see z.sol for a more detailed description. See also Subsection 1.2.2.
- z.res
- An $n \times 1$ vector of regression residuals, which is returned only if tau is inside the interval (0,1). See also Subsection 1.2.2.
- z.intervals
- A $p \times 2$ or $p \times 4$ matrix containing confidence intervals, which are computed only if ci is nonzero and tau belongs to the interval (0,1). In the first case, one interpolated interval per parameter is returned; in the second, two intervals per parameter are returned (the bounds of the intervals are sorted in ascending order). See the description of parameters alpha and interp for more details, as well as Subsections 1.2.2 and 1.4.3.
- z.cval
- A matrix of critical values for the (noninterpolated) confidence intervals. It is returned only when tau is inside the interval (0,1), ci is nonzero, and interp equals zero.
See the description of parameter interp for further information.
- z.pval
- A matrix of p-values (probabilities) for the (noninterpolated) confidence intervals. It is returned only when tau falls in the interval (0,1), ci is nonzero, and interp equals zero.
See the description of parameter interp for further information.
- z.sol
- The primal solution array, which is a $(p+3) \times m$ matrix. Its first row contains the breakpoints $\tau_1, \dots, \tau_m$ of the quantile function, i.e., the values in (0,1) at which the solution changes. The second row contains the corresponding quantiles evaluated at the mean design point, i.e., the inner product of $\bar{x}$ and $\hat\beta(\tau_i)$. The third row contains the value of the objective function evaluated at the corresponding $\tau_i$, see (1.7), and the last p rows of the matrix give $\hat\beta(\tau_i)$. The solution $\hat\beta(\tau_i)$ prevails from $\tau_i$ to $\tau_{i+1}$. Portnoy (1989) showed that $m = O_p(n \log n)$. See also Subsection 1.4.3.
- z.dsol
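- The dual solution array, an $n \times m$ matrix containing the dual solution corresponding to z.sol. The $ij$-th entry, $\hat a_i(\tau_j)$, is equal to 1 if $y_i > x_i^{\top}\hat\beta(\tau_j)$, equal to 0 if $y_i < x_i^{\top}\hat\beta(\tau_j)$, and lies between 0 and 1 if $y_i = x_i^{\top}\hat\beta(\tau_j)$. See Gutenbrunner and Jurečková (1992) for a detailed discussion of the statistical interpretation of z.dsol. The use of z.dsol in statistical inference is described in Gutenbrunner, Jurečková, Koenker, and Portnoy (1993).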
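For the intercept-only (location) model, both the primal and the dual solution have a closed form, which may help in reading z.sol and z.dsol. The Python sketch below assumes distinct observations and uses two standard facts: the tau-th regression quantile is then a sample order statistic, and the regression rankscores reduce to Hájek's rank scores, as discussed by Gutenbrunner and Jurečková (1992). Since $\bar{x} = 1$ here, the second row of z.sol would coincide with the coefficient row. The sketch does not reproduce rqfit's exact conventions (for instance, whether 0 is reported as a breakpoint), and the function names are hypothetical.

```python
def location_process(y):
    """Quantile regression process of the intercept-only model (x_i = 1).

    Assumes no ties in y. The k-th order statistic y_(k) is an optimal
    solution for every tau in [(k-1)/n, k/n], so the process is a step
    function with breakpoints 0, 1/n, ..., (n-1)/n (here m = n)."""
    ys = sorted(y)
    n = len(ys)
    return [(k / n, (k + 1) / n, ys[k]) for k in range(n)]


def objective(y, beta, tau):
    """Objective sum_i rho_tau(y_i - beta) of the location model."""
    total = 0.0
    for yi in y:
        u = yi - beta
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total


def rankscore(rank, n, tau):
    """Dual solution a_i(tau) of the location model for an observation of rank R_i.

    Equals 1 while y_i lies above the fitted quantile, 0 once it lies below,
    and decreases linearly on [(R_i - 1)/n, R_i/n] (Hajek's rank scores)."""
    if tau <= (rank - 1) / n:
        return 1.0
    if tau > rank / n:
        return 0.0
    return rank - tau * n


y = [3.0, 1.0, 2.0]
for start, end, beta in location_process(y):
    print(f"[{start:.3f}, {end:.3f}]  beta = {beta}")
# At a breakpoint both neighbouring solutions attain the same objective value:
print(objective(y, 1.0, 1 / 3), objective(y, 2.0, 1 / 3))
# Rankscores at tau = 0.3 for n = 4 observations with ranks 1..4:
print([rankscore(r, 4, 0.3) for r in range(1, 5)])
```

The rankscores at a fixed tau sum to $n(1-\tau)$, which is the dual feasibility constraint of the location model.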
1.5.2 Quantlet rrstest
- chi = rrstest(x0, x1, y{, score})
- executes the regression rankscore test
The main purpose of the quantlet rrstest is to test the significance of some explanatory variables in a regression using rankscore tests. For this purpose, the quantlet invokes the already described rqfit with parameter tau outside the interval (0,1), so that the whole quantile regression process is computed. Therefore, the note on the memory and CPU demands of this choice of tau applies here as well. The test is described in Subsection 1.4.3.
- x0
- An $n \times p$ matrix of maintained regressors. If there is an intercept term in the regression, x0 should contain it. The same restrictions as for x in rqfit apply to x0: it should not contain missing (NaN) or infinite values (Inf,-Inf).
- x1
- An $n \times q$ matrix of regressors under test. The explanatory variables placed in x1 are tested for their significance in the regression. Again, x1 should not contain missing (NaN) or infinite values (Inf,-Inf).
- y
- An $n \times 1$ vector of observations for the response variable.
It should not contain missing (NaN) or infinite values (Inf,-Inf).
- score
- The score function used by the test. Possible values are:
- score :
- Wilcoxon scores (this is the default case); they are asymptotically optimal for the logistic error model.
- score :
- Normal scores, which are asymptotically optimal for the Gaussian error model.
- score :
- Sign scores, which are asymptotically optimal for the Laplace error model.
- score :
- A generalization of sign scores to the quantile $\tau$ given by the value of score in (0,1), i.e., scores generated by the function $\varphi_\tau(t) = \tau - I(t < \tau)$.
See also Subsection 1.4.3.
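The score choices above can be written as score functions $\varphi$ on (0,1). The Python sketch uses their usual textbook forms, Wilcoxon $\varphi(t) = t - 1/2$, normal $\varphi(t) = \Phi^{-1}(t)$, sign $\varphi(t) = \mathrm{sign}(t - 1/2)/2$, and generalized sign $\varphi_\tau(t) = \tau - I(t < \tau)$. For the intercept-only model, it converts the rank R of an observation into its score $n\int_{(R-1)/n}^{R/n} \varphi(t)\,dt$. The exact formulas rrstest uses internally are not given in this section, so treat them as assumptions; all function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon(t):
    """Wilcoxon score function phi(t) = t - 1/2."""
    return t - 0.5


def normal(t):
    """Normal score function phi(t) = Phi^{-1}(t), defined for 0 < t < 1."""
    return NormalDist().inv_cdf(t)


def sign(t):
    """Sign score function phi(t) = sign(t - 1/2) / 2."""
    return 0.0 if t == 0.5 else math.copysign(0.5, t - 0.5)


def quantile_score(tau):
    """Generalized sign score function phi_tau(t) = tau - I(t < tau)."""
    return lambda t: tau - (1.0 if t < tau else 0.0)


def location_score(phi, rank, n, steps=1000):
    """Score of the observation with rank R in the intercept-only model:
    n times the integral of phi over [(R - 1)/n, R/n], by the midpoint rule
    (which never evaluates phi at the endpoints 0 and 1)."""
    h = 1.0 / (n * steps)
    start = (rank - 1) / n
    return n * h * sum(phi(start + (j + 0.5) * h) for j in range(steps))


n = 4
for name, phi in [("wilcoxon", wilcoxon), ("normal", normal),
                  ("sign", sign), ("tau=0.25", quantile_score(0.25))]:
    print(name, [round(location_score(phi, r, n), 4) for r in range(1, n + 1)])
```

With tau = 0.5 the generalized sign scores coincide with the ordinary sign scores, which is why they are described as a generalization.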
Let us now discuss the only output value of the quantlet.
- chi
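- A test statistic that, under the null hypothesis, is asymptotically distributed according to $\chi^2_q$, i.e., the $\chi^2$ distribution with q degrees of freedom, where q is the number of columns of x1. See also (1.18) in Subsection 1.4.3.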
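As an illustration of what chi measures, the Python sketch below computes a rankscore statistic in the simplest special case: x0 is the intercept only, x1 is a single column, Wilcoxon scores are used, and y has no ties. It assumes the statistic has the standard form of Gutenbrunner, Jurečková, Koenker, and Portnoy (1993), $T = S^2/(Q\,A^2(\varphi))$. Here $S = n^{-1/2}\sum_i (x_{1i} - \bar{x}_1)\hat b_i$, $Q = n^{-1}\sum_i (x_{1i} - \bar{x}_1)^2$, and $A^2(\varphi) = \int_0^1 (t - 1/2)^2\,dt = 1/12$. Compare with (1.18), whose normalization may differ. Because the rankscores of the location model are known in closed form, no quantile regression solver is needed; rrstest itself handles general x0 and x1 through rqfit. The function name is hypothetical.

```python
import math


def wilcoxon_rankscore_test(x1, y):
    """Rankscore test of H0: the coefficient of x1 is zero, with x0 = intercept.

    Special case only: one tested regressor, Wilcoxon scores, no ties in y.
    The Wilcoxon rankscores of the location model are
    b_i = (R_i - 1/2)/n - 1/2, and the statistic is
    T = S^2 / (Q * A2) with S = n^(-1/2) * sum_i (x1_i - mean(x1)) * b_i,
    Q = n^(-1) * sum_i (x1_i - mean(x1))^2 and A2 = 1/12.
    Returns T and its asymptotic chi^2_1 p-value."""
    n = len(y)
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda j: y[j]), start=1):
        rank[i] = r
    b = [(rank[i] - 0.5) / n - 0.5 for i in range(n)]
    xbar = sum(x1) / n
    d = [xi - xbar for xi in x1]
    s = sum(di * bi for di, bi in zip(d, b)) / math.sqrt(n)
    q = sum(di * di for di in d) / n
    t_stat = s * s / (q / 12.0)
    p_value = math.erfc(math.sqrt(t_stat / 2.0))  # P(chi^2_1 > T)
    return t_stat, p_value


print(wilcoxon_rankscore_test([1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0]))
```

For a single tested regressor, $P(\chi^2_1 > T) = \mathrm{erfc}(\sqrt{T/2})$, so no chi-square table is needed. With q tested regressors, S becomes a vector and Q a matrix.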