1.5 Description of Quantlets for QR

The next two subsections give a complete description of the XploRe quantlets for quantile regression and the related test, together with several remarks and notes that are important for their use. In both subsections, the input parameters are discussed first and the output values afterwards.


1.5.1 Quantlet rqfit


z = rqfit (x, y{, tau, ci, alpha, iid, interp, tcrit})
estimates noninteractively a quantile regression model

The main purpose of the quantlet is to estimate the quantile regression model given by regression observations (y,x) for a quantile tau. For the sake of simplicity, we assume throughout this section that the output of rqfit is stored in a list called z, as shown in the template.

x
An $ n \times p$ matrix of explanatory variables. It should not contain missing (NaN) or infinite values (Inf,-Inf). See also Subsection 1.2.2.

y
An $ n \times 1 $ vector of observations for the dependent variable. It should not contain missing (NaN) or infinite values (Inf,-Inf). See also Subsection 1.2.2.

tau
A regression quantile to be estimated. If the parameter is omitted, the predefined value $ 0.5$ is used. There are two different modes of operation, depending on the value of this parameter:

tau inside $ \langle 0,1 \rangle$:
A single quantile solution for the given tau is computed and returned. The estimated parameters are stored in z.coefs and the corresponding residuals are accessible via z.res.

tau outside $ \langle 0,1 \rangle$:
Solutions for all possible quantiles are sought and the approximation of the quantile regression process $ \{\hat{\beta}(\tau) \vert \tau \in \{{\tau}_{1},\ldots,{\tau}_{m}\} \}$ is computed. In this case, z.coefs is a matrix containing $ \hat{\beta}(\tau_1),\ldots,\hat{\beta}(\tau_m)$. The array containing both $ {\tau}_{1},\ldots,{\tau}_{m}$ and $ \hat{\beta}(\tau_1),\ldots,\hat{\beta}(\tau_m)$ is to be found in z.sol.
It should be emphasized that this mode can be quite memory and CPU intensive. On typical machines it is not recommended for problems with $ n > 10000$.
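To make the single-quantile mode concrete, the following Python sketch (not XploRe code, and not the algorithm used by rqfit) evaluates a quantile regression objective for the simplest case of an intercept-only model. It assumes the objective is the usual check-function criterion $ \sum_i \rho_\tau(y_i - x_i^T\beta)$ with $ \rho_\tau(u) = u(\tau - I(u<0))$. The names check_loss, objective, and location_quantile are hypothetical.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(y, fitted, tau):
    """Sum of check losses of the residuals y_i - fitted_i."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))


def location_quantile(y, tau):
    """Intercept-only fit: search the observed values for the minimiser.

    In this model a minimiser can always be found among the data points,
    so a brute-force search over them is sufficient for illustration.
    """
    return min(sorted(y), key=lambda b: objective(y, [b] * len(y), tau))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    for tau in (0.25, 0.5, 0.9):
        print(tau, location_quantile(data, tau))
```

With regressors present, the same objective is minimised over the whole coefficient vector, which requires a linear programming algorithm rather than a search over data points.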

ci
A logical flag for confidence intervals (nonzero values mean true); the default value is 0 (false). If ci is zero, only regression coefficients and the corresponding residuals are calculated. Otherwise, confidence intervals for the parameters are also computed using the rank inversion method of Koenker (1994) and returned in z.intervals.
Be aware that the computation of confidence intervals can be rather slow for large problems. Note also that rank inversion works only for $ p > 1$, but this should not be very restrictive, since an intercept is included in the regression in most cases.

alpha
The nominal significance level of the confidence intervals, i.e., the nominal coverage probability is 1 - alpha; the default value is 0.1. The level is called nominal because the confidence intervals are computed from an approximation of the quantile regression process $ \{\hat{\beta}(\tau) \vert \tau \in \{{\tau}_{1},\ldots,{\tau}_{m}\} \}$. Therefore, the ``available'' significance levels are given by the breakpoints $ {\tau}_{1},\ldots,{\tau}_{m}$ and, consequently, by the size of the data set used. Given a nominal significance level alpha, breakpoints are chosen so that they approximate the required coverage probability as closely as possible. Then either two confidence intervals are returned (the best ones with significance levels just above and just below alpha), or interpolation takes place. See Subsection 1.4.3 and the description of parameter interp for more details.

iid
A logical flag indicating i.i.d. errors (nonzero values mean true); the value used if the parameter is omitted is 1 (true). If iid is nonzero, the rank inversion method employs the assumption of i.i.d. errors and the original version of the rank inversion intervals is used (Koenker; 1994). Otherwise, possible heterogeneity of errors is taken into account. See also Subsection 1.4.3.

interp
A logical flag for interpolated confidence intervals (again, nonzero values mean true); the default value is 1 (true). As confidence intervals (and any other test statistics) based on order statistics are discrete, it is reasonable to consider intervals that are an interpolation of two intervals with significance levels just below and just above the specified alpha. If interp is nonzero (and, of course, ci is nonzero, otherwise no confidence intervals are computed), rqfit returns for every parameter a single interval based on linear interpolation of the two intervals. Therefore, z.intervals is a $ p \times 2$ matrix in which each row contains a confidence interval for the corresponding parameter in z.coefs. On the other hand, if interp equals zero, two ``exact'' intervals with significance levels above and below alpha (the two on which the interpolation would be based) are returned. Thus, z.intervals is a $ p \times 4$ matrix in which each row contains first the lower bounds and then the upper bounds of the confidence intervals, i.e., all four bounds are sorted in ascending order. Moreover, matrices z.cval and z.pval, which contain the critical values and $ p$-values of the upper and lower bounds of the intervals, are returned in this case. See also Subsections 1.2.2 and 1.4.3.
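The relation between the $ p \times 4$ and $ p \times 2$ forms of z.intervals can be illustrated by a short Python sketch. It is only an illustration under stated assumptions: the exact interpolation rule used by rqfit is not given in this section, so the sketch interpolates linearly in the significance level, and it assumes that both bounds of an interval share one achieved level (z.cval and z.pval actually hold one value per bound). The function name and its arguments are hypothetical.

```python
def interpolate_interval(row, level_wide, level_narrow, alpha):
    """Interpolate one row of a p x 4 interval matrix to a single interval.

    row          : (lower_wide, lower_narrow, upper_narrow, upper_wide),
                   i.e. the four bounds sorted in ascending order
    level_wide   : achieved significance level just below alpha (wider interval)
    level_narrow : achieved significance level just above alpha (narrower interval)
    alpha        : requested nominal significance level
    """
    lower_wide, lower_narrow, upper_narrow, upper_wide = row
    if not level_wide <= alpha <= level_narrow:
        raise ValueError("alpha must lie between the two achieved levels")
    if level_narrow == level_wide:
        weight = 0.0
    else:
        # weight 0 gives the wide interval, weight 1 the narrow one
        weight = (alpha - level_wide) / (level_narrow - level_wide)
    lower = lower_wide + weight * (lower_narrow - lower_wide)
    upper = upper_wide + weight * (upper_narrow - upper_wide)
    return lower, upper


if __name__ == "__main__":
    print(interpolate_interval((0.0, 1.0, 3.0, 5.0), 0.08, 0.12, 0.10))
```

The interpolated interval always lies between the two ``exact'' intervals, which is why it can be reported as a single row of a $ p \times 2$ matrix.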

tcrit
A logical flag for finite sample adjustment using $ t$-statistics; its default value is 1 (true). In the default case, Student critical values are used for the computation of confidence intervals; otherwise, normal ones are employed.
It might sometimes happen that the confidence interval for some parameter has the form (-Inf,Inf) or $ (-10^{300},10^{300})$. Setting this parameter to zero, i.e., decreasing the absolute value of the critical values, can help to obtain finite confidence intervals.

The output values are discussed next.

z.coefs
A $ p \times 1$ or $ p \times m$ matrix. If parameter tau is inside the interval $ \langle 0,1 \rangle$, the only column of z.coefs contains the estimated coefficients. If tau falls outside $ \langle 0,1 \rangle$, z.coefs is a $ p \times m$ matrix that contains the estimated coefficients for all breakpoints $ {\tau}_{1},\ldots,{\tau}_{m}$. This matrix is actually composed of the last $ p$ rows of the z.sol array; see z.sol for a more detailed description. See also Subsection 1.2.2.

z.res
An $ n \times 1 $ vector of regression residuals, which is returned only if tau is inside the interval $ \langle 0,1 \rangle$. See also Subsection 1.2.2.

z.intervals
A $ p \times 2$ or $ p \times 4$ matrix containing confidence intervals that are computed only if ci is nonzero and tau belongs to interval $ \langle 0,1 \rangle$. In the first case, one interpolated interval per parameter is returned, in the second one, two intervals per parameter are returned (bounds of the intervals are sorted in ascending order). See the description of parameters alpha and interp for more details as well as Subsections 1.2.2 and 1.4.3.

z.cval
A $ p \times 4$ matrix of critical values for (noninterpolated) confidence intervals. It is returned only when tau is inside interval $ \langle 0,1 \rangle$, ci is nonzero, and interp equals zero. See the description of parameter interp for further information.

z.pval
A $ p \times 4$ matrix of $ p$-values (probabilities) for (noninterpolated) confidence intervals. It is returned only when tau is inside the interval $ \langle 0,1 \rangle$, ci is nonzero, and interp equals zero. See the description of parameter interp for further information.

z.sol
The primal solution array, which is a $ (p+3) \times m$ matrix. Its first row contains the breakpoints $ {\tau}_{1},\ldots,{\tau}_{m}$ of the quantile function, i.e., the values in $ (0,1)$ at which the solution changes. The second row contains the corresponding quantiles evaluated at the mean design point, i.e., the inner product of $ \overline{X} = (\overline{X_{.,k}})_{k=1}^p$ and $ \hat{\beta}(\tau_i), i = {1},\ldots,{m}$. The third row contains the value of the objective function evaluated at the corresponding $ \tau_i, i = {1},\ldots,{m}$, see (1.7), and the last $ p$ rows of the matrix give $ \hat{\beta}(\tau_1),\ldots,\hat{\beta}(\tau_m)$. The solution $ \hat{\beta}(\tau_i)$ prevails from $ \tau_i$ to $ \tau_{i+1}, i = {1},\ldots,{m-1}$. Portnoy (1989) showed that $ m = {\cal O}_p(n \ln n)$. See also Subsection 1.4.3.
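The layout of such a solution array can be sketched in Python for the intercept-only model ($ p = 1$), where the regression quantile is an ordinary sample quantile and changes only at multiples of $ 1/n$. This is an illustration of the row structure described above, not the output of rqfit: the function name is hypothetical, the observations are assumed to be distinct, the objective is assumed to be the usual check-function criterion, and whether rqfit reports its first breakpoint at 0 is not stated in this section.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def location_solution_array(y):
    """Build a (p + 3) x m array (as a list of rows) for the model y = beta + error.

    Row 0: breakpoints tau_j = j / n, j = 0, ..., n - 1
    Row 1: fitted quantile at the mean design point (the mean of the
           intercept column is 1, so this equals beta(tau_j))
    Row 2: objective function evaluated at tau_j and beta(tau_j)
    Row 3: beta(tau_j), the (j + 1)-th smallest observation, which is the
           solution between tau_j and tau_{j+1}
    """
    ys = sorted(y)
    n = len(ys)
    taus = [j / n for j in range(n)]
    betas = list(ys)
    mean_design_point = 1.0
    quantiles = [mean_design_point * b for b in betas]
    objectives = [
        sum(check_loss(yi - b, tau) for yi in ys)
        for tau, b in zip(taus, betas)
    ]
    return [taus, quantiles, objectives, betas]


if __name__ == "__main__":
    for row in location_solution_array([3, 1, 2, 5, 4]):
        print(row)
```

Here $ m = n$; with more regressors the number of breakpoints grows faster, in line with the bound quoted above.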

z.dsol
The dual solution array, an $ n \times m$ matrix containing the dual solution corresponding to z.sol. The $ ij$-th entry, $ i \in \{{1},\ldots,{n}\}, j \in \{{1},\ldots,{m}\}$, is equal to $ t$, where
$$ \begin{array}{ll} t = 1 & \textrm{if } y_i > x_i^T \hat{\beta}(\tau_j), \\ t = 0 & \textrm{if } y_i < x_i^T \hat{\beta}(\tau_j), \\ 0 < t < 1 & \textrm{otherwise}. \end{array} $$

See Gutenbrunner and Jurecková (1992) for a detailed discussion of the statistical interpretation of z.dsol. The use of z.dsol in statistical inference is described in Gutenbrunner, Jurecková, Koenker, and Portnoy (1993).
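The case distinction defining the entries of z.dsol can be written down directly. The Python sketch below (hypothetical names, not XploRe code) classifies the observations for one fixed coefficient vector. It returns 1 or 0 for observations strictly above or below the fitted hyperplane and None for observations lying on it, because the value in $ (0,1)$ for those points comes from the linear programming solution and is not computed here. The tolerance tol is an assumption needed for floating-point comparison.

```python
def dual_entries(x_rows, y, beta, tol=1e-9):
    """Return one column of the dual solution, up to the interior values.

    x_rows : list of n rows, each a list of p regressor values
    y      : list of n responses
    beta   : list of p coefficients, e.g. beta(tau_j)
    """
    entries = []
    for xi, yi in zip(x_rows, y):
        fitted = sum(xij * bj for xij, bj in zip(xi, beta))
        residual = yi - fitted
        if residual > tol:
            entries.append(1.0)    # y_i above the fitted hyperplane
        elif residual < -tol:
            entries.append(0.0)    # y_i below the fitted hyperplane
        else:
            entries.append(None)   # on the hyperplane: 0 < t < 1, set by the LP
    return entries


if __name__ == "__main__":
    x = [[1, 0], [1, 1], [1, 2], [1, 3]]
    y = [0.5, 1.0, 1.5, 4.0]
    print(dual_entries(x, y, [0.0, 1.0]))
```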


1.5.2 Quantlet rrstest


chi = rrstest (x0, x1, y{, score})
executes the regression rankscore test

The main purpose of the quantlet rrstest is to test the significance of some explanatory variables in a regression using rankscore tests. For this purpose, the quantlet invokes the already described rqfit with parameter tau equal to $ -1$. Therefore, the note on memory and CPU requirements related to this choice of tau applies here as well. The test is described in Subsection 1.4.3.

x0
An $ n \times (p-J)$ matrix of maintained regressors. If there is an intercept term in the regression, x0 should contain it. The same restrictions as in the case of x in rqfit apply to x0: it should not contain missing (NaN) or infinite values (Inf,-Inf).

x1
An $ n \times J$ matrix of regressors under test. The explanatory variables placed in x1 are tested for their significance in regression. Again, x1 should not contain missing (NaN) or infinite values (Inf,-Inf).

y
An $ n \times 1 $ vector of observations for the response variable. It should not contain missing (NaN) or infinite values (Inf,-Inf).

score
The desired score function for the test. Possible values are:
score $ = 1$:
Wilcoxon scores (this is the default case); they are asymptotically optimal for the logistic error model.
score $ = 2$:
Normal scores, which are asymptotically optimal for the Gaussian error model.
score $ = 3$:
Sign scores, which are asymptotically optimal for the Laplace error model.
score $ \in (0,1)$:
A generalization of sign scores to the quantile given by the value in $ (0,1)$, i.e., scores generated by the function $ \psi(t) = \mathop{\rm sgn}\nolimits (t - \textrm{\texttt{score}})$.
See also Subsection 1.4.3.
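The four choices of score can be illustrated by their score-generating functions on $ (0,1)$. The Python sketch below is not the rrstest implementation: the function names are hypothetical, the Wilcoxon and normal forms ($ t - 1/2$ and the standard normal quantile function) are the textbook choices and are assumed here, the sign scores are taken as the special case score = 0.5 of the generalized form given above, and any scaling or centering applied inside rrstest is not described in this section.

```python
from statistics import NormalDist


def sgn(u):
    """Sign of u as -1, 0, or 1."""
    return (u > 0) - (u < 0)


def score_function(score):
    """Return a score-generating function on (0, 1) for the given code."""
    if score == 1:
        return lambda t: t - 0.5               # Wilcoxon scores
    if score == 2:
        return NormalDist().inv_cdf            # normal scores
    if score == 3:
        return lambda t: sgn(t - 0.5)          # sign scores
    if 0 < score < 1:
        return lambda t: sgn(t - score)        # generalized sign scores
    raise ValueError("score must be 1, 2, 3, or a value in (0, 1)")


if __name__ == "__main__":
    for code in (1, 2, 3, 0.25):
        phi = score_function(code)
        print(code, [round(phi(t), 3) for t in (0.1, 0.5, 0.9)])
```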

The quantlet has a single output value.

chi
The test statistic, which is asymptotically distributed according to $ \chi^2$ with $ J$ degrees of freedom. See also (1.18) in Subsection 1.4.3.
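Since chi is asymptotically $ \chi^2$ distributed with $ J$ degrees of freedom, an asymptotic $ p$-value is the upper tail probability of that distribution at the observed value. The following Python sketch computes it with the standard library only, using the power series of the regularized lower incomplete gamma function. The function name chi2_upper_tail is hypothetical, and the series is adequate for moderate values of the statistic only; a statistical library would normally be used instead.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square with df degrees of freedom.

    Uses P(a, z) = exp(-z) * z**a / Gamma(a + 1) * sum_k z**k / ((a+1)...(a+k))
    with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-16 * total and k < 10000:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


if __name__ == "__main__":
    # hypothetical value of the test statistic with J = 2 tested regressors
    print(chi2_upper_tail(7.3, 2))
```

The null hypothesis that the regressors in x1 are insignificant is rejected at a chosen level when this tail probability falls below that level.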