1.4 Inference for QR

In this section we discuss the possibilities for statistical inference in quantile regression models. Although, in contrast to the widely used theory of least squares under the normality assumption, there are almost no usable finite-sample results, the asymptotic theory offers several competing methods, namely tests based on the Wald statistic, rank tests, and likelihood-ratio-like tests. Some of them are discussed in this section.


1.4.1 Main Asymptotic Results

The asymptotic behavior of ordinary sample quantiles generalizes relatively easily to the quantile regression case. A fundamental result was derived in Koenker and Bassett (1978). Let $ \{ \hat{\beta}(\tau) \vert
\tau \in (0,1) \}$ be the quantile regression process and consider the classical regression model

$\displaystyle y_i = X_i \beta + \varepsilon _i, ~~~ i \in \{{1},\ldots,{n}\}
$

with i.i.d. errors $ \varepsilon _i \sim F$, where $ F$ has a density $ f$ such that $ f(x) > 0$ for all $ x \in \mathbb{R}$ with $ 0 < F(x) < 1$. Assuming that $ n^{-1} \sum_{i = 1}^{n} X_i^T X_i \equiv Q_n \to Q_0$ as $ n \to +\infty,$ where $ Q_0$ is a positive definite matrix, Koenker and Bassett (1978) showed that the joint asymptotic distribution of the $ m$ $ p$-variate quantile regression estimators $ \hat{B}_n =
\bigl( \hat{\beta}_n(\tau_1)^T,\ldots,\hat{\beta}_n(\tau_m)^T \bigr)^T$ takes the form

$\displaystyle \sqrt{n}\left(\hat{B}_n - B\right) = \left[\sqrt{n}\left\{\hat{\beta}_n(\tau_j) - \beta(\tau_j)\right\}\right]_{j=1}^m \stackrel{F}{\to} N\left(0, \Omega \otimes Q_0^{-1}\right),$ (1.11)

where $ \Omega = (\omega_{ij})_{i,j=1}^m = \left[(\min\{\tau_i,\tau_j\}-\tau_i\tau_j)/
\left(f\{F^{-1}(\tau_i)\}\, f\{F^{-1}(\tau_j)\}\right)\right]_{i,j=1}^m$ and $ F^{-1}$ refers again to the quantile function corresponding to the distribution $ F$. Having the asymptotic normality of the estimates in hand, one can use the Wald test for testing hypotheses that can be expressed as a set of linear restrictions on $ \hat{\beta}_n(\tau_1),\ldots,\hat{\beta}_n(\tau_m)$ for some $ {\tau}_{1},\ldots,{\tau}_{m}, m \in \mathbb{N}$.
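
For a hypothesized error distribution, the entries of $ \Omega$ can be evaluated directly. The following short sketch (in Python, purely for illustration--neither the function omega_matrix nor its arguments belong to any quantile regression package) computes $ \Omega$ under the assumption of standard normal errors; for $ \tau = 0.5$, for example, the diagonal entry equals $ 0.25/f(0)^2 = \pi/2$.

  import numpy as np
  from scipy.stats import norm

  def omega_matrix(taus, f, Finv):
      # Omega from (1.11): (min(tau_i,tau_j) - tau_i tau_j) / (f{F^-1(tau_i)} f{F^-1(tau_j)})
      taus = np.asarray(taus, dtype=float)
      dens = f(Finv(taus))
      return (np.minimum.outer(taus, taus) - np.outer(taus, taus)) / np.outer(dens, dens)

  # example: standard normal errors, tau = 0.25, 0.5, 0.75
  print(omega_matrix([0.25, 0.5, 0.75], norm.pdf, norm.ppf))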

The situation is a little bit more complicated in the case of non-i.i.d. errors, but the asymptotic normality of the quantile regression estimator is preserved under heteroscedasticity. If we denote the estimate of the coefficients for $ \tau \in (0,1)$ by $ \hat{\beta}_n(\tau)$, then for $ n \to +\infty$

$\displaystyle \sqrt{n} \left\{\hat{\beta}_n(\tau) - \beta(\tau)\right\} \stackrel{F}{\to} N\left(0, H^{-1}(\tau) J(\tau) H^{-1}(\tau)\right),$ (1.12)

where

$\displaystyle J(\tau) = \lim_{n\to+\infty} J_n(\tau) = \lim_{n\to+\infty} \tau(1-\tau) n^{-1} \sum_{i = 1}^{n} X_i^T X_i
$

and

$\displaystyle H(\tau) = \lim_{n\to+\infty} H_n(\tau) = \lim_{n\to+\infty} n^{-1} \sum_{i = 1}^{n} X_i^T X_i\,
f_i\{F_i^{-1}(\tau)\}
$

( $ f_i(\cdot)$ denotes the conditional density of $ y_i$ given $ X_i$ and $ F_i$ the corresponding conditional distribution function). A guide to the estimation of $ H_n(\tau)$ is given, for example, in Powell (1989), and Koenker and Portnoy (2000).
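
To indicate how such an estimate might look in practice, the following Python sketch (an illustration only; the bandwidth h and the function name sandwich_cov are our own assumptions, not part of any quantile regression package) assembles the sandwich covariance from (1.12), estimating $ H_n(\tau)$ by a simple uniform-kernel smoother of the quantile regression residuals.

  import numpy as np

  def sandwich_cov(X, residuals, tau, h):
      # estimated covariance of the QR estimator: Hn^{-1} Jn Hn^{-1} / n
      n = X.shape[0]
      Jn = tau * (1 - tau) * X.T @ X / n
      # uniform-kernel estimate of f_i at the tau-th conditional quantile
      w = (np.abs(residuals) <= h).astype(float) / (2 * h)
      Hn = (X * w[:, None]).T @ X / n
      Hinv = np.linalg.inv(Hn)
      return Hinv @ Jn @ Hinv / n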


1.4.2 Wald Test

As was already mentioned in the previous section, the asymptotic normality of quantile regression estimates makes it possible to test various linear hypotheses about regression quantiles by means of the Wald test. For a general linear hypothesis about the vector $ B = \bigl(\beta(\tau_1)^T,\ldots,\beta(\tau_m)^T\bigr)^T$

$\displaystyle H_0: H B = h$ (1.13)

($ H$ being a $ J \times mp$ matrix, $ h$ a vector of length $ J$), the Wald test statistic can be written as

$\displaystyle W_n = (H\hat{B}-h)^T \left[ H \left\{\Omega \otimes (X^T X)^{-1}\right\} H^T \right]^{-1} (H\hat{B}-h),$ (1.14)

which, under $ H_0$, has asymptotically a $ \chi^2$ distribution with $ J$ degrees of freedom. The only difficult point is the estimation of the asymptotic covariance matrix $ \Omega$. There are several strategies available; see Koenker and Portnoy (2000) for their discussion.
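
Given estimates $ \hat{B}$ and $ \hat{\Omega}$, the statistic (1.14) and its asymptotic $ p$-value are easy to assemble. A minimal Python sketch (illustrative only, with hypothetical inputs):

  import numpy as np
  from scipy.stats import chi2

  def wald_test(B_hat, Omega_hat, X, H, h):
      # Wald statistic (1.14) for H0: H B = h and its asymptotic p-value
      cov = np.kron(Omega_hat, np.linalg.inv(X.T @ X))   # Omega (x) (X^T X)^{-1}
      r = H @ B_hat - h
      W = r @ np.linalg.solve(H @ cov @ H.T, r)
      return W, chi2.sf(W, df=H.shape[0])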

To present a possible application of this test procedure, let us explain a simple test of heteroscedasticity. Following Koenker and Bassett (1982a), homoscedasticity is equivalent to the equality of slope parameters across quantiles. Consider, for example, model (1.3)

$\displaystyle y_i = \alpha + \delta D_i + \varepsilon _i, ~~~ i \in \{{1},\ldots,{n}\}.
$

Then the test of the equality of the slope parameter $ \delta$ across quantiles $ \tau_1, \tau_2$ is nothing but the test of the hypothesis

$\displaystyle H_0: \delta(\tau_2) - \delta(\tau_1) = 0.
$

Since the $ \tau$-th quantile regression estimate $ \hat{\delta}(\tau)$ is in this case simply the difference of the $ \tau$-th sample quantiles of $ \{y_i \vert D_i = 1\}$ and $ \{y_i \vert D_i = 0\}$ (remember, $ D_i \in \{0,1\}$), we can write
$\displaystyle \hat{\delta}(\tau_2) - \hat{\delta}(\tau_1)$ $\displaystyle =$ $\displaystyle \left\{Q_{Y}(\tau_2\vert D_i=1) - Q_{Y}(\tau_2\vert D_i=0)\right\} - \left\{Q_{Y}(\tau_1\vert D_i=1) - Q_{Y}(\tau_1\vert D_i=0)\right\}$
  $\displaystyle =$ $\displaystyle \left\{Q_{Y}(\tau_2\vert D_i=1) - Q_{Y}(\tau_1\vert D_i=1)\right\} - \left\{Q_{Y}(\tau_2\vert D_i=0) - Q_{Y}(\tau_1\vert D_i=0)\right\}.$

Further, it is possible to derive the variance $ \sigma^2(\tau_1,\tau_2)$ of $ \hat{\delta}(\tau_2) - \hat{\delta}(\tau_1)$ from formula (1.11), and to construct the statistic

$\displaystyle T_n = \{\hat{\delta}(\tau_2) - \hat{\delta}(\tau_1)\} / \hat{\sigma}(\tau_1,\tau_2)
$

with an asymptotically normal distribution. For a practical realization of the test, it is necessary to estimate $ \sigma(\tau_1,\tau_2)$, which goes beyond the scope of this tutorial. For more information on the so-called sparsity estimation, see for example Siddiqui (1960), Bofinger (1975), Welsh (1988), or Hall and Sheather (1988).
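
Nevertheless, the basic idea is simple: the quantity $ 1/f\{F^{-1}(\tau)\}$ entering (1.11), the so-called sparsity, can be estimated by a difference quotient of the empirical quantile function as proposed by Siddiqui (1960). A minimal Python sketch (the bandwidth h has to be supplied, e.g., by the rules of Bofinger (1975) or Hall and Sheather (1988)):

  import numpy as np

  def sparsity_siddiqui(u, tau, h):
      # Siddiqui-type estimate of the sparsity s(tau) = 1 / f{F^{-1}(tau)}
      # from a sample u (e.g., regression residuals)
      lo = np.quantile(u, max(tau - h, 0.0))
      hi = np.quantile(u, min(tau + h, 1.0))
      return (hi - lo) / (2 * h)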


1.4.3 Rank Tests


z = rqfit(x, y{, tau, ci, alpha, iid, interp, tcrit})
estimates noninteractively a given quantile regression model
chi = rrstest(x0, x1, y{, score})
performs the regression rankscore test

The classical theory of rank tests (Hájek and Šidák; 1967) employs the rankscore functions

$\displaystyle a_{ni}(t) = \left\{ \begin{array}{ll} 1 & \textrm{if}~ t \leq (R_i - 1)/n, \\ R_i - tn & \textrm{if}~ (R_i - 1)/n < t \leq R_i/n, \\ 0 & \textrm{if}~ R_i/n < t, \\ \end{array} \right.$ (1.15)

where $ R_i$ represents the rank of the $ i$th observation $ y_i$ in $ ({y}_{1},\ldots,{y}_{n})$. The integration of $ a_{ni}(t)$ with respect to various score generating functions $ \psi$ produces vectors of scores, rank-like statistics, which are suitable for testing. For instance, integrating $ a_{ni}(t)$ using the Lebesgue measure (generating function $ \psi(t) = t$) generates the Wilcoxon scores

$\displaystyle s_i = \int_0^1 a_{ni}(t) dt = \frac{R_i-1/2}{n}, ~~~ i = {1},\ldots,{n},$ (1.16)

$ \psi(t) = \mathop{\rm sgn}\nolimits (t-1/2)$ yields the sign scores $ s_i = a_{ni}(1/2)$, and so on. The way this theory can be applied in the regression context was found by Gutenbrunner and Jurecková (1992), who noticed that the rankscores (1.15) may be viewed as a solution of the linear-programming model
  $\displaystyle \max\limits_{a \in \langle 0,1 \rangle^n}$ $\displaystyle y^T a$ (1.17)
  $\displaystyle \textrm{subject to}$ $\displaystyle X^T a = (1-t)\, X^T 1_n.$

The important property of this model is its duality to model (1.9) that defines regression quantiles--see also Koenker and D'Orey (1993) for details.
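
For a fixed $ t$, the linear program (1.17) can also be solved directly, which gives a simple--though computationally naive--way to obtain the regression rankscores. The following Python sketch (an illustration only; an efficient implementation would use the parametric programming approach discussed in Koenker and D'Orey (1993) rather than a separate LP for every $ t$) evaluates $ a_{ni}(t)$ on a grid and integrates it to obtain Wilcoxon-type scores; with an intercept-only design matrix, the result reproduces (1.16) up to the grid approximation.

  import numpy as np
  from scipy.optimize import linprog
  from scipy.integrate import trapezoid

  def rankscores(X, y, tgrid):
      # regression rankscore functions a_i(t) from the LP (1.17),
      # one linprog call per grid point t
      n = X.shape[0]
      A = np.zeros((len(tgrid), n))
      for k, t in enumerate(tgrid):
          res = linprog(c=-y,                              # maximize y^T a
                        A_eq=X.T,                          # X^T a = (1 - t) X^T 1_n
                        b_eq=(1 - t) * X.T @ np.ones(n),
                        bounds=[(0, 1)] * n, method="highs")
          A[k] = res.x
      return A

  # intercept-only check: integrating a_i(t) over (0,1) approximately
  # reproduces the Wilcoxon scores (R_i - 1/2)/n from (1.16)
  rng = np.random.default_rng(1101)
  y = rng.normal(size=20)
  X = np.ones((20, 1))
  tgrid = np.linspace(0.005, 0.995, 199)
  wilcoxon = trapezoid(rankscores(X, y, tgrid), tgrid, axis=0)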

This duality between regression quantiles and rankscores made it possible to construct tests of the significance of regressors in quantile regression without the need to estimate nuisance parameters (such as $ \Omega$ in the case of the Wald test). Given the model $ y = X_0\beta_0 + X_1\beta_1 + \varepsilon , \beta_0 \in \mathbb{R}^{p-J}, \beta_1 \in \mathbb{R}^J$, Gutenbrunner, Jurecková, Koenker, and Portnoy (1993) designed a test of the hypothesis $ H_0: \beta_1 = 0$ based on the regression rankscore process. It is constructed in the following way: first, compute the rankscore functions $ \{a_{ni}(t)\}_{i=1}^n$ under the restricted model $ y = X_0\beta_0 + \varepsilon $ and the corresponding vector of rankscores $ s = (s_i)_{i=1}^n = \left\{ - \int a_{ni}(t)\, d\psi(t) \right\}_{i=1}^n$. Next, form the vector

$\displaystyle S_n = n^{-1/2} (X_1 - \hat{X}_1)^T s,
$

which under the null hypothesis converges in distribution to $ N\left(0,A^2(\psi)Q_0\right)$, where $ A^2(\psi) = \int_0^1 \psi^2(t)dt$, $ Q_0 = \lim_{n\to\infty} (X_1 - \hat{X}_1)^T (X_1 - \hat{X}_1)/n$, and $ \hat{X}_1 = X_0 (X_0^T X_0)^{-1} X_0^T X_1$. Finally, the test statistic

$\displaystyle T_n = S_n^T Q_0^{-1} S_n / A^2(\psi)$ (1.18)

has asymptotically the $ \chi_J^2$ distribution. To perform this test in XploRe on given data (x,y), the only thing you have to do is to split the matrix x into two parts and to call the quantlet rrstest, which requires the two parts of the matrix x and the vector y on input. Optionally, you can specify the type of the score-generating function to be used (see Section 1.5 for more details); the Wilcoxon scores are employed by default. To demonstrate the quantlet, let us use again a simulated data set--we generate a $ 100 \times 3$ matrix x ($ X$) consisting of an intercept column and two uniformly distributed columns, and two response vectors y1 and y2 corresponding to the models $ y_1 = X \cdot (1,2,-1)^T + \varepsilon _1$ and $ y_2 = X \cdot (1,2,0)^T + \varepsilon _2$, respectively. We test the hypothesis that the coefficient of $ X_3$ is zero. The resulting test statistics differ substantially--in the first case the test statistic is significant, while in the second it is not--as is documented both by their values and the corresponding $ p$-values (recall that cdfc returns the value of the $ \chi_1^2$ distribution function, so the $ p$-value equals one minus the displayed value).
; simulate data matrix
  n = 100
  randomize(1101)
  x = matrix(n) ~ uniform(n,2)

; generate y1 and y2
  y1 = x[,1] + 2*x[,2] - x[,3] + normal(n)
  y2 = x[,1] + 2*x[,2] + normal(n)

; test the hypothesis that the coefficient of x[,3] is zero
; first case
  chi1 = rrstest(x[,1:2], x[,3], y1)
  chi1
  cdfc(chi1,1)

; second case
  chi2 = rrstest(x[,1:2], x[,3], y2)
  chi2
  cdfc(chi2,1)
XAGqr06.xpl

When the script ends, the XploRe output should look like
  Contents of chi1
  [1,]   19.373
  Contents of cdfc
  [1,]  0.99999

  Contents of chi2
  [1,]  0.018436
  Contents of cdfc
  [1,]  0.10801

The existence of a testing strategy for quantile regression motivated the search for a reverse procedure that would provide a method for estimating confidence intervals without actual knowledge of the asymptotic covariance matrix. Quite general results in this area were derived in Hušková (1994). Although the computation of these confidence intervals is rather difficult, there are some special cases for which the procedure is tractable (Koenker; 1994). An adaptation of the technique for non-i.i.d. errors has been developed recently. Now, it was already mentioned that the quantlet rqfit can also compute confidence intervals for quantile regression estimates. This is done by the above-mentioned method of inverting rank tests, which has several practical implications. Above all, the computation of confidence intervals at an exact significance level $ \alpha$ would require knowledge of the entire quantile regression process $ \{ \hat{\beta}(\tau) \vert
\tau \in (0,1) \}$. This is not possible because we always work with finite samples, hence we have only an approximation of the process in the form $ \{\hat{\beta}(\tau) \vert \tau \in \{{\tau}_{1},\ldots,{\tau}_{m}\} \}$. Therefore, two confidence intervals are computed for every parameter at a given significance level $ \alpha$ (parameter alpha)--the largest one whose true significance level is higher than $ \alpha$, let us call it $ I_1$, and the smallest one whose true significance level is lower than $ \alpha$, $ I_2$. Then, according to the value of the parameter interp, different results are returned. If its value is nonzero or the parameter is not specified, e.g.,

  z = rqfit(x, y, 0.5, 1)
then the bounds of the returned intervals are interpolated from the lower and upper bounds of the pairs of intervals, and the result in z.intervals is a $ p \times 2$ matrix of confidence intervals--the first column holds the interpolated lower bounds, the second one the upper bounds. In the other case, i.e., interp $ = 0$,
  z = rqfit(x, y, 0.5, 1, 1, 0)
z.intervals is a $ p \times 4$ matrix of pairs of confidence intervals--the first column contains the lower bounds of the intervals $ I_2$, the second one the lower bounds of the $ I_1$'s, the third one the upper bounds of the $ I_1$'s, and the fourth one the upper bounds of the intervals $ I_2$, which implies that the bounds in each row of z.intervals are numerically sorted. In this case, the matrix z.pval will contain the corresponding $ p$-values for the bounds in z.intervals.

Finally, before closing this topic, we make one small remark on the iid switch. Its value specifies whether the procedure should presume i.i.d. errors (this is the default setting), or whether it should make some adjustments for non-i.i.d. errors. We can illustrate the effect of this parameter using the already discussed nicfoo data. The data seem to exhibit some kind of heteroscedasticity (as is often the case when the explanatory variables involve individuals with diverse levels of income), see Figure 1.4.

Figure 1.4: Least squares (the thick black line) and quantile regression for $ \tau = 0.1, 0.3, 0.5, 0.7, 0.9$ (blue, green, and red lines). XAGqr07.xpl
\includegraphics[scale=0.6]{qr07}

To compare the resulting confidence intervals for median regression under the i.i.d. errors assumption and without it, you can type at the command line or in the editor window

  data = read("nicfoo")
  x = matrix(rows(data)) ~ data[,1] ~ (data[,1]^2)
  y = data[,2]
;
  z = rqfit(x,y,0.5,1,0.1,1)
  z.intervals
;
  z = rqfit(x,y,0.5,1,0.1,0)
  z.intervals
XAGqr08.xpl

Once you run this example, the output window will contain the following results:

  Contents of intervals
  [1,]  0.12712  0.13194
  [2,]   1.1667   1.2362
  [3,] -0.24616 -0.24608

  Contents of intervals
  [1,]  0.024142  0.20241
  [2,]   1.0747   1.3177
  [3,] -0.29817  -0.2014
Please notice the difference between the first group of intervals (computed under the i.i.d. errors assumption) and the second one: without the i.i.d. assumption, the confidence intervals are considerably wider.