9.5 Analysis of Variance


9.5.1 One-way Table

The one-way analysis of variance is concerned with the comparison of the locations of $ k$ samples $ x_{ij},\, j=1,\ldots,\,n_i,\, i=1,\ldots,\,k$. The term "analysis of variance" goes back to the pioneering work of [39], who decomposed the variance of the combined samples as follows

$\displaystyle \sum_{ij}(x_{ij}-\bar{\boldsymbol{x}})^2=\sum_i\sum_j(x_{ij}- \bar{\boldsymbol{x}}_i)^2+\sum_in_i(\bar{\boldsymbol{x}}_i-\bar{\boldsymbol{x}})^2\,.$ (9.121)

The first term of (9.121) is the total sum of squares, the second is the sum of squares within samples and the third is the sum of squares between samples. If the data are modelled as i.i.d. normal random variables with a common variance $ \sigma ^2$ but with mean $ \mu_i$ in the $ i$th sample, then it is possible to derive a test of the null hypothesis that the means are equal. The single hypothesis of equal means is rarely of interest in itself. All pairwise comparisons

$\displaystyle \notag \mu_i=\mu_l, \quad 1 \le i < l \le k\,,$    

as well as contrasts $ \sum_i c_i\mu_i=0$ may be of interest and give rise to the problem of multiple testing and its associated difficulties. The use of the $ L_2$-norm as in (9.121) is widespread, perhaps because of the elegant mathematics. The peculiarities of data analysis must, however, take priority over mathematical theory. Since real data sets may contain outliers, be skewed to some extent and have different scales, an $ L_2$-norm and Gaussian based theory is of limited applicability. We sketch a robustified approach to the one-way table (see [28]).
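Before turning to that robust procedure, the decomposition (9.121) itself is easily checked numerically. The following minimal Python sketch, using made-up samples, verifies that the total sum of squares equals the sum of the within and between terms.

```python
import numpy as np

# Hypothetical data: k = 3 samples of unequal size
samples = [np.array([4.1, 3.8, 5.0, 4.4]),
           np.array([6.2, 5.9, 6.5]),
           np.array([5.1, 4.7, 5.3, 5.0, 4.9])]

x_all = np.concatenate(samples)
grand_mean = x_all.mean()

total_ss   = ((x_all - grand_mean) ** 2).sum()                             # left-hand side of (9.121)
within_ss  = sum(((x - x.mean()) ** 2).sum() for x in samples)             # sum of squares within samples
between_ss = sum(len(x) * (x.mean() - grand_mean) ** 2 for x in samples)   # sum of squares between samples

assert np.isclose(total_ss, within_ss + between_ss)   # (9.121) holds
```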

As a first step gross outliers are eliminated from each sample using a simplified version of the outlier identification rule based on the median and MAD of the sample. Using the robust location and scale functionals $ T_l$ and $ T_s$ an $ \alpha_k$ confidence or approximation interval $ I_i$ for the location of the $ i$th sample is calculated. To control the error rate for Gaussian and other samples we set $ \alpha_k=\alpha^{1/k}$ with for example $ \alpha=0.95$. As the samples are independent the individual coverage probabilities multiply, so this choice guarantees that for Gaussian samples

$\displaystyle P(\mu_i \in I_i, i=1,\ldots,k) = \alpha\,.$ (9.122)

Simulations show that this holds accurately for other symmetric distributions such as the slash, Cauchy and double exponential. All questions relating to the locations of the samples are now reduced to questions concerning the intervals. For example, the samples $ i$ and $ l$ can be approximated by the same location value if and only if $ I_i\cap I_l \ne \emptyset$. Similarly, if the samples are in some order derived from a covariable it may be of interest whether the locations can be taken to be non-decreasing. This will be the case if and only if there exist $ a_i,\, i=1,\ldots,\,k$, with $ a_1 \le a_2 \le \ldots \le a_k$ and $ a_i \in I_i$ for each $ i$. Because of (9.122) all such questions, when stated in terms of the $ \mu_i$, can be tested simultaneously, and on Gaussian test beds the error rate will be $ 1-\alpha$ regardless of the number of tests. Another advantage of the method is that it allows a graphical representation. Every analysis should include a plot of the boxplots for the $ k$ data sets. This can be augmented by the corresponding plot of the intervals $ I_i$, which will often resemble the boxplots; if the sample sizes differ greatly, however, this will influence the lengths of the intervals but not the form of the boxplots.
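A minimal Python sketch of this procedure is given below. It assumes the simplest choices $ T_l=$ median and $ T_s=$ MAD, a crude outlier screen at five MADs, and a normal quantile for the interval half-width; the function names and these constants are illustrative assumptions, the constants in [28] being calibrated by simulation.

```python
import numpy as np
from scipy.stats import norm

def mad(x):
    """Median absolute deviation, scaled to be consistent for the normal."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def robust_intervals(samples, alpha=0.95):
    """Approximation intervals I_i for the sample locations (illustrative
    simplification: T_l = median, T_s = MAD, normal quantile half-width)."""
    k = len(samples)
    alpha_k = alpha ** (1.0 / k)              # per-sample level so that (9.122) holds
    z = norm.ppf((1.0 + alpha_k) / 2.0)
    intervals = []
    for x in samples:
        x = np.asarray(x, dtype=float)
        x = x[np.abs(x - np.median(x)) <= 5.0 * mad(x)]   # crude outlier elimination
        half = z * mad(x) / np.sqrt(len(x))
        intervals.append((np.median(x) - half, np.median(x) + half))
    return intervals

def same_location(I_i, I_l):
    """Samples i and l can share a location value iff the intervals overlap."""
    return max(I_i[0], I_l[0]) <= min(I_i[1], I_l[1])

def nondecreasing_feasible(intervals):
    """Non-decreasing locations are possible iff a_1 <= ... <= a_k can be chosen
    with a_i in I_i; greedily take the smallest admissible value in each interval."""
    a = -np.inf
    for lo, hi in intervals:
        a = max(a, lo)
        if a > hi:
            return False
    return True
```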

9.5.2 Two-way Table

Given $ IJ$ samples

$\displaystyle \notag \left(x_{ijk}\right)_{k=1}^{n_{ij}}, \quad i=1,\ldots,\,I,\,j=1,\ldots,\,J$    

the two-way analysis of variance in its simplest version looks for a decomposition of the data of the form

$\displaystyle x_{ijk}=m +a_i+b_j+c_{ij}+r_{ijk}$ (9.123)

with the following interpretation. The overall effect is represented by $ m$, the row and column effects by the $ a_i$ and $ b_j$ respectively, and the interactions by the $ c_{ij}$. The residuals $ r_{ijk}$ take care of the rest. As it stands the decomposition (9.123) is not unique but can be made so by imposing side conditions on the $ a_i,\,b_j$ and the $ c_{ij}$. Typically these are of the form

$\displaystyle \sum_i a_i = \sum_j b_j =\sum_i c_{ij}=\sum_j c_{ij}=0\,,$ (9.124)

where the latter two hold for all $ j$ and $ i$ respectively. The conditions (9.124) are almost always stated as technical conditions required to make the decomposition (9.123) identifiable. The impression is given that they are neutral with respect to any form of data analysis. This is not the case, as demonstrated by [110] and as can be seen by considering the restrictions on the interactions $ c_{ij}$. The minimum number of non-zero interactions for which the restrictions can hold is four (a single non-zero $ c_{ij}$ would already violate $ \sum_i c_{ij}=0$), which in particular excludes the case of a single interaction in one cell. The restrictions on the row and column effects can also be criticized but we take this no further than mentioning that the restrictions

$\displaystyle \mathrm{MED}(a_1,\ldots,\,a_I) = \mathrm{MED}(b_1,\ldots,\,b_J) =0$ (9.125)

may be more appropriate. The following robustification of the two-way table is based on [106]. The idea is to look for a decomposition which minimizes the number of non-zero interactions. We first consider the case of one observation per cell, $ n_{ij}=1$ for all $ i$ and $ j$, and look for a decomposition

$\displaystyle x_{ij}=m +a_i+b_j+c_{ij}$ (9.126)

with the smallest number of $ c_{ij}$ which are non-zero. We denote the positions of the non-zero $ c_{ij}$ by an $ I\times J$-matrix $ C$ with $ C(i,j)=1$ if and only if $ c_{ij}\ne 0$, the remaining entries being zero. It can be shown that for certain matrices $ C$ the non-zero interactions $ c_{ij}$ can be recovered whatever their values and, moreover, that they are the unique non-zero residuals of the $ L_1$-minimization problem

$\displaystyle \min_{a_i,b_j}\sum_{ij}\vert x_{ij}-a_i-b_j\vert\,.$ (9.127)

We call matrices $ C$ for which this holds unconditionally identifiable. They can be characterized and two such matrices are

$\displaystyle \begin{pmatrix}1&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix}\quad \begin{pmatrix}1&1&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1\end{pmatrix}$ (9.128)

as well as matrices obtained from these by permuting rows and columns. The above considerations apply to exact models without noise. It can be shown, however, that the results remain true when noise is added, in the sense that for unconditionally identifiable matrices interactions $ c_{ij}$ which are sufficiently large compared to the noise can be identified as the large residuals from an $ L_1$-fit. Some further comments are in order. Firstly, Tukey's median polish can often identify interactions in the two-way table because it attempts to approximate the $ L_1$-solution. At each step the $ L_1$-norm is reduced or at least not increased, but unfortunately the median polish may not converge and, even if it does, it may not reach the $ L_1$ solution. Secondly, $ L_1$ solutions in the presence of noise are not unique. This can be overcome by approximating the modulus function $ \vert x \vert$ by a strictly convex function which is almost linear in the tails. Thirdly, if there is more than one observation per cell it is recommended that the observations be replaced by their median and the method applied to the medians. Finally, we point out that an interaction can also be an outlier; there is no a priori way of distinguishing the two.
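As an illustration of identifying interactions as large residuals from an $ L_1$-fit, the following Python sketch solves (9.127) as a linear programme with scipy.optimize.linprog and flags residuals that are large relative to their MAD. The function name l1_two_way, the flagging threshold and the example data are assumptions made for the sketch, not part of the method in [106].

```python
import numpy as np
from scipy.optimize import linprog

def l1_two_way(x, threshold=3.0):
    """L1 fit x_ij ~ a_i + b_j as in (9.127); residuals that are large
    relative to their MAD are flagged as interactions or outliers.
    The threshold of 3 is an assumption, not taken from the text."""
    I, J = x.shape
    n = I * J
    # variables: a_1..a_I, b_1..b_J, then u_ij, v_ij >= 0 with residual r_ij = u_ij - v_ij
    c = np.concatenate([np.zeros(I + J), np.ones(2 * n)])
    A_eq = np.zeros((n, I + J + 2 * n))
    for row, (i, j) in enumerate(np.ndindex(I, J)):
        A_eq[row, i] = 1.0                  # coefficient of a_i
        A_eq[row, I + j] = 1.0              # coefficient of b_j
        A_eq[row, I + J + row] = 1.0        # u_ij
        A_eq[row, I + J + n + row] = -1.0   # v_ij
    bounds = [(None, None)] * (I + J) + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=x.ravel(), bounds=bounds, method="highs")
    a, b = res.x[:I], res.x[I:I + J]
    r = x - a[:, None] - b[None, :]         # residuals of the L1 fit
    s = 1.4826 * np.median(np.abs(r - np.median(r)))
    flagged = np.abs(r) > threshold * max(s, 1e-6)
    return a, b, r, flagged

# example: an additive 3x3 table with a single interaction of size 10 in cell (0, 0),
# corresponding to the first (unconditionally identifiable) matrix in (9.128)
x = np.arange(1.0, 10.0).reshape(3, 3)
x[0, 0] += 10.0
a, b, r, flagged = l1_two_way(x)
print(flagged)   # only the (0, 0) cell should be flagged
```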

