Next: 11.10 Analytics Up: 11. The Grammar of Previous: 11.8 Aesthetics

11.9 Layout

Chart algebra does not determine the physical appearance of charts plotted on a screen or paper. It simply produces a set of tuples $(x_{1}, \ldots , x_{p})$ that can be rendered using geometric primitives and a layout interpreter. If we have -tuples, then we can render them directly on a computer screen or piece of paper. If we have -tuples, then we can use a perspective projection to render them on the plane. Higher-order tuples require a layout scheme to embed all dimensions in the plane. Various layout schemes are attempts to solve a graphic representation problem: how to transform a -dimensional vector space to a -dimensional space so that we can perceive structures in the higher dimensional space by examining the -dimensional space. We will discuss several approaches in this section.

11.9.1 Projection

One scheme is to employ a linear or nonlinear projection from p-dimensions to two. This may cause loss of information because a projection onto a subspace is many-to-one. Also, projection is most suitable for displaying points or $\{V,E\}$ graphs. It is less suitable for many geometric chart types such as bars and pies. Nevertheless, some 2D projections have been designed to capture structures contained in subspaces, such as manifolds, simplices, or clusters ([12]). Other popular projection methods are principal components and multidimensional scaling ([17]).

11.9.2 Sets of Functions

A second possibility is to map a set of points in $\mathbb{R}^{p}$ one-to-one to a set of functions in $\mathbb{R}^{2}$ . A particularly useful class of functions is formed by taking the first terms in a Fourier series as coefficients for $(x_{1}, \ldots , x_{p})$ ([3]). Another useful class is the set of Chebysheff orthogonal polynomials. A popular class is the set of piecewise linear functions with $(x_{1}, \ldots , x_{p})$ as knots, often called parallel coordinates ([19,40]).

An advantage of function space representations is that there is no loss of information, since the set of all possible functions for each of these types in $\mathbb{R}^{2}$ is infinite. Orthogonal functions (such as Fourier and Chebysheff) are useful because zero inner products are evidence of linear independence. Parallel coordinates are useful because it is relatively easy to decode values on particular variables. A disadvantage of functional representations is that manifolds, solids, and distances are difficult to discern.

11.9.3 Recursive Partitioning

A third possibility is recursive partitioning. We choose an interval $[u_{1}, u_{2}]$ and partition the first dimension of $\mathbb{R}^{p}$ into a set of connected intervals of size $(u_{2} - u_{1})$ , in the same manner as histogram binning. This yields a set of rectangular subspaces of $\mathbb{R}^{p}$ . We then partition the second dimension of $\mathbb{R}^{p}$ similarly. This second partition produces a set of rectangular subspaces within each of the previous subspaces. We continue choosing intervals and partitioning until we finish the last dimension. We then plot each subspace in an ordering that follows the ancestry of the partitioning. Recursive partitioning layout schemes have appeared in many guises: $\mathbb{R}^{p} \mapsto \mathbb{R}^{3}$ ([9]), $\mathbb{R}^{p} \mapsto \mathbb{R}^{2}$ ([26]), $\mathbb{R}^{4} \mapsto \mathbb{R}^{2}$ ([5]).

There are several modifications we may make to this scheme. First, if a dimension is based on a categorical variable, then we assume $(u_{2} - u_{1}) = 1$ , which assures one partition per category. Second, we need not partition a dimension into equal intervals; instead, we can make $[u_{1}, u_{2}]$ adaptive to the density of the data ([43, page 186]). Third, we can choose a variety of layouts for displaying the nodes of the partitioning tree. We can display the cells as an -ary tree, which is the method used by popular decision-tree programs. Or, we can alternate odd/even dimensions by plotting horizontally/vertically. This display produces a D nested table, which has been variously named a mosaic ([16]) or treemap ([20]). We use this latter scheme for the figures in this article.

This rectangular partitioning resembles a 2D rectangular fractal generator. Like simple projection, this method can cause loss of information because aggregation occurs within cells. Nevertheless, it yields an interpretable 2D plot that is familiar to readers of tables.

Because recursive partitioning works with either continuous or categorical variables, there is no display distinction between a table and a chart. This equivalence between tables and graphs has been noted in other contexts ([34,30]). With recursive partitioning, we can display tables of charts and charts of tables.

Next: 11.10 Analytics Up: 11. The Grammar of Previous: 11.8 Aesthetics