
11.6 Geometry

GOG presumes no connection between a statistical method and a geometric representation. Histogram bins need not be represented by histograms. Tukey schematic plots (his original word for box plots) need not be represented by boxes and whiskers. Regressions need not be represented by lines or curves. Separating geometry from data (and from other graphical aspects such as coordinate systems) is what gives GOG its expressive power. We choose geometric representation objects independently of statistical methods, coordinate systems, or aesthetic attributes.

As Fig. 11.1 indicates, the geometry component of GOG receives a varset and outputs a geometric graph. A geometric graph is a subset of $\mathbb{R}^{n}$. For our purposes, we will be concerned with geometric graphs for which $1\leq n\leq 3$. Geometric graphs are enclosed in bounded regions:

$\displaystyle B^{n}\subset \left[a_{1}, b_{1}\right] \times\ldots\times \left[a_{n}, b_{n}\right]$

These intervals define the edges of a bounding box or region in $n$-dimensional space. There are two reasons we need bounded regions. First, in order to define certain useful geometric graphs, we need concepts like the end of a line or the edge of a rectangle. Second, we want to save ink and electricity. We don't want to take forever to compute and draw a line.

Geometric graphs are produced by graphing functions $F: B^{n}\rightarrow\mathbb{R}^{n}$ that have geometric names like line or tile. A geometric graph is the image of $F$. A graphic, as used in the title of this chapter, is the image of a graph under one or more aesthetic functions. Geometric graphs themselves are not visible. As [6] points out, visible elements have features not present in their geometric counterparts.

Figures 11.9 and 11.10 illustrate the exchangeability of geometry and statistical methods. The graphics are based on UN data involving 1990 estimates of female life expectancy and birth rates for selected world countries. Figure 11.9 shows four different geometric graphs (point, line, area, and bar) used to represent a confidence interval on a linear regression. Figure 11.10 shows one geometric graph used to represent four different statistical methods (local mean, local range, quadratic regression, and linear regression confidence interval).

**Figure 11.9:** Different graph types, same statistical method

**Figure 11.10:** Different statistical methods, same graph type
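The exchangeability shown in Figures 11.9 and 11.10 can be sketched as plain function composition. The block below is only an illustration, not GOG's implementation: `mean_stat` and `range_stat` are simplified, hypothetical stand-ins for the statistical methods in the figures, `point_geom` and `bar_geom` are hypothetical geometry functions, and the `(x, low, high)` row format is an assumption made for this example.

```python
from collections import defaultdict


def group_by_x(pairs):
    """Collect the y values observed at each x, in increasing x order."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return sorted(groups.items())


# Statistical methods: each returns rows of the form (x, low, high).
def mean_stat(pairs):
    """Mean of y at each x; low and high coincide."""
    rows = []
    for x, ys in group_by_x(pairs):
        m = sum(ys) / len(ys)
        rows.append((x, m, m))
    return rows


def range_stat(pairs):
    """Minimum and maximum of y at each x."""
    return [(x, min(ys), max(ys)) for x, ys in group_by_x(pairs)]


# Geometry functions: each consumes the same rows, whatever produced them.
def point_geom(rows):
    """Represent every distinct (x, low) and (x, high) as a point."""
    return sorted({(x, y) for x, low, high in rows for y in (low, high)})


def bar_geom(rows):
    """Represent every row as a closed interval [low, high] located at x."""
    return [(x, (low, high)) for x, low, high in rows]


def render(stat, geom, pairs):
    """Any statistic can be paired with any geometry."""
    return geom(stat(pairs))
```

With two statistics and two geometries, `render` can produce four combinations without any function knowing about the others, which is the independence the section describes.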

This exchangeability produces a rich set of graphic forms with a relatively small number of geometric graphs. Table 11.3 contains these graphing methods. The point graphing function produces a geometric point, which is an $n$-tuple. This function can also produce a finite set of points, called a multipoint or a point cloud. The set of points produced by point is called a point graph.

The line graphing function is a bit more complicated. Let $B^{m}$ be a bounded region in $\mathbb{R}^{m}$. Consider the function $F: B^{m}\rightarrow\mathbb{R}^{n}$, where $n = m + 1$, with the following additional properties:

1. The image of $F$ is bounded, and
2. $F(x) = (\boldsymbol{v}, f(\boldsymbol{v}))$, where $f: B^{m}\rightarrow\mathbb{R}$ and $\boldsymbol{v} = (x_{1}, \ldots , x_{m}) \in B^{m}$.

If $m = 1$, this function maps an interval to a functional curve on a bounded plane. And if $m = 2$, it maps a bounded region to a functional surface in a bounded 3D space. The line graphing function produces these graphs. Like point, line can produce a finite set of lines. A set of lines is called a multiline. We need this capability for representing multimodal smoothers, confidence intervals on regression lines, and other multifunctional lines.
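As a rough illustration of the case $m = 1$, the sketch below samples $F(x) = (x, f(x))$ on a closed interval. The names `line_graph` and `multiline`, and the `steps` parameter, are hypothetical; the true geometric graph is the full image of $F$, and the finite sample here is only a discretization chosen for this example.

```python
def line_graph(f, a, b, steps=50):
    """Sample F(x) = (x, f(x)) on the closed interval [a, b] (case m = 1).

    The bounded domain [a, b] is what gives the line its two ends.
    """
    if steps < 1 or not a < b:
        raise ValueError("need steps >= 1 and a < b")
    h = (b - a) / steps
    return [(a + i * h, f(a + i * h)) for i in range(steps + 1)]


def multiline(functions, a, b, steps=50):
    """A multiline: a finite set of lines over the same bounded interval."""
    return [line_graph(f, a, b, steps) for f in functions]


# Hypothetical usage: a fitted line with a lower and an upper band.
band = multiline(
    [lambda x: 2 * x + 0.5, lambda x: 2 * x + 1, lambda x: 2 * x + 1.5],
    0.0,
    4.0,
    steps=4,
)
```

Because each sampled point has the form $(x, f(x))$, no two points on one line share an $x$ value, which is the functional property that later distinguishes lines from paths.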

**Table 11.3:** Geometric Graphs

| Relations | Summaries | Partitions | Networks |
|---|---|---|---|
| point | schema | tile | path |
| line (surface) | | contour | link |
| area (volume) | | | |
| bar (interval) | | | |
| histobar | | | |

The area graphing function produces a graph containing all points within the region under the line graph. The bar graphing function produces a set of closed intervals. An interval has two ends. Ordinarily, however, bars are used to denote a single value through the location of one end. The other end is anchored at a common reference point (usually zero). The histobar graphing function produces a histogram element. This element behaves like a bar except a value maps to the area of a histobar rather than to its extent. Also, histobars are glued to each other. They cover an interval or region, unlike bars.
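The difference between bars and histobars can be made concrete with a small sketch. The function names `bars` and `histobars` and their return formats are assumptions made for this example: a bar is returned as a closed interval anchored at a reference point, and a histobar as a `(left, right, height)` triple whose area equals the value.

```python
def bars(values, base=0.0):
    """Each bar is a closed interval with one end anchored at `base`.

    The value is read from the location of the other end.
    """
    return [(min(base, v), max(base, v)) for v in values]


def histobars(edges, values):
    """Each histobar is (left, right, height) with height * width == value.

    Consecutive histobars share an edge, so together they cover
    [edges[0], edges[-1]] with no gaps.
    """
    if len(edges) != len(values) + 1:
        raise ValueError("need one more edge than values")
    result = []
    for left, right, value in zip(edges, edges[1:], values):
        width = right - left
        if width <= 0:
            raise ValueError("edges must be strictly increasing")
        result.append((left, right, value / width))
    return result
```

For example, two histobars with equal values but widths 1 and 2 get heights in the ratio 2 to 1, whereas two bars with equal values always have equal extent.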

A schema is a diagram that includes both general and particular features in order to represent a distribution. We have taken this usage from Tukey [39], who invented the schematic plot, which has come to be known as the box plot because of its physical appearance. The schema graphing function produces a collection of one or more points and intervals.
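One way to picture a schema as "points and intervals" is the familiar box plot summary. The sketch below is an assumption-laden illustration rather than the chapter's definition: it uses hinges computed as medians of the lower and upper halves, and the common 1.5 step multiplier `k`; the function name `schema` and the dictionary layout are hypothetical.

```python
from statistics import median


def schema(values, k=1.5):
    """Summarize a batch as points (median, outliers) and intervals (box, whiskers)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Halves include the middle value when the batch size is odd.
    half = (len(data) + 1) // 2
    lower_hinge = median(data[:half])
    upper_hinge = median(data[-half:])
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - k * spread
    high_fence = upper_hinge + k * spread
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median(data),                       # a point
        "box": (lower_hinge, upper_hinge),            # an interval
        "whiskers": ((inside[0], lower_hinge),        # two intervals
                     (upper_hinge, inside[-1])),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }
```

The result contains only points and closed intervals, so it could be handed to a point or bar geometry just as well as to a conventional box-and-whisker drawing.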

The tile graphing function tiles a surface or space. A tile graph covers and partitions the bounded region defined by a frame; there can be no gaps or overlaps between tiles. The Latinate tessellation (for tiling) is often used to describe the appearance of the tile graphic.
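A regular grid is the simplest tiling that satisfies the no-gaps, no-overlaps condition. The sketch below assumes a two-dimensional rectangular frame; `grid_tiles` and its parameters are hypothetical names for this example, and other tessellations (hexagonal, Voronoi, and so on) would need different code.

```python
def grid_tiles(bounds, nx, ny):
    """Partition the frame [ax, bx] x [ay, by] into nx * ny rectangular tiles.

    Neighbouring tiles share edges, so the tiles cover the frame
    without gaps or overlapping interiors.
    """
    (ax, bx), (ay, by) = bounds
    if nx < 1 or ny < 1 or not (ax < bx and ay < by):
        raise ValueError("need a non-empty frame and at least one tile per axis")
    xs = [ax + (bx - ax) * i / nx for i in range(nx + 1)]
    ys = [ay + (by - ay) * j / ny for j in range(ny + 1)]
    return [
        ((xs[i], xs[i + 1]), (ys[j], ys[j + 1]))
        for i in range(nx)
        for j in range(ny)
    ]


def tile_area(tile):
    """Area of one rectangular tile."""
    (x0, x1), (y0, y1) = tile
    return (x1 - x0) * (y1 - y0)
```

A quick sanity check on any candidate tiling is that the tile areas sum to the area of the frame.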

A contour graphing function produces contours, or level curves. A contour graph is used frequently in weather and topographic maps. Contours can be used to delineate any continuous surface.

The network graphing function joins points with line segments (edges). Networks are representations that resemble the edges in diagrams of theoretic graphs. Although networks join points, a point graph is not needed in a frame in order for a network graphic to be visible.

Finally, the path graphing function produces a path that connects points such that each point touches no more than two line segments. Thus, a path visits every point in a collection of points only once. If a path is closed (every point touches two line segments), we call it a circuit. Paths often look like lines. There are several important differences between the two, however. First, lines are functional; there can be only one point on a line for any value in the domain. Paths may loop, zigzag, and even cross themselves inside a frame. Second, paths consist of segments that correspond to edges, or links between nodes. This means that a variable may be used to determine an attribute of every segment of a path.
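The path properties above can be checked with a few lines of code. This is a minimal sketch under the assumption that a path is given as an ordered sequence of distinct points; `path_segments` and `is_circuit` are hypothetical helper names, and `is_circuit` only tests the degree condition on segments built this way.

```python
from collections import Counter


def path_segments(points, closed=False):
    """Build the segments of a path that visits each point exactly once."""
    if len(set(points)) != len(points):
        raise ValueError("a path visits each point only once")
    if closed and len(points) < 3:
        raise ValueError("a circuit needs at least three points")
    segments = list(zip(points, points[1:]))
    if closed:
        segments.append((points[-1], points[0]))
    return segments


def is_circuit(segments):
    """True when every point touches exactly two segments."""
    degree = Counter(p for segment in segments for p in segment)
    return bool(degree) and all(d == 2 for d in degree.values())


# Hypothetical usage: x goes 0, 2, 1, 3, so the path doubles back in x,
# which a functional line could never do.
points = [(0, 0), (2, 1), (1, 2), (3, 3)]
open_path = path_segments(points)
circuit = path_segments(points, closed=True)
```

Because the result is a list of segments rather than a list of points, a variable can be attached to each segment individually, for example to vary its color or thickness.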

Figure 11.11 contains two geometric objects for representing the regression we computed in Fig. 11.8. We use a point for representing the data and a line for representing the regression line.

**Figure 11.11:** `pop1980` $\ast$ {`pop2000`, *estimate*(`pop2000`)}, *xlog*, *ylog*, {*point*, *line*}
