11.4 Scales

Before we compute summaries (totals, means, smoothers, $\ldots$ ) and represent these summaries using geometric objects (points, lines, $\ldots$ ), we must scale our varsets. In producing most common charts, we do not notice this step. With log scales, however, it becomes obvious: we must log the data before we average, because the mean of the logs is not the log of the mean. Even when we apply no nonlinear transformation, we still need to specify a measurement model.
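The order of scaling and summarizing can be illustrated with a small sketch. The three values below are made up for illustration and do not come from the chapter's data.

```python
import math

# Illustrative values only (not taken from the chapter's data).
values = [1.0, 10.0, 100.0]

# Scale first, then summarize: the mean of the logged values.
mean_of_logs = sum(math.log10(v) for v in values) / len(values)

# Summarize first, then scale: the log of the raw mean.
log_of_mean = math.log10(sum(values) / len(values))

print(mean_of_logs, log_of_mean)
```

The two results differ, so a summary computed on unscaled data would not sit in the right place on a log axis.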

The measurement model determines how distance in a frame region relates to the ranges of the variables defining that region. Measurement models are reflected in the axes, scales, legends, and other annotations that demarcate a chart's frame. Measurement models determine how values are represented (e.g., as categories or magnitudes) and what the units of measurement are.

11.4.1 Axiomatic Measurement

In constructing scales for statistical charts, it helps to know something about the function used to assign values to objects. Stevens [36] developed a taxonomy of such functions based on axioms of measurement, identifying four basic scale types: nominal, ordinal, interval, and ratio.

To define a nominal scale, we assume there exists at least one equivalence class together with a binary equivalence relation ( $\sim$ ) that can be applied to objects in the domain (e.g., the class of this object is the same as the class of that object). For a domain of objects $D$ and a set of values $X(d), d\in D$ , we say that a scale is nominal if

$\displaystyle d_{i}\sim d_{j}\iff X(d_{i}) = X(d_{j}), \ \forall \ d_{i}, d_{j} \in D\,.$

The other three scale types are defined analogously, each with its own relation or operation on the objects. Given a binary total order relation ( $\succ$ ), a scale is ordinal if

$\displaystyle d_{i}\succ d_{j}\iff X(d_{i}) > X(d_{j}), \ \forall \ d_{i}, d_{j} \in D\,.$

Given a concatenation operation ( $\oplus$ ), a scale is interval if

$\displaystyle d_{i}\oplus d_{j} \sim d_{k}\iff X(d_{i}) + X(d_{j}) = X(d_{k}), \ \forall \ d_{i}, d_{j}, d_{k} \in D\,.$

Given a magnitude comparison operation ( $\oslash$ ), a scale is ratio if

$\displaystyle d_{i}\oslash d_{j} \sim d_{k}\iff X(d_{i}) / X(d_{j}) = X(d_{k}), \ \forall \ d_{i}, d_{j}, d_{k} \in D\,.$
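As a minimal sketch, the first two conditions above (the equivalence and order conditions) can be checked by brute force on a small finite domain. The toy domain of stones, the relations `same_colour` and `heavier`, and the value assignments are all hypothetical, chosen only to make the biconditionals concrete.

```python
from itertools import product

# Hypothetical toy domain: stones with a colour class and a weight.
stones = {
    "a": {"colour": "grey", "weight": 2.0},
    "b": {"colour": "grey", "weight": 5.0},
    "c": {"colour": "red", "weight": 5.5},
}

def is_nominal(domain, equivalent, X):
    """Check d_i ~ d_j  <=>  X(d_i) == X(d_j) for every pair of objects."""
    return all(
        equivalent(domain[i], domain[j]) == (X(domain[i]) == X(domain[j]))
        for i, j in product(domain, repeat=2)
    )

def is_ordinal(domain, succeeds, X):
    """Check d_i > d_j (in the object order)  <=>  X(d_i) > X(d_j)."""
    return all(
        succeeds(domain[i], domain[j]) == (X(domain[i]) > X(domain[j]))
        for i, j in product(domain, repeat=2)
    )

# Relations on the objects themselves (assumed for illustration).
def same_colour(p, q):
    return p["colour"] == q["colour"]

def heavier(p, q):
    return p["weight"] > q["weight"]

# Two candidate value assignments X(d).
def colour_code(d):
    return {"grey": 0, "red": 1}[d["colour"]]

def weight_rank(d):
    return {2.0: 1, 5.0: 2, 5.5: 3}[d["weight"]]

print(is_nominal(stones, same_colour, colour_code))
print(is_ordinal(stones, heavier, weight_rank))
print(is_ordinal(stones, heavier, colour_code))
```

The colour codes satisfy the nominal condition and the weight ranks satisfy the ordinal condition, but the colour codes fail the ordinal check because they do not preserve the weight order.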

Axiomatic scale theory is often invoked by practitioners of data mining and graphics, but it is not sufficient for determining scales on statistical graphics produced by chart algebra. The blend operation, for example, allows us to union values on different variables. We can require that blended variables share the same measurement level (e.g., diastolic and systolic blood pressure), but this will not always produce a meaningful scale. For example, we will have a meaningless composite scale if we attempt to blend height and weight, both presumably ratio variables. We need a different level of detail so that we can restrict the blend operation more appropriately.

11.4.2 Unit Measurement

An alternative scale classification is based on units of measurement. Unit scales permit standardization and conversion of metrics. In particular, the International System of Units (SI) ([38]) unifies measurement under transformation rules encapsulated in a set of base classes. These classes are length, mass, time, electric current, temperature, amount of substance, and luminous intensity. Within the base classes, there are default metrics (meter, kilogram, second, etc.) and methods for converting from one metric to another. From these base classes, a set of derived classes yields measurements such as area, volume, pressure, energy, capacitance, density, power, and force. Table 11.2 lists example units for several SI base classes, one derived class (volume), and an economic base class (currency) that is not in SI. The currency class is time dependent: daily exchange rates determine the conversion rules, and inflation adjustments vary with time.

**Table 11.2:** Typical unit measurements
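| Length | Mass | Temperature | Time | Volume | Currency |
|---|---|---|---|---|---|
| meter | kilogram | kelvin | second | liter | dollar |
| point | gram | rankine | minute | teaspoon | euro |
| pica | grain | celsius | hour | tablespoon | pound |
| inch | slug | fahrenheit | day | cup | yen |
| foot | carat | | week | pint | rupee |
| yard | | | month | quart | dinar |
| mile | | | quarter | gallon | |
| furlong | | | year | bushel | |
| fathom | | | century | barrel | |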

Most of the measurements in the SI system fit within the interval and ratio levels of Stevens' system. There are other scales fitting Stevens' system that are not classified within the SI system. These involve units such as category (state, province, country, color, species), order (rank, index), and measure (probability, proportion, percent). And there are additional scales that are in neither the Stevens nor the SI system, such as partial order.

For our purposes, unit measurement gives us the level of detail needed to construct a numerical or categorical scale. We consider unit measurement a form of strong typing that enables reasonable default behavior. Because of the class structure and conversion methods, we can handle labels and relations for derived quantities such as miles-per-gallon, gallons-per-mile, and liters-per-kilometer. Furthermore, automatic unit conversion within base and derived classes allows meaningful blends. As with domain check overrides in a database ([7]), we allow explicit type overrides for the blend operation.
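The idea of unit measurement as strong typing can be sketched as follows. This is not the chapter's implementation: the `Variable` class, the `TO_DEFAULT` table, the `blend` function, and the `override` flag are hypothetical names. The sketch covers only multiplicative conversions, so a class such as temperature, whose conversions involve offsets, would need more than a single factor.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to each class's default metric.
TO_DEFAULT = {
    "length": {"meter": 1.0, "inch": 0.0254, "foot": 0.3048},
    "mass": {"kilogram": 1.0, "gram": 0.001},
}

@dataclass(frozen=True)
class Variable:
    name: str
    unit_class: str  # e.g. "length" or "mass"
    metric: str      # e.g. "inch"
    values: tuple

    def in_default_metric(self):
        """Convert the values to the default metric of the unit class."""
        factor = TO_DEFAULT[self.unit_class][self.metric]
        return tuple(v * factor for v in self.values)

def blend(a, b, override=False):
    """Union the values of two variables on one scale.

    Variables from different unit classes are rejected unless the caller
    passes override=True and takes responsibility for the result.
    """
    if a.unit_class != b.unit_class and not override:
        raise TypeError(
            f"cannot blend {a.name} ({a.unit_class}) with {b.name} ({b.unit_class})"
        )
    return a.in_default_metric() + b.in_default_metric()

height = Variable("height", "length", "inch", (60.0, 72.0))
span = Variable("span", "length", "foot", (5.0,))
weight = Variable("weight", "mass", "gram", (500.0,))

blended = blend(height, span)  # both converted to meters before the union
print(blended)
```

Calling `blend(height, weight)` raises a `TypeError` in this sketch, while `blend(height, weight, override=True)` goes through, mirroring the explicit type override described above.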

11.4.3 Transformations

We frequently compute transformations of variables in constructing graphics. Sometimes, we employ statistical transformations to achieve normality so that we can apply classical statistical methods such as linear regression. Other times, we transform to reveal local detail in a graphic. It helps to apply a log transform, for example, to stretch out small data values in a display. We might do this even when not applying statistical models.

These types of transformations fall within the scale stage of the grammar of graphics (GOG) system. Because GOG encapsulates variable transformations within this stage, it accomplishes two tasks at the same time: (1) the values of the variables are transformed prior to analysis and display, and (2) nice scale values for axes and legends are computed based on the transformation. Figure 11.7 shows an example of this process for the city data. In order to highlight population changes in small cities, we represent the populations on a log scale. The algebraic expression is the same as in Figure 11.5: city $\ast$ (pop1980+pop2000). Now we see that most of the cities gained population between 1980 and 2000 but half the US namesakes lost population.
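The two tasks of the scale stage can be sketched in a few lines, assuming a base-10 log scale with ticks at whole powers of ten. The function name `log_scale` and the population figures are hypothetical, and the tick rule is one simple choice rather than the algorithm used to draw the figure.

```python
import math

def log_scale(values):
    """Return (transformed values, tick positions, tick labels) for a log10 scale."""
    if min(values) <= 0:
        raise ValueError("a log scale requires strictly positive values")
    # Task 1: transform the data before any analysis or display.
    transformed = [math.log10(v) for v in values]
    # Task 2: choose nice ticks in the transformed space (whole powers of ten).
    lo = math.floor(min(transformed))
    hi = math.ceil(max(transformed))
    ticks = list(range(lo, hi + 1))    # positions in log units
    labels = [10 ** t for t in ticks]  # labels in the original units
    return transformed, ticks, labels

# Made-up populations, not the chapter's city data.
pops = [2_500, 40_000, 7_000_000]
transformed, ticks, labels = log_scale(pops)
print(ticks)
print(labels)
```

Keeping both steps in one place means the axis labels stay in the original units while the geometry is laid out in the transformed space.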

**Figure 11.7:** `city` $\ast$ (`pop1980` + `pop2000`), *ylog*
$\includegraphics[width=103mm]{text/2-11/figure6.eps}$