# 11.4 Scales

Before we compute summaries (totals, means, smoothers, ...) and represent these summaries using geometric objects (points, lines, ...), we must scale our varsets. In producing most common charts, we do not notice this step. When we implement log scales, however, we notice it immediately: we must take logs of the data before we average, because what we average are the logs. Even if we do not compute nonlinear transformations, we still need to specify a measurement model.
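The order of operations matters because the mean of the logs is not the log of the mean. A minimal sketch of the difference, using made-up population values (the data and variable names are illustrative assumptions, not taken from the chapter):

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical city populations spanning several orders of magnitude.
populations = [1_000, 10_000, 100_000]

# Scale first, then summarize: this is what a log-scaled chart needs.
mean_of_logs = mean([math.log10(v) for v in populations])

# Summarize first, then scale: a different number, dominated by the largest city.
log_of_mean = math.log10(mean(populations))

# Back-transforming the mean of the logs gives the geometric mean.
geometric_mean = 10 ** mean_of_logs

print(mean_of_logs, log_of_mean, geometric_mean)
```

On these values the two summaries differ by more than half a log unit, which would be clearly visible on a log axis.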

The measurement model determines how distance in a frame region relates to the ranges of the variables defining that region. Measurement models are reflected in the axes, scales, legends, and other annotations that demarcate a chart's frame. Measurement models determine how values are represented (e.g., as categories or magnitudes) and what the units of measurement are.

## 11.4.1 Axiomatic Measurement

In constructing scales for statistical charts, it helps to know something about the function used to assign values to objects. Stevens [36] developed a taxonomy of such functions based on axioms of measurement. He identified four basic scale types: nominal, ordinal, interval, and ratio.

To define a nominal scale, we assume there exists at least one equivalence class together with a binary equivalence relation ($\sim$) that can be applied to objects in the domain (e.g., the class of this object is the same as the class of that object). For a domain of objects $D$ and a set of values $X(d)$, $d \in D$, we say that a scale is nominal if

$$d_i \sim d_j \iff X(d_i) = X(d_j), \quad \forall\, d_i, d_j \in D .$$

To define an ordinal scale, we assume there exists a binary total order relation ($\succ$) that can be applied to objects in the domain (e.g., this stone is heavier than that stone). We then say that a scale is ordinal if

$$d_i \succ d_j \iff X(d_i) > X(d_j), \quad \forall\, d_i, d_j \in D .$$

To define an interval scale, we assume there exists a symmetric concatenation operation ($\oplus$) that can be applied to objects in the domain (e.g., the length of this stick appended to the length of that stick). We then say that a scale is interval if

$$X(d_i \oplus d_j) = X(d_i) + X(d_j), \quad \forall\, d_i, d_j \in D .$$

To define a ratio scale, we assume there exists a magnitude comparison operation ($\oslash$) that can be applied to objects in the domain (e.g., the ratio of the brightness of this patch to the brightness of that patch). We then say that a scale is ratio if

$$X(d_i \oslash d_j) = X(d_i) / X(d_j), \quad \forall\, d_i, d_j \in D .$$
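The four scale types can be read as a ladder of permitted operations. The sketch below encodes that reading; it assumes the conventional interpretation that each level also supports the relations of the levels beneath it, and the names `Level`, `REQUIRED_LEVEL`, and `permits` are hypothetical helpers, not part of any library.

```python
from enum import IntEnum


class Level(IntEnum):
    """Stevens' scale types, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Weakest level at which each empirical operation is assumed to exist.
REQUIRED_LEVEL = {
    "same_class": Level.NOMINAL,    # equivalence relation
    "order": Level.ORDINAL,         # total order relation
    "concatenate": Level.INTERVAL,  # symmetric concatenation
    "ratio": Level.RATIO,           # magnitude comparison
}


def permits(level, operation):
    """True if a scale at `level` supports `operation` (cumulative reading)."""
    return level >= REQUIRED_LEVEL[operation]


print(permits(Level.ORDINAL, "order"))   # ranks can be ordered
print(permits(Level.ORDINAL, "ratio"))   # but ratios of ranks are not meaningful
```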

Axiomatic scale theory is often invoked by practitioners of data mining and graphics, but it is not sufficient for determining scales on statistical graphics produced by chart algebra. The blend operation, for example, allows us to union values on different variables. We can require that blended variables share the same measurement level (e.g., diastolic and systolic blood pressure), but this will not always produce a meaningful scale. For example, we will have a meaningless composite scale if we attempt to blend height and weight, both presumably ratio variables. We need a different level of detail so that we can restrict the blend operation more appropriately.

## 11.4.2 Unit Measurement

An alternative scale classification is based on units of measurement. Unit scales permit standardization and conversion of metrics. In particular, the International System of Units (SI) ([38]) unifies measurement under transformation rules encapsulated in a set of base classes. These classes are length, mass, time, electric current, temperature, amount of substance, and luminous intensity. Within the base classes, there are default metrics (meter, kilogram, second, etc.) and methods for converting from one metric to another. From these base classes, a set of derived classes yields measurements such as area, volume, pressure, energy, capacitance, density, power, and force. Table 11.2 shows example units for several SI base classes, a derived class, and an economic base class that is not in SI. The currency class is time dependent, since daily exchange rates determine conversion rules and any inflation adjustment method varies with time.

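Table 11.2

| Length | Mass | Temperature | Time | Volume | Currency |
|---|---|---|---|---|---|
| meter | kilogram | kelvin | second | liter | dollar |
| point | gram | rankine | minute | teaspoon | euro |
| pica | grain | celsius | hour | tablespoon | pound |
| inch | slug | fahrenheit | day | cup | yen |
| foot | carat | | week | pint | rupee |
| yard | | | month | quart | dinar |
| mile | | | quarter | gallon | |
| furlong | | | year | bushel | |
| fathom | | | century | barrel | |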
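Conversion within a base class can be sketched by routing every unit through the class's default metric. The example below is a minimal illustration covering a few of the units listed above; the `UNITS` table and the `convert` function are hypothetical names, and each unit is assumed to map to its base metric by a linear rule `base = value * scale + offset`.

```python
# unit -> (base class, scale, offset), where base = value * scale + offset.
# Base metrics: meter for length, kilogram for mass, kelvin for temperature.
UNITS = {
    "meter": ("length", 1.0, 0.0),
    "inch": ("length", 0.0254, 0.0),
    "foot": ("length", 0.3048, 0.0),
    "yard": ("length", 0.9144, 0.0),
    "mile": ("length", 1609.344, 0.0),
    "kilogram": ("mass", 1.0, 0.0),
    "gram": ("mass", 0.001, 0.0),
    "kelvin": ("temperature", 1.0, 0.0),
    "rankine": ("temperature", 5.0 / 9.0, 0.0),
    "celsius": ("temperature", 1.0, 273.15),
    "fahrenheit": ("temperature", 5.0 / 9.0, 273.15 - 32.0 * 5.0 / 9.0),
}


def convert(value, source, target):
    """Convert `value` from `source` to `target` units of the same base class."""
    src_class, src_scale, src_offset = UNITS[source]
    dst_class, dst_scale, dst_offset = UNITS[target]
    if src_class != dst_class:
        raise ValueError(f"cannot convert {src_class} to {dst_class}")
    base = value * src_scale + src_offset   # into the base metric
    return (base - dst_offset) / dst_scale  # out to the target metric


print(convert(1.0, "mile", "foot"))
print(convert(212.0, "fahrenheit", "celsius"))
```

Temperature is the reason for the offset term: length and mass units differ only by a factor, while celsius and fahrenheit also shift the zero point. A currency class would not fit this table as written, since its conversion factors change over time.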

Most of the measurements in the SI system fit within the interval and ratio levels of Stevens' system. There are other scales fitting Stevens' system that are not classified within the SI system. These involve units such as category (state, province, country, color, species), order (rank, index), and measure (probability, proportion, percent). And there are additional scales that are in neither the Stevens nor the SI system, such as partial order.

For our purposes, unit measurement gives us the level of detail needed to construct a numerical or categorical scale. We consider unit measurement a form of strong typing that enables reasonable default behavior. Because of the class structure and conversion methods, we can handle labels and relations for derived quantities such as miles-per-gallon, gallons-per-mile, and liters-per-kilometer. Furthermore, automatic unit conversion within base and derived classes allows meaningful blends. As with domain check overrides in a database ([7]), we allow explicit type overrides for the blend operation.
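One way to picture unit measurement as strong typing is a blend that checks base classes before taking the union of values. This is only a sketch under stated assumptions: `Variable`, `TO_BASE`, and `blend` are hypothetical names, the result is expressed in the first variable's unit, and the `override` flag stands in for the explicit type override mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    unit: str
    values: list


# unit -> (base class, factor to the class's base metric); purely multiplicative units.
TO_BASE = {
    "meter": ("length", 1.0),
    "inch": ("length", 0.0254),
    "foot": ("length", 0.3048),
    "kilogram": ("mass", 1.0),
    "gram": ("mass", 0.001),
}


def blend(a, b, override=False):
    """Union the values of `a` and `b`, expressed in the unit of `a`."""
    class_a, scale_a = TO_BASE[a.unit]
    class_b, scale_b = TO_BASE[b.unit]
    if class_a != class_b:
        if not override:
            raise TypeError(f"cannot blend {class_a} with {class_b}")
        # Explicit override: values are combined unconverted, at the caller's risk.
        return a.values + b.values
    factor = scale_b / scale_a  # converts b's unit into a's unit
    return a.values + [v * factor for v in b.values]


arm = Variable("arm_span", "inch", [10.0])
leg = Variable("leg_length", "foot", [1.0])
print(blend(arm, leg))  # both lengths, reported in inches
```

Blending a length with a mass raises an error by default, which is the restriction that measurement level alone could not express, since both variables are ratio scaled.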

## 11.4.3 Transformations

We frequently compute transformations of variables in constructing graphics. Sometimes, we employ statistical transformations to achieve normality so that we can apply classical statistical methods such as linear regression. Other times, we transform to reveal local detail in a graphic. It helps to apply a log transform, for example, to stretch out small data values in a display. We might do this even when not applying statistical models.

These types of transformations fall within the scale stage of the grammar of graphics system. Because GOG encapsulates variable transformations within this stage, it accomplishes two tasks at the same time: 1) the values of the variables are transformed prior to analysis and display, and 2) nice scale values for axes and legends are computed based on the transformation. Figure 11.7 shows an example of this process for the city data. In order to highlight population changes in small cities, we represent the populations on a log scale. The algebraic expression is the same as in Fig. 11.5: city*(pop1980+pop2000). Now we see that most of the cities gained population between 1980 and 2000 but half the US namesakes lost population.
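The two tasks of the scale stage can be sketched together in a few lines. The function below is an illustrative assumption rather than the GOG implementation: `log_scale` is a hypothetical name, the input values are made up, and "nice" ticks are taken to be the powers of ten that bracket the data.

```python
import math


def log_scale(values):
    """Return (log10-transformed values, tick labels in original units).

    Assumes every value is strictly positive.
    """
    transformed = [math.log10(v) for v in values]
    lowest = math.floor(min(transformed))
    highest = math.ceil(max(transformed))
    # Ticks sit at integer positions on the log axis and are labeled in original units.
    ticks = [10 ** k for k in range(lowest, highest + 1)]
    return transformed, ticks


# Hypothetical populations for three cities.
transformed, ticks = log_scale([2_500, 40_000, 7_000_000])
print(transformed)
print(ticks)
```

Because both outputs come from the same transformation, the axis labels stay consistent with the positions of the plotted values, and later stages such as statistics operate on the transformed data.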
