The field of scientific visualization has greatly enhanced the set of tools available for the statistician interested in exploring the features of a density estimate in more than two dimensions. In this section, we demonstrate by example the exploration of trivariate data.
We continue our analysis of the data given by the duration of consecutive eruptions of the Old Faithful geyser. A histogram of these data is displayed in Fig. 4.2b. We further modified the data as follows: the values that were only recorded to the nearest minute were blurred by adding uniform noise of seconds in duration. (The remaining data points were recorded to the nearest second.) An easy way to generate high-dimensional data from a univariate time series is to group adjacent values. In Fig. 4.12, ASHs of the univariate data and the lagged data $(x_t, x_{t+1})$ are shown. The obvious question is whether knowledge of $x_t$ is useful for predicting the value of $x_{t+1}$. Clearly, the answer is in the affirmative, but the structure would not be well represented by an autoregressive model.
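Grouping adjacent values can be sketched in a few lines. The function name `lag_embed` and the sample durations below are hypothetical, and the sketch assumes that the groups overlap, so that each value starts a new tuple.

```python
def lag_embed(series, dim):
    """Group adjacent values of a series into overlapping tuples.

    Returns [(x[t], ..., x[t + dim - 1]) for every valid t].
    Assumption: groups overlap (a sliding window), not disjoint blocks.
    """
    if dim < 1:
        raise ValueError("dim must be at least 1")
    return [tuple(series[t:t + dim]) for t in range(len(series) - dim + 1)]


# Made-up eruption durations in minutes (illustration only, not the real data).
durations = [4.4, 1.9, 4.3, 4.1, 2.0, 4.5]

pairs = lag_embed(durations, 2)    # bivariate lagged data
triples = lag_embed(durations, 3)  # trivariate lagged data
```

A series of length $n$ yields $n - d + 1$ points in $d$ dimensions, so little data is lost for small $d$.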
Next, we computed the ASH for the trivariate lagged data $(x_{t-1}, x_t, x_{t+1})$. The resulting estimate, $\hat f(x_{t-1}, x_t, x_{t+1})$, may be explored in several fashions. The question is whether knowledge of $x_{t-1}$ can be used to predict the joint behavior of $(x_t, x_{t+1})$. This may be accomplished, for example, by examining slices of the trivariate density. Since the (univariate) density has two modes at and minutes, we examine the slices and ; see Fig. 4.13. The data points were divided into two groups, depending on whether or not. The first group of points was added to Fig. 4.13a, while the second group was added to Fig. 4.13b.
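As a rough illustration of computing a trivariate ASH on a lattice and extracting one slice, consider the following minimal sketch. It assumes the usual ASH form with product triangle weights, $m$ shifts per axis, and fine bins of width $\delta = h/m$. It is not the ashn program, and the names `ash3` and `slice_first_axis`, the sample triples, and the parameter values are all hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor


def ash3(points, origin, delta, m):
    """Trivariate averaged shifted histogram on a lattice of fine bins.

    Each fine bin has width delta = h / m on every axis. Returns a dict
    {(i, j, k): density} holding only the bins with a nonzero estimate.
    """
    n = len(points)
    h = m * delta
    # Triangle weights w(s) = 1 - |s| / m for s = 1 - m, ..., m - 1.
    weight = {s: 1.0 - abs(s) / m for s in range(1 - m, m)}
    est = defaultdict(float)
    for p in points:
        cell = [floor((x - o) / delta) for x, o in zip(p, origin)]
        # Spread each point over neighbouring fine bins with product weights.
        for s in product(weight, repeat=3):
            key = (cell[0] + s[0], cell[1] + s[1], cell[2] + s[2])
            est[key] += weight[s[0]] * weight[s[1]] * weight[s[2]]
    scale = 1.0 / (n * h ** 3)
    return {key: value * scale for key, value in est.items()}


def slice_first_axis(est, x0, origin0, delta):
    """Return the bivariate slice {(j, k): density} at first coordinate x0."""
    i0 = floor((x0 - origin0) / delta)
    return {(j, k): v for (i, j, k), v in est.items() if i == i0}


# Made-up lagged triples in minutes (illustration only).
triples = [(1.9, 4.4, 4.1), (4.4, 4.1, 2.0), (4.1, 2.0, 4.5), (2.0, 4.5, 4.3)]
fhat = ash3(triples, origin=(0.0, 0.0, 0.0), delta=0.25, m=3)
short_slice = slice_first_axis(fhat, x0=2.0, origin0=0.0, delta=0.25)
```

Because the triangle weights sum to $m$ on each axis, the estimate integrates to one over the lattice, which is a convenient sanity check. A slice is simply the set of lattice values sharing one index on the first axis.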
Since each axis was divided into 100 bins, there are 98 other views one might examine like Fig. 4.13. (An animation is actually quite informative.) However, one may obtain a holistic view by examining level sets of the full trivariate density. A level set is the set of all points $x$ such that $\hat f(x) = \alpha \hat f_{\max}$, where $\hat f_{\max}$ is the maximum or modal value of the density estimate, and $\alpha$ is a constant that determines the contour level. Such contours are typically smooth surfaces in $\mathbb{R}^3$. When $\alpha = 1$, the "contour" is simply the modal location point. In Fig. 4.14, the contour corresponding to is displayed. Clearly these data are multimodal, as five well-separated high-density regions are apparent. Each cluster corresponds to a different sequence of eruption durations, such as long-long-long. The five clusters are now also quite apparent in both frames of Fig. 4.13. Of the eight possible sequences, three are not observed in this sequence of eruptions.
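One way to count the high-density regions enclosed by a contour is to threshold the lattice at a fraction of the modal value and group the surviving lattice points into connected pieces. The sketch below does this with a breadth-first search. The function name `superlevel_components`, the choice of 6-neighbour connectivity, and the synthetic two-bump density are assumptions made for illustration and do not describe how the figures were produced.

```python
from collections import deque
from itertools import product
from math import exp


def superlevel_components(density, alpha):
    """Split the region {f >= alpha * f_max} into connected components.

    density maps lattice indices (i, j, k) to density values. Lattice
    points are neighbours when they differ by one step along one axis.
    """
    fmax = max(density.values())
    inside = {key for key, value in density.items() if value >= alpha * fmax}
    seen = set()
    components = []
    for start in inside:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            i, j, k = queue.popleft()
            component.append((i, j, k))
            for nb in ((i + 1, j, k), (i - 1, j, k), (i, j + 1, k),
                       (i, j - 1, k), (i, j, k + 1), (i, j, k - 1)):
                if nb in inside and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


def bump(p, centre, height):
    """Unnormalised Gaussian bump, used only to build a synthetic density."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, centre))
    return height * exp(-d2 / 4.5)


# Synthetic density: a tall bump and a shorter bump on a 20 x 9 x 9 lattice.
density = {p: bump(p, (4, 4, 4), 1.0) + bump(p, (15, 4, 4), 0.8)
           for p in product(range(20), range(9), range(9))}

clusters = superlevel_components(density, alpha=0.5)
```

On this synthetic example the two bumps are separate at a middle contour level, only the taller one survives near the mode, and the two merge into one region at a very low level. This is the behaviour one looks for when varying the contour level.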
A single contour does not convey as much information as several. Depending on the display device, one may reasonably view three to five contours, using transparency to see the higher density contours that are "inside" the lower density contours. Consider adding a second contour corresponding to to that in Fig. 4.14. Rather than attempt to use transparency, we choose an alternative representation that emphasizes the underlying algorithms. The software that produced these figures is called ashn and is available at the author's website. ASH values are computed on a three-dimensional lattice. The surfaces are constructed using the marching cubes algorithm [20], which generates thousands of triangles that make up each surface. In Fig. 4.15, we choose not to plot all of the triangles but only every other "row" along the second axis. The striped effect allows one to interpolate and complete the low-density contour, while allowing one to look inside and see the high-density contour. Since there are five clusters, this is repeated five times. A smaller sixth cluster is suggested as well.
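The striping idea can be sketched on the lattice itself. The code below covers only the cell-classification step with which marching cubes begins: each lattice cell receives an 8-bit index recording which of its corners lie at or above the contour level, and only cells with a mixed index are crossed by the surface. It does not generate triangles, the corner ordering is our own and matches no published lookup table, and selecting whole cells with an even index on the second axis is an assumed stand-in for the "rows" of triangles described above. The function names and the synthetic density are hypothetical.

```python
from itertools import product
from math import exp


def crossed_cells(values, shape, level):
    """Find lattice cells that the contour at the given level passes through.

    values maps node indices (i, j, k) to density values and shape is the
    number of nodes per axis. Returns {cell: 8-bit corner index} for the
    cells whose corners are neither all below nor all at or above the level.
    """
    ni, nj, nk = shape
    corners = list(product((0, 1), repeat=3))
    active = {}
    for i, j, k in product(range(ni - 1), range(nj - 1), range(nk - 1)):
        index = 0
        for bit, (di, dj, dk) in enumerate(corners):
            if values[(i + di, j + dj, k + dk)] >= level:
                index |= 1 << bit
        if 0 < index < 255:  # mixed corners: the surface crosses this cell
            active[(i, j, k)] = index
    return active


def every_other_row(cells, axis=1):
    """Keep only the cells whose index along the chosen axis is even."""
    return {cell: idx for cell, idx in cells.items() if cell[axis] % 2 == 0}


# Synthetic single-bump density on a 9 x 9 x 9 lattice (illustration only).
shape = (9, 9, 9)
values = {p: exp(-sum((a - 4) ** 2 for a in p) / 4.5)
          for p in product(range(9), repeat=3)}

active = crossed_cells(values, shape, level=0.5)
striped = every_other_row(active, axis=1)
```

Dropping alternate rows leaves gaps through which an inner, higher contour would remain visible, while the retained rows still outline the outer surface.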