At any stage of the procedure, a hierarchical clustering technique performs either a merger of clusters or a division of a cluster formed at a previous stage. Conceptually, this gives rise to a tree-like structure of the clustering process. It is understood that the clusters of items formed at any stage are non-overlapping, or mutually exclusive.
Hierarchical clustering techniques proceed by either a series of successive mergers or a series of successive divisions.
The results of these methods can be displayed in a dendrogram, a tree-like diagram that depicts the mergers or divisions made at successive levels. Below, in Figure 9.1, is an example of a dendrogram using the eight pairs of data considered in this section.
This method starts with each object forming its own cluster and then combines them into successively more inclusive clusters until only one cluster remains. Härdle and Simar (1998) give the algorithm of the agglomerative hierarchical method as follows.
If two objects or groups, say P and Q, are united, one obtains the distance to another group (object) R by the following distance function:

$$d(R, P \cup Q) = \delta_1\, d(R,P) + \delta_2\, d(R,Q) + \delta_3\, d(P,Q) + \delta_4\, |d(R,P) - d(R,Q)|$$

where the weights \delta_1, \delta_2, \delta_3, \delta_4 determine the particular agglomerative method.
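To make the update rule concrete, here is a minimal sketch in plain Python (not XploRe); the coefficient values shown are the standard Lance–Williams choices for three common methods, stated here as an illustration rather than taken from the text:

def lw_update(d_RP, d_RQ, d_PQ, method="single"):
    # Lance-Williams update d(R, P u Q) for a few common agglomerative methods
    if method == "single":        # delta1 = delta2 = 1/2, delta3 = 0, delta4 = -1/2
        return 0.5*d_RP + 0.5*d_RQ - 0.5*abs(d_RP - d_RQ)   # = min(d_RP, d_RQ)
    if method == "complete":      # delta1 = delta2 = 1/2, delta3 = 0, delta4 = +1/2
        return 0.5*d_RP + 0.5*d_RQ + 0.5*abs(d_RP - d_RQ)   # = max(d_RP, d_RQ)
    if method == "average":       # delta1 = delta2 = 1/2, delta3 = delta4 = 0
        return 0.5*(d_RP + d_RQ)
    raise ValueError(method)

print(lw_update(9, 13, 4, "single"))    # 9.0
print(lw_update(9, 13, 4, "complete"))  # 13.0
print(lw_update(9, 13, 4, "average"))   # 11.0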
In XploRe, the agglomerative hierarchical methods are computed by the quantlet agglom:

cagg = agglom (d, method, no{, opt})

where d is a distance matrix, method is a string selecting the agglomerative method (e.g. "SINGLE", "COMPLETE", "AVERAGE" or "WARD"), and no is the desired number of clusters. The outputs of this quantlet agglom are: cagg.p, a vector with partition numbers; cagg.t, a matrix with the dendrogram for the number of clusters (no); cagg.g, a matrix with the dendrogram for all clusters; cagg.pd, a matrix with partition numbers; and cagg.d, which contains the distances between the cluster centers.
In the single linkage method, the distance between the new group P ∪ Q and some group R is the smaller of the two original distances:

$$d(R, P \cup Q) = \min\{d(R,P),\, d(R,Q)\}$$ (9.9)
For example, we apply the single linkage method to the eight data points displayed in Figure 9.1.
First we prepare the data,
x=#(5,2,-2,-3,-2,-2,1,1)~#(-3,-4,-1,0,-2,4,2,4)   ; creates 8 pairs of data
n=rows(x)                                         ; number of observations
xs=string("%1.0f", 1:n)                           ; adds labels
setsize(500, 500)
dd1=createdisplay(1,1)
setmaskp(x, 0, 0, 0)
setmaskt(x, string("%.0f", 1:rows(x)), 0, 0, 16)
setmaskl(x, 1~2~7~8~6~0~7~3~5~0~3~4, 0, 1, 1)
show(dd1, 1, 1, x)                                ; shows data
setgopt(dd1, 1, 1, "xlab", "first coord.", "ylab", "second coord.")
setgopt(dd1, 1, 1, "title", "8 points", "xoff", 7|7, "yoff", 7|7)

then we calculate the Euclidean distance and apply the single linkage method,
d=distance(x, "euclid")        ; Euclidean distance
d.*d                           ; squared distance matrix
t=agglom(d.*d, "SINGLE", 5)    ; here single linkage method
g=tree(t.g, 0, "CENTER")
g=g.points
l = 5.*(1:rows(g)/5) + (0:4)' - 4
setmaskl (g, l, 0, 1, 1)
setmaskp (g, 0, 0, 0)

finally we show the plot of the raw data and the dendrogram
tg=paf(t.g[,2], t.g[,2]!=0)
numbers=(0:(rows(x)-1))
numbers=numbers~((-1)*matrix(rows(x)))
setmaskp(numbers, 0, 0, 0)
setmaskt(numbers, string("%.0f", tg), 0, 0, 14)
dd2=createdisplay(1,1)
show (dd2, 1, 1, g, numbers)
setgopt(dd2, 1, 1, "xlab", "Single Linkage Dendrogramm", "ylab", "Squared Euclidian Distance")
setgopt(dd2, 1, 1, "title", "8 points", "xoff", 7|7, "yoff", 7|7)
The plot of the dendrogram obtained with the single linkage method is shown in Figure 9.1.
If we decide to cut the tree at the level 10 then we find three clusters: {1, 2}, {3, 4, 5} and {6, 7, 8}.
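For readers without XploRe, the same three-cluster result can be reproduced with a short Python sketch (assuming numpy and scipy are available); the squared Euclidean distances and the single linkage merges are the same as above:

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# the eight data points of the example
x = np.array([[5, -3], [2, -4], [-2, -1], [-3, 0],
              [-2, -2], [-2, 4], [1, 2], [1, 4]])

d2 = pdist(x, metric="sqeuclidean")            # squared Euclidean distances
z = linkage(d2, method="single")               # single linkage
labels = fcluster(z, t=3, criterion="maxclust")
print(labels)                                  # {1,2}, {3,4,5} and {6,7,8}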
The complete linkage method uses, in contrast, the larger of the two original distances:

$$d(R, P \cup Q) = \max\{d(R,P),\, d(R,Q)\}$$ (9.10)
If we change SINGLE into COMPLETE in the example above, i.e. we replace the line

...
t=agglom(d.*d, "SINGLE", 5)      ; here single linkage method
...

by

...
t=agglom(d.*d, "COMPLETE", 5)    ; here complete linkage method
...

then we obtain the complete linkage dendrogram.
Both of these methods are invariant under monotone transformations of the distances, since only the ordering of the distances matters for the minimum and the maximum.
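This invariance can be checked empirically with a small Python/SciPy sketch (again only an illustration outside XploRe): running single and complete linkage on the squared distances and on their square roots, a monotone transformation, yields the same sequence of merges.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

x = np.array([[5, -3], [2, -4], [-2, -1], [-3, 0],
              [-2, -2], [-2, 4], [1, 2], [1, 4]])
d2 = pdist(x, metric="sqeuclidean")

for method in ("single", "complete"):
    z_sq = linkage(d2, method=method)                # squared distances
    z_rt = linkage(np.sqrt(d2), method=method)       # monotone transformation of them
    same = np.array_equal(z_sq[:, :2], z_rt[:, :2])  # same pairs merged in the same order?
    print(method, same)                              # True for both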
The average linkage method is a hierarchical method that avoids the extremes of either large clusters or tight, compact clusters. This method appears as a compromise between the nearest and the farthest neighbour methods.
The simple average linkage (mean linkage) method takes both elements of the new cluster into account:
$$d(R, P \cup Q) = \tfrac{1}{2}\,\{d(R,P) + d(R,Q)\}$$ (9.11)
After the new distances are computed, the distance matrix is reduced by one row and one column, since the two merged clusters are replaced by a single new one. The algorithm then loops back to find the next minimum value and continues until all objects are united into one cluster. However, this method is not invariant under monotone transformations of the distances.
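As a worked instance of (9.11) with the example data: after points 3 and 5 (squared Euclidean distance 1) have been merged, the new distance from point 4 to this cluster is

$$d(4, 3 \cup 5) = \tfrac{1}{2}\{d(4,3) + d(4,5)\} = \tfrac{1}{2}(2 + 5) = 3.5 .$$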
If we change SINGLE into AVERAGE in the example above, then we get the following:
...
t=agglom(d.*d, "AVERAGE", 5)     ; here average linkage method
...
The dendrogram is shown in Figure 9.4. If we decide to cut the tree at the level 10 then we find three clusters: {1, 2}, {3, 4, 5} and {6, 7, 8}.
Everitt (1993) explained that with the centroid method, groups once formed are represented by their mean values for each variable (mean vector), and inter-group distance is defined in terms of the distance between two such mean vectors. The use of the mean strictly implies that the variables are on an interval scale.
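The update formula behind the centroid method is not written out here; for reference (a standard result for squared Euclidean distances, not quoted from the text), merging groups P and Q of sizes n_P and n_Q gives

$$d(R, P \cup Q) = \frac{n_P}{n_P+n_Q}\, d(R,P) + \frac{n_Q}{n_P+n_Q}\, d(R,Q) - \frac{n_P\, n_Q}{(n_P+n_Q)^2}\, d(P,Q) .$$

For the example data, merging points 3 and 5 gives the centroid (-2, -1.5); the squared distance from point 4 = (-3, 0) to this centroid is 1 + 2.25 = 3.25, which agrees with 1/2·2 + 1/2·5 − 1/4·1 = 3.25.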
Figure 9.5 is a plot of a dendrogram using the centroid linkage method based on the eight pairs of data, produced with the quantlet XAGclust07.xpl.
If the sizes of the two groups to be merged are very different, then the centroid of the new group will be very close to that of the larger group and may remain within that group. This is the disadvantage of the centroid method. For that reason, Gower (1967) suggested an alternative strategy, called the median method, because this method could be made suitable for both similarity and distance measures.
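For comparison, Gower's median method replaces the size-dependent weights of the centroid update by fixed ones, so that the merged group is represented by the point halfway between the two old centroids whatever their sizes (again a standard form, stated here for reference):

$$d(R, P \cup Q) = \tfrac{1}{2}\, d(R,P) + \tfrac{1}{2}\, d(R,Q) - \tfrac{1}{4}\, d(P,Q) .$$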
A plot of a dendrogram using the median method based on the eight pairs of data is shown in Figure 9.6, produced with the quantlet XAGclust08.xpl.
Ward (1963) proposed a clustering procedure that seeks to form the partitions in a manner that minimizes the loss associated with each grouping and quantifies that loss in a readily interpretable form. Information loss is defined by Ward in terms of an error sum-of-squares (ESS) criterion. The ESS is defined as follows:
$$\mathrm{ESS} = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \bar{x}_k \rVert^2$$ (9.12)

where \bar{x}_k denotes the mean vector of cluster C_k.
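As a small illustration in Python (not XploRe), the ESS of a given partition can be computed directly from this definition; for the three-cluster partition {1,2}, {3,4,5}, {6,7,8} of the eight example points:

import numpy as np

x = np.array([[5, -3], [2, -4], [-2, -1], [-3, 0],
              [-2, -2], [-2, 4], [1, 2], [1, 4]], dtype=float)
clusters = [[0, 1], [2, 3, 4], [5, 6, 7]]    # {1,2}, {3,4,5}, {6,7,8} as 0-based indices

# sum of squared deviations from each cluster mean, added up over the clusters
ess = sum(((x[idx] - x[idx].mean(axis=0))**2).sum() for idx in clusters)
print(ess)    # about 16.33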
The corresponding quantlet call in XploRe is as below:
t = agglom (d, "WARD", 2)
The main difference between this method and the linkage methods consists in the unification procedure. This method does not put together the groups with the smallest distance; rather, it joins the groups whose merger increases a given measure of heterogeneity as little as possible. The aim of the Ward method is to unify groups such that the variation inside these groups does not increase too drastically. This results in clusters that are as homogeneous as possible.
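The heterogeneity measure can be made explicit: with squared Euclidean distances, merging two groups P and Q (with n_P and n_Q members and mean vectors \bar{x}_P, \bar{x}_Q) increases the ESS of (9.12) by

$$\Delta\mathrm{ESS} = \frac{n_P\, n_Q}{n_P + n_Q}\, \lVert \bar{x}_P - \bar{x}_Q \rVert^2 ,$$

so at every step the Ward method joins the two groups for which this increase is smallest. (This is the standard form of the Ward merging cost, stated here for reference.)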
The following quantlet gives an example of how to show the dendrogram with the WARD method in XploRe.
In this example we use the bank2.dat dataset taken from Flury and Riedwyl (1988). This dataset consists of 200 measurements on Swiss bank notes. One half of these bank notes are genuine, the other half are forged. The variables used in this data set are as follows:
X1 = length of the bill,
X2 = height of the bill (left),
X3 = height of the bill (right),
X4 = distance of the inner frame to the lower border,
X5 = distance of the inner frame to the upper border,
X6 = length of the diagonal of the central picture.
After starting, we compute the Euclidean distance between the bank notes:
proc()=main()
x=read("bank2")
i=0 ; compute the euclidean distance
d=0.*matrix(rows(x),rows(x))
while (i.<cols(x))
  i = i+1
  d = d+(x[,i] - x[,i]')^2
endo
d = sqrt(d)

Next, we use the WARD method and show the dendrogram
t = agglom (d, "WARD", 2)    ; use WARD method
g = tree (t.g, 0)            ; to cluster the data
g=g.points
l = 5.*(1:rows(g)/5) + (0:4)' - 4
setmaskl (g, l, 0, 1, 1)
setmaskp (g, 0, 0, 0)
d = createdisplay (1,1)
show (d, 1, 1, g)            ; show the dendrogram
endp                         ; main()
The resulting two-cluster partition of the 200 bank notes (the output t.p) looks as follows:

[  1,]        1
[  2,]        1
...
[ 68,]        1
[ 69,]        1
[ 70,]        2
[ 71,]        1
[ 72,]        1
...
[ 99,]        1
[100,]        1
[101,]        2
[102,]        2
...
[199,]        2
[200,]        2
The other quantlet that we use is wardcont. The aim of this quantlet is to perform Ward's hierarchical cluster analysis of the rows as well as of the columns of a contingency table. It includes a multivariate graphic using correspondence analysis. It makes available the factorial coordinates of the row points and column points (scores).
The syntax of this quantlet is as follows:

cw = wardcont (x, k, l)

where x is a contingency table, k is the number of clusters of rows, and l is the number of clusters of columns.
For an example, we use the bird.dat dataset taken from Mucha (1992). This dataset consists of 412 areas (each area = 1 square kilometre) and 102 kinds of birds. The areas are divided into 12 groups and the kinds of birds are divided into 9 groups.
After loading the quantlib xclust, we apply the wardcont method:
library ("xclust") x=read("bird.dat") cw = wardcont(x, 3, 3)
The divisive hierarchical methods proceed in the opposite way to the agglomerative hierarchical methods. Here, an initial single group of objects is divided into two groups such that the objects in one subgroup are far from the objects in the other. These methods can be divided into two types: monothetic, which divides the data on the basis of the possession of a single specified attribute, and polythetic, where divisions are based on the values taken by several attributes.
The divisive quantlet in XploRe performs an adaptive divisive K-means cluster analysis with an appropriate (adaptive) multivariate graphic using principal components:

cd = divisive (x, k, w, m, sv)

where x is a data matrix, k is the number of clusters, w is a vector of weights for the columns (variables), m is a vector of weights for the rows (observations), and sv is a seed value for the random number generator.
The outputs of this quantlet divisive are: cd.p, the partition of the points of x into k clusters; cd.n, the number of observations in each cluster; and cd.a, the matrix of final (pooled) adaptive weights of the variables.
We illustrate the usage of the quantlet divisive in the following example.
After loading the quantlib xclust, we generate random data with 4 clusters,
randomize(0)
x = normal(30, 5)
x1 = x - #(2,1,3,0,0)'
x2 = x + #(1,1,3,1,0.5)'
x3 = x + #(0,0,1,5,1)'
x4 = x - #(0,2,1,3,0)'
x = x1|x2|x3|x4

Then, we compute the column weights (inverse variances) and the row weights,
w = 1./var(x)          ; weights of the columns
m = matrix(rows(x))    ; weights of the rows

Next, we apply the divisive method and compare the estimated partition with the true partition,
cd = divisive (x, 4, w, m, 1111)
conting (cd.p, ceil((1:120)/30))
Contents of h
[1,]        0       30        0        0
[2,]        0        0       30        0
[3,]       30        0        0        0
[4,]        0        0        0       30

The output is the cross-table of the 120 observations divided into four clusters. Each cluster consists of 30 observations and corresponds to one of the given classes without any error.
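A rough Python counterpart of this experiment (not the divisive quantlet itself, but a simple bisecting K-means: repeatedly split the cluster with the largest within-cluster sum of squares using 2-means) could look as follows; the data generation mirrors the XploRe example above.

import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

# four shifted Gaussian clusters of 30 points in 5 dimensions, as in the example
base = rng.standard_normal((30, 5))
x = np.vstack([base - [2, 1, 3, 0, 0], base + [1, 1, 3, 1, 0.5],
               base + [0, 0, 1, 5, 1], base - [0, 2, 1, 3, 0]])
true = np.repeat(np.arange(4), 30)

def divisive_kmeans(data, k, seed=1111):
    # split the cluster with the largest within-cluster sum of squares until k clusters remain
    labels = np.zeros(len(data), dtype=int)
    for new_label in range(1, k):
        wss = [((data[labels == c] - data[labels == c].mean(axis=0))**2).sum()
               for c in range(new_label)]
        worst = int(np.argmax(wss))                    # cluster to be divided next
        idx = np.where(labels == worst)[0]
        _, sub = kmeans2(data[idx], 2, minit="++", seed=seed)
        labels[idx[sub == 1]] = new_label              # one half keeps the old label
    return labels

est = divisive_kmeans(x, 4)

# cross-table of the estimated partition against the true partition
table = np.zeros((4, 4), dtype=int)
np.add.at(table, (est, true), 1)
print(table)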