In contrast to hierarchical clustering, which yields a monotonically increasing ranking of strengths as clusters progressively become members of larger clusters, nonhierarchical clustering does not possess a tree-like structure. Instead, new clusters are formed in successive steps either by merging or by splitting existing clusters.
One of the nonhierarchical clustering methods is the partitioning method. Here a given number of clusters, say K, is fixed in advance, and the objects are partitioned so as to obtain the required K clusters. In contrast to the hierarchical method, this partitioning technique permits objects to change group membership during the cluster formation process. The partitioning method usually begins with an initial solution, after which reallocation occurs according to some optimality criterion.
The partitioning method constructs clusters from the data as follows:
1. choose an initial partition of the objects into the given number of clusters;
2. reallocate objects between clusters according to the optimality criterion;
3. repeat step 2 until no reallocation improves the criterion.
This method was developed by MacQueen (1967), who suggested the name K-means for his algorithm that assigns each item to the cluster having the nearest centroid (mean). The process consists of three steps:
1. partition the items into K initial clusters;
2. go through the list of items, assign each item to the cluster whose centroid (mean) is nearest, and recalculate the centroids of the clusters gaining and losing an item;
3. repeat step 2 until no more reassignments take place.
This method tries to minimize the sum of the within-cluster variances:
\[
V_K = \sum_{k=1}^{K} \sum_{i \in C_k} d^2(x_i, \bar{x}_k), \tag{9.13}
\]

where $\bar{x}_k$ denotes the centroid (mean vector) of cluster $C_k$ and the (weighted) squared Euclidean distance between an observation and a centroid is

\[
d^2(x_i, \bar{x}_k) = \sum_{j=1}^{p} w_j \, (x_{ij} - \bar{x}_{kj})^2 . \tag{9.14}
\]
The above criterion of the K-means method can be derived straightforwardly by using the Maximum Likelihood approach assuming that the populations are independently and normally distributed.
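To illustrate the criterion outside XploRe, the following is a minimal K-means sketch in Python with NumPy; the function name kmeans_sketch, its parameters, and the example data are illustrative assumptions and not part of any XploRe quantlet.

import numpy as np

def kmeans_sketch(x, k, n_iter=100, seed=0):
    # Minimal K-means: assign each point to the nearest centroid,
    # recompute the centroids, and stop when the partition no longer changes.
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    g = rng.integers(k, size=n)                      # random initial partition
    for _ in range(n_iter):
        # centroids of the current clusters (empty clusters are reseeded)
        c = np.array([x[g == j].mean(axis=0) if np.any(g == j)
                      else x[rng.integers(n)] for j in range(k)])
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        g_new = d2.argmin(axis=1)                    # nearest centroid
        if np.array_equal(g_new, g):                 # no reassignment: stop
            break
        g = g_new
    within = d2[np.arange(n), g].sum()               # criterion (9.13) with w_j = 1
    return g, c, within

# rough analogue of the XploRe example below: three shifted normal samples
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(mu, 1.0, size=(100, 4)) for mu in (-2.0, 1.0, 4.0)])
g, c, within = kmeans_sketch(x, k=3)

Here all variable weights $w_j$ are set to one; the weighted distance (9.14) can be obtained by rescaling the columns of x before clustering.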
Below is the usage of the quantlet kmeans in XploRe. It computes a partition of n row points into K clusters.
cm = kmeans(x, b, it {,w, m})

where x is the n x p data matrix, b is a vector giving the initial partition of the row points, it is the maximum number of iterations, and w and m are optional weight vectors for the columns and for the rows (masses), respectively.
The output of the quantlet kmeans consists of cm.g, a matrix containing the final partition; cm.c, a matrix of the cluster means (centroids); cm.v, a matrix of the within-cluster variances divided by the cluster weights (masses); and cm.s, a matrix of the cluster weights (masses).
In the following example, we generate random data with 3 clusters
randomize(0)
x = normal(100, 4)                  ; generate random normal data
x1 = x - #(2,1,3,0)'
x2 = x + #(1,1,3,1)'
x3 = x + #(0,0,1,5)'
x = x1|x2|x3
b = ceil(uniform(rows(x)).*3)       ; generate a random partition

Furthermore, we apply K-means clustering to the data and show the initial partition and the final partition:
{g, c, v, s} = kmeans(x, b, 100)
b~g
Contents of object _tmp
[  1,]        1        2
[  2,]        3        2
[  3,]        1        2
...
[297,]        1        1
[298,]        2        1
[299,]        1        1
[300,]        2        1
In order to increase the stability of the cluster analysis, specific weights or adaptive weights in the distance formula can be applied instead of the ordinary weights $w_j = 1$ or $w_j = 1/s_j^2$, where $s_j$ denotes the total standard deviation of the $j$th variable.
For example, the simple adaptive weights are

\[
w_j = \frac{1}{(s_j^{*})^{2}}, \tag{9.15}
\]

where $s_j^{*}$ is the pooled within-cluster standard deviation of the $j$th variable,

\[
(s_j^{*})^{2} = \frac{1}{n-K} \sum_{k=1}^{K} \sum_{i \in C_k} \left( x_{ij} - \bar{x}_{kj} \right)^2 . \tag{9.16}
\]
The ``true'' pooled standard deviations $s_j^{*}$ cannot be computed in advance in cluster analysis because the cluster structure is usually unknown. However, it is known that the pooled standard deviations of a random partition are nearly equal to the total standard deviations. Therefore, starting with the weights $w_j = 1/s_j^2$ and a random initial partition, the K-means method computes a (locally) optimal partition of the $n$ observations into $K$ clusters.
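The adaptive scheme can be sketched in the same Python notation; here I assume, following (9.15)-(9.16), that the variable weights are re-estimated as reciprocal pooled within-cluster variances after each K-means pass. The function adaptive_kmeans_sketch and its parameters are illustrative and do not reproduce the XploRe quantlet adaptive (in particular, no graphic is produced).

import numpy as np

def adaptive_kmeans_sketch(x, k, n_outer=10, n_inner=100, seed=0):
    # Alternate a weighted K-means pass with re-estimation of the weights
    # w_j = 1 / (s_j*)^2, the reciprocal pooled within-cluster variances.
    rng = np.random.default_rng(seed)
    n, p = x.shape
    w = 1.0 / x.var(axis=0)                          # start from total variances
    g = rng.integers(k, size=n)                      # random initial partition
    for _ in range(n_outer):
        for _ in range(n_inner):                     # weighted K-means pass
            c = np.array([x[g == j].mean(axis=0) if np.any(g == j)
                          else x[rng.integers(n)] for j in range(k)])
            d2 = (w * (x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
            g_new = d2.argmin(axis=1)
            if np.array_equal(g_new, g):
                break
            g = g_new
        s2 = np.zeros(p)                             # pooled within-cluster sums of squares
        for j in range(k):
            if np.any(g == j):
                s2 += ((x[g == j] - x[g == j].mean(axis=0)) ** 2).sum(axis=0)
        w = (n - k) / s2                             # adaptive weights (9.15)-(9.16)
    return g, c, w

Starting from the reciprocal total variances mirrors the remark above that the pooled standard deviations of a random partition are close to the total standard deviations.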
Below is the quantlet adaptive, which performs an adaptive K-means cluster analysis with an appropriate (adaptive) multivariate graphic based on the principal components:

ca = adaptive(x, k, w, m, t)

The following is an example of adaptive clustering in XploRe:
randomize(0)
x = normal(200, 5)                  ; generate random data with 3 clusters
x1 = x - #(2,1,3,0,0)'
x2 = x + #(1,1,3,1,0.5)'
x3 = x + #(0,0,1,5,1)'
x = x1|x2|x3
w = 1./var(x)                       ; compute column variances
m = matrix(rows(x))
t = matrix(200)|matrix(200).+1|matrix(200).+2   ; generate true partition
ca = adaptive (x, 3, w, m, t)       ; apply adaptive clustering
Fuzzy sets were introduced by Zadeh (1965). They offer a new way to isolate and identify functional relationships, qualitative as well as quantitative, which is also called pattern recognition.
In general, fuzzy models can be constructed in two ways: from expert knowledge, in the form of rules, or from data, by means of identification techniques. We concentrate only on the identification techniques. One of these techniques is the fuzzy clustering method. With a sufficiently informative identification data set, this method does not require any prior knowledge about the partitioning of the domains. Moreover, the use of membership values provides more flexibility and makes the clustering results locally interpretable; they often correspond well with the local behaviour of the identified process.
The idea of fuzzy clustering goes back to Ruspini (1969). He introduced a fuzzification of hard C-means to accommodate intergrades in situations where the groups are not well separated and hybrid points lie between them:
\[
u_{ik} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ik} = 1, \qquad k = 1, \ldots, n, \tag{9.17}
\]

where $u_{ik}$ denotes the degree of membership of the observation $x_k$ in the $i$th cluster.
The syntax of this algorithm in XploRe is the following:

hcm=xchcme(x,c,m,e)

The inputs are the following: x is the data matrix, c is the number of clusters, m is the weighting exponent (set to 1 in the example below), and e is the termination accuracy.
As an example, we use the butterfly data set taken from Bezdek (1981). This data set consists of 15 observations of two variables. It is called ``butterfly'' because its scatterplot resembles a butterfly.
After loading the quantlib xclust, we load the data set

library("xclust")
z=read("butterfly.dat")
x=z[,2:3]
c=2
m=1
e=0.001

and apply hard C-means clustering:
hcm=xchcme(x,c,m,e)
hcm.clus
d=createdisplay(1,1)
setmaskp(x,hcm.clus,hcm.clus+2,8)
show(d,1,1,x)
title="Hard-c-means for Butterfly Data"
setgopt(d,1,1,"title", title)
Contents of hcm.clus
[ 1,]        2
[ 2,]        2
...
[ 7,]        2
[ 8,]        2
[ 9,]        1
...
[14,]        1
[15,]        1
From Figure 9.12 we can see that the data separate into two clusters. Although observation number 8 lies exactly in the middle, it must belong to either the first or the second cluster; hard C-means cannot construct another cluster for it. In this example we see that this observation belongs to the first cluster.
One approach to fuzzy clustering, probably the best known and most commonly used, is the fuzzy C-means of Bezdek (1981). Before Bezdek, Dunn (1973) had developed a fuzzy version of the C-means algorithm. The idea of Dunn's algorithm is to extend the classical within-groups sum of squared errors objective function to a fuzzy version and to minimize this fuzzy objective function. Bezdek generalized this fuzzy objective function by introducing the weighting exponent $m$, $1 \le m < \infty$:
\[
J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, d^2(x_k, v_i), \tag{9.18}
\]

where the distance between the observation $x_k$ and the cluster center $v_i$ is measured by

\[
d^2(x_k, v_i) = \| x_k - v_i \|^2 . \tag{9.19}
\]
The fuzzy C-means (FCM) algorithm uses an iterative optimization of the objective function, based on the weighted similarity measure between the observations $x_k$ and the cluster centers $v_i$.
The steps of the fuzzy C-means algorithm, according to Hellendoorn and Driankov (1997), are the following: given the membership matrix $U$, compute the cluster centers

\[
v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m \, x_k}{\sum_{k=1}^{n} (u_{ik})^m}, \tag{9.20}
\]

then update the memberships according to

\[
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d(x_k, v_i)}{d(x_k, v_j)} \right)^{2/(m-1)} \right]^{-1}, \tag{9.21}
\]

and repeat both steps until the memberships change by less than a prescribed tolerance.
In pseudocode, we can say
Initialize membership (U)
iter = 0
Repeat                               {Picard iteration}
   iter = iter+1
   Calculate cluster center (C)
   Calculate distance of data to centroid ||X-C||
   U'=U
   Update membership U
Until ||U-U'|| <= tol_crit .or. iter = Max_iter

The syntax of this algorithm in XploRe is
fcm=xcfcme(x,c,m,e,alpha)

The inputs x, c, m, and e are the same as for xchcme; alpha is an additional parameter (set to 0.9 in the example below).
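For readers without access to XploRe, the Picard iteration of (9.20) and (9.21) can be sketched in Python as follows; the function fcm_sketch and its arguments are illustrative assumptions and do not reproduce the quantlet xcfcme (in particular, the role of alpha is not modelled here).

import numpy as np

def fcm_sketch(x, c, m=1.25, tol=1e-3, max_iter=100, seed=0):
    # Fuzzy C-means: Picard iteration of the center update (9.20)
    # and the membership update (9.21).
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)                # memberships sum to one per point
    for _ in range(max_iter):
        um = u ** m
        v = (um.T @ x) / um.sum(axis=0)[:, None]     # cluster centers, (9.20)
        d2 = ((x[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                   # guard against zero distances
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True) # membership update, (9.21)
        if np.abs(u_new - u).max() <= tol:           # ||U - U'|| small: stop
            u = u_new
            break
        u = u_new
    return u, v

A crisp partition can be read off with u.argmax(axis=1); as m approaches 1 the memberships become nearly 0/1 and the procedure behaves like hard C-means.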
Below is an example. We use the same data as for the quantlet xchcme, and we carry out exactly the same procedure, except for the part that applies fuzzy C-means clustering.
library("xclust") z=read("butterfly.dat") x=z[,2:3] c=2 m=1.25 e=0.001 alpha=0.9 fcm=xcfcme(x,c,m,e,alpha) ; apply fuzzy c-means clustering fcm.clus d=createdisplay(1,1) setmaskp(x,fcm.clus,fcm.clus+3,8) show(d,1,1,x) title="Fuzzy-c-means for Butterfly Data" setgopt(d,1,1,"title", title)
The result is the following:

Contents of fcm.clus
[ 1,]        1
[ 2,]        1
...
[ 8,]        3
[ 9,]        2
...
[14,]        2
[15,]        2

This result is shown in Figure 9.13.
By using m = 1.25 and alpha = 0.9, we can see that not all observations belong to the first or the second cluster: the 8th observation forms a cluster of its own, because it has the same distance to the centers of both previous clusters.
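To see why, consider an observation $x_k$ that is equidistant from both cluster centers, $d(x_k, v_1) = d(x_k, v_2)$. The membership update (9.21) then gives

\[
u_{1k} = \left[ \sum_{j=1}^{2} \left( \frac{d(x_k, v_1)}{d(x_k, v_j)} \right)^{2/(m-1)} \right]^{-1} = \frac{1}{1+1} = \frac{1}{2} = u_{2k},
\]

so neither cluster can claim this point more strongly than the other.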
For another example, we use the bank2 data set, which has also been analysed with the Ward method.
After loading the quantlib xclust, we load the bank2 dataset and proceed exactly as in the previous example; see the quantlet XAGclust16.xpl. The result is as follows:
[  1,]        1
[  2,]        1
...
...
[ 98,]        1
[ 99,]        1
[100,]        1
[101,]        2
[102,]        2
[103,]        1
[104,]        2
...
[199,]        2
[200,]        2

If we compare with the Ward method depicted in Figure 9.14, we do not obtain exactly the same clusters; with fuzzy C-means, for example, observation 103 is assigned to the first cluster.
Now we want to compare both of these methods for three clusters, depicted in Figure 9.15, with the quantlet XAGclust17.xpl.