Group: | Cluster analysis |
See also: | tree agglom |
Function: | kmeans | |
Description: | performs cluster analysis, i.e. computes a partition of n row points into K clusters. |
Usage: | cm = kmeans (x, b, it, {w, {m}}) | |
Input: | ||
x | n x p data matrix | |
b | n x 1 matrix: initial partition (for example, randomly generated cluster numbers 1,2,...,K) | |
it | maximal number of iterations | |
w | p x 1 matrix with the weights of column points | |
m | n x 1 matrix of weights (masses) of row points | |
Output: | ||
cm.g | n x 1 matrix containing the final partition, which gives a (local) minimum of the sum of within-cluster variances | |
cm.c | K x p matrix of means (centroids) of the K clusters | |
cm.v | K x p matrix of within-cluster variances divided by the weight (mass) of the clusters | |
cm.s | K x 1 matrix of the weights (masses) of the clusters |
Note: in the initial partition b, for every cluster k (k=1,2,...,K) there must exist at least
one row point i (i=1,2,...,n) with b(i)=k.
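The requirement on the initial partition can be checked before calling the routine. The following is a minimal sketch in Python with NumPy, not XploRe code; the helper names `is_valid_partition` and `random_partition` are hypothetical and are not part of the library. It assumes labels are the integers 1,2,...,K, as described for b above.

```python
import numpy as np

def is_valid_partition(b, K):
    """Return True if b holds only integer labels 1..K and every label occurs at least once."""
    b = np.asarray(b).ravel()
    if b.size == 0:
        return False
    if not np.all(b == np.round(b)):        # labels must be whole numbers
        return False
    if not np.all((b >= 1) & (b <= K)):     # labels must lie in 1..K
        return False
    return np.unique(b).size == K           # no cluster may be empty

def random_partition(n, K, seed=0):
    """Draw random labels in 1..K and force every cluster to be non-empty (needs n >= K)."""
    if n < K:
        raise ValueError("need at least K row points")
    rng = np.random.default_rng(seed)
    b = rng.integers(1, K + 1, size=n)      # uniform labels 1..K
    # overwrite K distinct positions with 1..K so that each label appears at least once
    idx = rng.choice(n, size=K, replace=False)
    b[idx] = np.arange(1, K + 1)
    return b

b0 = random_partition(300, 3)
```

A purely random draw of labels will almost always satisfy the condition when n is much larger than K, but the explicit overwrite step makes it hold by construction.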
Example:
; set the seed of the random generator
randomize(0)
; generate some data
x = normal(100, 4)
; generate first cluster
x1 = x - #(2,1,3,0)'
; generate second cluster
x2 = x + #(1,1,3,1)'
; generate third cluster
x3 = x + #(0,0,1,5)'
; make a data set with 3 clusters
x = x1|x2|x3
; generate a random partition with 3 clusters
b = ceil(uniform(rows(x)).*3)
; apply k-means clustering to the data
{g, c, v, s} = kmeans(x, b, 100)
; show the start partition and the final partition
b~g
Result: shows the start and the final partition of the data in 3 clusters

Contents of _tmp
[  1,] 1 2
[  2,] 3 2
[  3,] 1 2
[  4,] 3 2
[  5,] 3 2
[  6,] 2 2
[  7,] 3 2
[  8,] 2 2
[  9,] 2 2
[ 10,] 3 2
[ 11,] 1 2
[ 12,] 2 2
[ 13,] 2 2
[ 14,] 3 2
[ 15,] 2 2
...
[286,] 2 1
[287,] 1 1
[288,] 1 1
[289,] 3 1
[290,] 2 1
[291,] 2 1
[292,] 3 1
[293,] 2 1
[294,] 3 1
[295,] 3 1
[296,] 3 1
[297,] 1 1
[298,] 2 1
[299,] 1 1
[300,] 2 1
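To illustrate what a k-means run started from an initial partition computes, here is a rough sketch in Python with NumPy. It is not the XploRe implementation, and the function name `kmeans_from_partition` is hypothetical. Two points are assumptions rather than facts from this page: the column weights w are taken to scale the squared coordinate differences in the distance, and v is taken to be the mass-weighted sum of squared deviations per column divided by the cluster mass.

```python
import numpy as np

def _centroids(x, g, m, K, prev):
    """Mass-weighted cluster means; an empty cluster keeps its previous centroid."""
    c = prev.copy()
    for k in range(K):
        members = g == k + 1
        if members.any():
            c[k] = np.average(x[members], axis=0, weights=m[members])
    return c

def kmeans_from_partition(x, b, it, w=None, m=None):
    """Alternate centroid updates and reassignments for at most `it` iterations."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    g = np.asarray(b).astype(int).ravel().copy()   # labels 1..K
    K = int(g.max())
    w = np.ones(p) if w is None else np.asarray(w, dtype=float).ravel()
    m = np.ones(n) if m is None else np.asarray(m, dtype=float).ravel()
    c = np.zeros((K, p))
    for _ in range(it):
        c = _centroids(x, g, m, K, c)
        # squared distances to every centroid, columns scaled by w (assumption)
        d = (((x[:, None, :] - c[None, :, :]) ** 2) * w).sum(axis=2)
        new_g = d.argmin(axis=1) + 1
        if np.array_equal(new_g, g):               # partition is stable: stop
            break
        g = new_g
    # summaries of the final partition
    c = _centroids(x, g, m, K, c)
    s = np.array([m[g == k + 1].sum() for k in range(K)])
    v = np.zeros((K, p))
    for k in range(K):
        members = g == k + 1
        if s[k] > 0:
            # assumed definition: weighted squared deviations divided by cluster mass
            v[k] = (m[members, None] * (x[members] - c[k]) ** 2).sum(axis=0) / s[k]
    return g, c, v, s

# data laid out like the example above: one normal sample shifted three ways
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 4))
x = np.vstack([base - [2, 1, 3, 0], base + [1, 1, 3, 1], base + [0, 0, 1, 5]])
b = np.arange(x.shape[0]) % 3 + 1                  # start partition, no empty cluster
g, c, v, s = kmeans_from_partition(x, b, 100)
```

The cluster labels in the final partition are arbitrary, and the result depends on the start partition, so a run with different random numbers need not reproduce the labelling shown in the result above.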