Group: | Cluster analysis |
See also: | tree kmeans |
Function: | agglom | |
Description: | performs hierarchical cluster analysis. |
Usage: | cagg = agglom (d, method, no{, opt}) | |
Input: | ||
d | n x 1 vector or l x l matrix of distances | |
method | string, one of: "WARD", "SINGLE", "COMPLETE", "MEAN_LINK", "MEDIAN_LINK", "AVERAGE", "CENTROID" or "LANCE". | |
no | scalar, number of clusters | |
opt | optional argument for some methods - see note below | |
Output: | ||
cagg.p | l x 1 matrix with partition numbers (1,2,...) | |
cagg.t | p x 2 matrix with the dendrogram for no clusters | |
cagg.g | p x 2 matrix with the dendrogram for all l clusters | |
cagg.pd | l x 1 matrix with partition numbers (1,2,...) | |
cagg.d | no x (no-1)/2 matrix with distances between the cluster centers |
The options (linkage strategies) for method are: WARD, SINGLE, COMPLETE, MEAN_LINK, MEDIAN_LINK, AVERAGE, CENTROID, LANCE.
The optional parameter opt is either a scalar (the parameter beta of the LANCE method) or an (l x 1) vector of weights for the methods AVERAGE, CENTROID and WARD. For all other methods, specifying an optional parameter results in an error.
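The linkage strategies differ mainly in how the distance from a newly merged cluster to every other cluster is updated. The following Python sketch illustrates this with the general Lance-Williams recurrence; it is not the XploRe implementation. The function names, the lowercase method names ("single", "complete", "average", "flexible") and the default beta = -0.25 are assumptions made for illustration, and the mapping of these names to the agglom method strings is not confirmed by this page. Observation weights are not modelled.

```python
def lance_williams_coeffs(method, ni, nj, beta=-0.25):
    """Return (alpha_i, alpha_j, beta, gamma) for merging clusters of sizes ni and nj.

    Method names are hypothetical labels for this sketch, not agglom's strings.
    """
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # size-weighted group average
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "flexible":  # one scalar parameter beta < 1
        return (1 - beta) / 2, (1 - beta) / 2, beta, 0.0
    raise ValueError("unknown method: " + method)


def agglomerate(dist, method, n_clusters, beta=-0.25):
    """Merge the closest pair of clusters until n_clusters remain.

    dist is a symmetric l x l list of lists; returns l partition labels 1..n_clusters.
    """
    l = len(dist)
    # Working distances between current clusters, keyed by cluster id.
    d = {i: {j: dist[i][j] for j in range(l) if j != i} for i in range(l)}
    members = {i: [i] for i in range(l)}

    while len(members) > n_clusters:
        # Find the closest pair (i, j) with i < j.
        i, j = min(((a, b) for a in d for b in d[a] if a < b),
                   key=lambda pair: d[pair[0]][pair[1]])
        ai, aj, b, g = lance_williams_coeffs(
            method, len(members[i]), len(members[j]), beta)
        dij = d[i][j]
        # Lance-Williams update: distance from every other cluster k
        # to the merged cluster, which keeps the id i.
        for k in members:
            if k == i or k == j:
                continue
            new = (ai * d[k][i] + aj * d[k][j] + b * dij
                   + g * abs(d[k][i] - d[k][j]))
            d[k][i] = d[i][k] = new
            del d[k][j]
        del d[j]
        del d[i][j]
        members[i].extend(members.pop(j))

    # Number the clusters 1, 2, ... in order of their smallest member index.
    labels = [0] * l
    ordered = sorted(members, key=lambda c: min(members[c]))
    for label, key in enumerate(ordered, start=1):
        for obs in members[key]:
            labels[obs] = label
    return labels


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 20]  # made-up one-dimensional data
    dist = [[abs(p - q) for q in points] for p in points]
    print(agglomerate(dist, "single", 3))
    print(agglomerate(dist, "flexible", 3, beta=-0.25))
```

In this sketch only the "flexible" branch uses the scalar beta, which mirrors the role the note above gives to opt for the LANCE method.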
proc()=main()
  ; load the swiss banknote data
  x=read("bank2")
  ; compute the euclidean distance between banknotes
  i=0
  d=0.*matrix(rows(x),rows(x))
  while(i.<cols(x))
    i = i+1
    d = d+(x[,i] - x[,i]')^2
  endo
  d = sqrt(d)
  ; use the WARD method to cluster the data
  t = agglom(d, "WARD", 3)
  t.p
endp
;
main()
//gives the partition of the data into 3 clusters
Contents of p
[  1,] 1
[  2,] 1
[  3,] 1
[  4,] 1
[  5,] 1
[  6,] 1
[  7,] 1
...
[ 98,] 1
[ 99,] 1
[100,] 1
[101,] 2
[102,] 2
[103,] 2
[104,] 2
[105,] 3
[106,] 2
[107,] 2
...
[194,] 2
[195,] 3
[196,] 2
[197,] 2
[198,] 2
[199,] 2
[200,] 2
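The distance computation in the example accumulates squared differences one variable (column) at a time and takes the square root at the end. A rough Python equivalent is sketched below, together with a small helper that turns a partition vector such as p into per-cluster lists of observation numbers. Both function names are hypothetical and the data are made up; the sketch does not read bank2 or call agglom.

```python
import math


def euclidean_distances(x):
    """Pairwise Euclidean distances between the rows of x (a list of equal-length rows)."""
    l = len(x)
    d = [[0.0] * l for _ in range(l)]
    # Add the squared difference of one column at a time, as in the example loop.
    for col in range(len(x[0])):
        for a in range(l):
            for b in range(l):
                d[a][b] += (x[a][col] - x[b][col]) ** 2
    return [[math.sqrt(v) for v in row] for row in d]


def members_by_cluster(p):
    """Map each partition number to the 1-based observation numbers carrying it."""
    groups = {}
    for obs, label in enumerate(p, start=1):
        groups.setdefault(label, []).append(obs)
    return groups


if __name__ == "__main__":
    x = [[0, 0], [3, 4], [6, 8]]  # made-up observations
    print(euclidean_distances(x))
    print(members_by_cluster([1, 1, 2, 3, 2]))  # made-up partition vector
```

Applied to a partition like the one printed above, such a helper would make it easy to see which observations fall into the small third cluster, which in the displayed rows contains observations 105 and 195.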